The Pancreatic Analytics Hub hosts four core data sources: The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), the Genomics Evidence Neoplasia Information Exchange (GENIE) and the Cancer Cell Line Encyclopaedia (CCLE).
TCGA: The Cancer Genome Atlas is a consortium dedicated to the systematic study of alterations in a variety of human cancers. It has made mRNA expression, mutation and methylation data from analysed cohorts publicly available, alongside associated clinical data. Currently, mRNA expression and mutation data from sequenced patients with pancreatic adenocarcinoma are available for analysis through the Analytics Hub, alongside associated clinical data.
ICGC: The International Cancer Genome Consortium is focussed on the generation of comprehensive catalogues of genomic abnormalities (somatic mutations, expression of genes, epigenetic modifications) in tumours from 50 different cancer types. It has made mRNA expression, DNA copy number, mutation and methylation data from analysed cohorts publicly available, alongside associated clinical data. Currently, mRNA expression and mutation data from sequenced patients with pancreatic adenocarcinoma or pancreatic endocrine neoplasms are available through the Analytics Hub.
GENIE: Genomics Evidence Neoplasia Information Exchange is a pilot project that seeks to identify and validate genomic biomarkers relevant to cancer treatment by linking tumour genomic data from clinical sequencing efforts with longitudinal clinical outcomes. It has made mutation data publicly available, alongside associated clinical data for a range of cancer types/subtypes. Mutation data from individuals with pancreatic cancer are available for analysis from the Analytics Hub.
CCLE: The Cancer Cell Line Encyclopaedia project is an effort to conduct a detailed genetic characterisation of a large panel of human cancer cell lines. mRNA expression and mutation data for pancreatic cancer cell lines are available from the Analytics Hub.
|Table 1. Features of the Analytics Hub for Publicly Available Data Sources|
|Results Tab||Analytical features||TCGA||ICGC||GENIE||CCLE|
|Transcriptomics||Principal Component Analysis||✓||✓||✓|
Genomics data from publicly available sequencing cohorts can be analysed using the integrated Bioconductor package MAFtools, which facilitates the analysis of somatic variants, comprising single-nucleotide variants (SNVs) and small insertions/deletions (indels), based on variant characteristics, gene interactions and protein changes.
1.1 Summary. From this tab, a MAFtools summary plot can be viewed for each user-selected cohort, displaying the range of variant classifications, variant types and base substitution profiles as bar plots and boxplots.
1.2 Somatic Interactions. Mutually exclusive or co-occurring sets of genes (among the top 25 mutated genes) can be analysed using a pair-wise Fisher’s Exact test to detect significant gene pairs, with the results visualised as a correlation matrix.
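To illustrate the pair-wise test described above, the following is a minimal sketch of a two-sided Fisher's Exact test for one pair of genes. It is not the MAFtools implementation; the function names `fisher_exact_two_sided` and `pair_p_value` and the sample identifiers are hypothetical, and only the Python standard library is assumed.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def pair_p_value(mutated_a, mutated_b, all_samples):
    """Build the 2x2 table for two genes from sets of mutated sample IDs."""
    both = len(mutated_a & mutated_b)
    only_a = len(mutated_a - mutated_b)
    only_b = len(mutated_b - mutated_a)
    neither = len(all_samples - mutated_a - mutated_b)
    return fisher_exact_two_sided(both, only_a, only_b, neither)


# Hypothetical cohort of six samples and two genes that are always co-mutated.
samples = {"S1", "S2", "S3", "S4", "S5", "S6"}
gene_a = {"S1", "S2", "S3"}
gene_b = {"S1", "S2", "S3"}
print(pair_p_value(gene_a, gene_b, samples))
```

Repeating this for every pair among the selected genes yields the matrix of p-values; whether a pair is co-occurring or mutually exclusive is read from the direction of the association (for example, the odds ratio), which this sketch does not compute.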
1.3 OncoPlot. Users can view the top (10, 25, 50) mutated genes in their cohort.
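As a rough illustration of how a "top N mutated genes" ranking can be derived, the sketch below counts the number of distinct samples carrying at least one variant in each gene. The function name `top_mutated_genes`, the input layout (pairs of sample ID and gene symbol) and the example records are assumptions made for illustration, not the Hub's actual data model.

```python
from collections import defaultdict


def top_mutated_genes(variants, n=10):
    """Rank genes by the number of distinct samples in which they are mutated.

    `variants` is an iterable of (sample_id, gene) pairs; several variants in
    the same gene and sample are counted once.
    """
    samples_by_gene = defaultdict(set)
    for sample_id, gene in variants:
        samples_by_gene[gene].add(sample_id)
    # Sort by descending sample count, then alphabetically to break ties.
    ranked = sorted(samples_by_gene.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(samples)) for gene, samples in ranked[:n]]


# Hypothetical variant calls for three samples.
example = [
    ("S1", "KRAS"), ("S1", "KRAS"), ("S2", "KRAS"), ("S3", "KRAS"),
    ("S2", "TP53"), ("S3", "TP53"), ("S3", "SMAD4"),
]
print(top_mutated_genes(example, n=2))
```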
1.4 Lolliplot. Users can also select to view amino acid changes within each of the top 50 mutated genes in each cohort as a lollipop plot. The plots display the observed mutation distribution and protein domains, which are labelled for each selected gene. A summary of the observed somatic mutation rate for each selected gene is also provided alongside each plot.
1.5 Cancer Genome Interpreter (CGI). The CGI is a third-party tool developed for the interpretation of newly sequenced cancer genomes. It annotates variants with their potential to act as tumour drivers and their possible effect on treatment response. The Analytics Hub presents predicted driver variants and alterations with therapeutic biomarker potential.
1.6 Protein-Protein and Drug-Target Interactions. Drug-target interaction networks provide a vital tool for the characterisation of clinically actionable alterations across patient subgroups. Network plots displaying protein-protein and protein-drug interactions are available through the Analytics Hub. Variants within individual genes of interest can be queried against the DrugBank database for downstream analysis of therapeutic candidates.
1.7 Reactome Pathways. Variants within each query set are first mapped to the corresponding genes, which are then linked to altered biological pathways from the Reactome database. The results are presented both in tabular format and as an interactive Voronoi diagram, which is colour-coded according to the number of patients in each selected cohort for which a pathway is altered.
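The mapping from variants to genes to pathways, and the per-pathway patient count used for colour-coding, can be sketched as follows. The gene-to-pathway lookup here is a hypothetical placeholder (a real one would be derived from Reactome), and the function name `patients_per_pathway` is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical lookup table; real annotations would come from Reactome.
GENE_TO_PATHWAYS = {
    "GENE_A": ["Pathway 1", "Pathway 2"],
    "GENE_B": ["Pathway 2"],
    "GENE_C": ["Pathway 3"],
}


def patients_per_pathway(variants, gene_to_pathways):
    """Count distinct patients with at least one altered gene per pathway.

    `variants` is an iterable of (patient_id, gene) pairs. Genes without a
    pathway annotation are ignored.
    """
    patients = defaultdict(set)
    for patient_id, gene in variants:
        for pathway in gene_to_pathways.get(gene, []):
            patients[pathway].add(patient_id)
    return {pathway: len(ids) for pathway, ids in patients.items()}


# Hypothetical variant calls; GENE_X has no pathway annotation.
calls = [("P1", "GENE_A"), ("P2", "GENE_B"), ("P2", "GENE_A"), ("P3", "GENE_X")]
print(patients_per_pathway(calls, GENE_TO_PATHWAYS))
```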
2.1 Principal Component Analysis. Principal component analysis (PCA) reduces the dimensionality of data while retaining most of the variation in the dataset, making it possible to visually assess similarities and differences between different samples and determine whether groupings can be identified between individual samples. This exploratory analysis facilitates identification of the key factors affecting the variability in the mRNA expression data.
For each dataset, scatterplots of the first two and the first three principal components (PCs) are presented. Each data point represents the position of a single sample in transcriptomic space after projection onto the PCs, with different colours indicating the biological group to which each sample belongs. The percentage values in brackets on each axis indicate the amount of variance in the data explained by the corresponding PC.
The global variability of the data can also be assessed from the scree plot, which shows the fraction of total variance (y-axis) attributed to each PC (x-axis). The PCs are ordered by decreasing contribution to total variance.
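The quantities shown on a scree plot follow directly from the per-component variances (the eigenvalues of the covariance matrix): each PC's share is its variance divided by the sum over all PCs. A minimal sketch, assuming the variances have already been computed elsewhere; the function names and the example values are hypothetical.

```python
def explained_variance_fractions(pc_variances):
    """Fraction of total variance per PC, ordered by decreasing contribution."""
    total = sum(pc_variances)
    return [v / total for v in sorted(pc_variances, reverse=True)]


def cumulative_fractions(fractions):
    """Running total of explained variance across successive PCs."""
    running, out = 0.0, []
    for f in fractions:
        running += f
        out.append(running)
    return out


# Hypothetical per-component variances.
fractions = explained_variance_fractions([1.0, 6.0, 3.0])
for i, (f, c) in enumerate(zip(fractions, cumulative_fractions(fractions)), start=1):
    print(f"PC{i}: {f:.1%} of variance, {c:.1%} cumulative")
```

The percentages printed per PC correspond to the values shown in brackets on the scatterplot axes.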
2.2 Expression Profiles. The distribution of mRNA expression measurements can be visualised across all samples for a user-defined gene (from the top 50 aberrantly expressed genes).
2.3 Correlation. Pairwise comparisons of expression profiles can be performed between multiple user-defined genes in each selected dataset and Pearson's correlation coefficients and p-values calculated for each comparison.
For a queried set of genes (minimum of 3 genes), the Analytics Hub computes Pearson's correlation coefficients and corresponding p-values for all pairwise combinations of genes and displays the correlation coefficients in the form of a pairwise comparison heatmap. The colour of each cell indicates the correlation coefficient between the corresponding genes labelled on the x-axis and y-axis. The heatmap colour key is displayed on the right side of the plot, with red and blue indicating high and low correlation values, respectively.
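The pairwise computation behind such a heatmap can be sketched as below. This is an illustrative standard-library implementation, not the Hub's code; the function names, gene labels and expression values are hypothetical. The p-value is omitted because it requires the t-distribution with n - 2 degrees of freedom, which the standard library does not provide.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(expression):
    """Correlation for every gene pair; `expression` maps gene -> values per sample."""
    if len(expression) < 3:
        raise ValueError("at least 3 genes are required")
    return {
        (g1, g2): pearson_r(expression[g1], expression[g2])
        for g1, g2 in combinations(sorted(expression), 2)
    }


# Hypothetical expression values for three genes across four samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in pairwise_correlations(expr).items():
    print(pair, round(r, 3))
```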
2.4 Survival Analysis. From this tab the relationship between the expression of genes of interest and survival can be assessed. A univariate Cox proportional hazards (PH) regression is applied to the survival data and the samples are assigned to risk groups based on the median dichotomisation of mRNA expression intensities of the selected gene. Relationships are presented as Kaplan-Meier plots. The hazard ratio (HR) and 95% confidence intervals (CI) from the Cox PH model and associated log-rank p-value are presented in the top right corner of the figure.
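Two of the steps above, median dichotomisation and the Kaplan-Meier estimate, can be sketched with the standard library alone. The Cox PH fit and the log-rank test are not reproduced here. The function names, the choice to assign samples exactly at the median to the "low" group, and the example data are all assumptions made for illustration.

```python
from statistics import median


def dichotomise_by_median(expression):
    """Assign each sample to a 'high' or 'low' group relative to the median.

    Assumption: samples exactly at the median are placed in the 'low' group.
    """
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    `events` holds 1 for an observed event and 0 for a censored observation.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        # Group all observations sharing the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical expression values and follow-up data for four samples.
groups = dichotomise_by_median({"S1": 1.0, "S2": 2.0, "S3": 3.0, "S4": 4.0})
print(groups)
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

In practice, a curve would be computed separately for the "high" and "low" groups and the two plotted together.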