Welcome to the Pancreatic Expression Database Version 3.0

Examples: ANXA1, hsa-miR-187, ENSG00000182718, ENSP00000387545, IGKC_HUMAN
Available formats: HGNC, miRBase accession, Ensembl gene/transcript/protein, SwissProt accession

In our continuous effort to increase the functionality of PED, we have substantially changed and improved its -omics, selections, specimens and annotation data types. This latest version has a redefined database structure, an extensive controlled vocabulary and more detailed and richer information than the previous versions.

Major improvements in version 3.0 over version 2.0 are as follows:

  • New data from differential methylomics and large-scale meta-analysis studies.
  • Extensive controlled vocabulary that records specific details of the samples used in each study.
  • New structure for filters and attributes in the portal to accommodate elaborate yet precise selection criteria for queries.
    • Segregated representation of experimental specimens/samples into two groups, namely target and baseline/control.
    • Separation of experiment and validation platforms to avoid ambiguity.
  • New integrated graphical tools to allow users to query, overlay and visualise retrieved information using UCSC Genome Browser and CIRCOS viewer.
  • New quick search option enabling users to quickly extract summarized information about a gene/protein of interest using HGNC/Ensembl gene, miRNA accession or SwissProt/Ensembl protein identifiers.
  • Direct access to Reactome, COSMIC, UniProt, PRIDE and InterPro BioMart-compatible datasets to allow integrative queries with PED data.
  • Addition of 19 large-scale studies.
  • New option allowing researchers to upload their own datasets for inclusion in PED.

Data content

The database contains differential expression or expression measurements for 12641 distinct genes and 33092 distinct copy number alterations extracted from 78 published transcriptomics, proteomics, methylomics, miRNA, meta-analysis or genomics studies of various pancreatic normal, benign, precancerous and malignant tissues, body fluids, cell lines and xenograft models under different treatment conditions. These data describe pancreas-related regulation events in 7924 genes/proteins, 44358 transcripts and 307 miRNAs as well as methylation events in 2438 genes and 15051 transcripts. The copy number alteration section includes information on 32002 gains, 23957 losses, 1053 deletions, 4717 amplifications and 88 loss of heterozygosity (LOH) events occurring in distinct genes and genomic areas.

Specimens

Our database comprises -omics data from a wide range of specimens derived from tissues, fine needle aspirates and body fluids of healthy people and patients with pancreatic malignant or benign diseases. These are stored alongside information on different treatments and profiling data from cell lines and mouse models.

  • Normal tissues: ductal, islet, acinar and stromal cells, normal duodenum and liver.
  • Disease tissues: pancreatic intraepithelial neoplasias (PanIN-1a, PanIN-1b, PanIN-2, PanIN-3), primary and metastatic pancreatic ductal adenocarcinoma (PDAC), pancreatic endocrine tumors (PET) (functioning and non-functioning), pancreatic acinar cell carcinoma (PACC), well-differentiated and poorly-differentiated endocrine carcinoma (WDEC & PDEC), metastatic endocrine carcinoma, intraductal papillary mucinous neoplasms (IPMN), mucinous cystic, mucinous cystic ovarian type stroma, ampullary carcinoma, pancreatic cancer liver metastasis, chronic pancreatitis and pancreatic pseudocyst.
  • Cell lines: Human pancreatic ductal epithelial (HPDE), BxPC-3, Colo357, Hs766T, MiaPaCa-2, Panc-1, PL3, PL4, PL8, CFPAC-1, HPAC, MPanc-96, SU86.86, SW1990, Suit0028, HPAF-II, AsPC-1, PaTu8988, Suit007, L3.6pl, CFPAC, IMIMPC2, PT45, SKPC1, PL45, PancTul, PaCa44, PaTu8988T, HuPT4, YPAC, Capan-2, PaTu8902, PaTu8988, A818, Capan-1, Nor-P1, Panc10.05, Panc2.03, Panc2.13, Panc3.27, Panc3.3, Panc5.04, Panc6.03 and Panc8.13.
  • Body fluids: pancreatic juice, plasma, saliva, urine and serum.
  • Mouse models: ectopic and orthotopic xenografts from patient tissues and cancer cell lines.
  • Treatments/Drugs: Hsp 90 Inhibitor (IPI-504), epidermal growth factor receptor (EGFR) inhibitors erlotinib and cetuximab, SMO-acting antagonist of the Hh pathway, Gemcitabine, Cisplatin, Methotrexate, 5-Fluorouracil (5-FU) and oncolytic adenoviruses.
Additional important information on the specimen (source, collection, preparation, gender, cellularity, cohort) and the metastasis site (for metastatic samples) has been included as well.

Omics technologies

These samples have been profiled on a wide range of transcriptomics, proteomics, methylomics, miRNA or genomics platforms. Data from large-scale meta-analysis are presented as well. In order to avoid ambiguity, we have presented experiment platforms separately from validation platforms.

  • Transcriptomics: Different Affymetrix GeneChip Human Genome arrays (U95 (A,B,C,D,E), U95Av2, U133A, U133B, U133 Plus 2.0, HuGeneFL and 1.0 ST), Illumina Human-6 Expression Beadchip, Agilent Whole Human Genome 4x44K Microarray, Sanger human 10K cDNA arrays, Sanger custom 5K1 cDNA arrays, Clontech Atlas Human Cancer cDNA Expression Array, cDNA Array (Human Genome Centre Tokyo), Serial Analysis of Gene Expression (SAGE), qRT-PCR, cDNA Array United Gene technique Ltd, Human Genome Oligo-Set-Version 2.0 (Operon, Germany).
  • Proteomics: One-dimensional and two-dimensional gel electrophoresis, Two-dimensional difference gel electrophoresis (2D-DIGE), Enzyme-linked immunosorbent assay (ELISA), Isotope-coded affinity tag (ICAT), Immunohistochemistry, Immunocytochemistry, Isobaric tags for relative and absolute quantification (iTRAQ), Matrix Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF) Mass spectrometry, Liquid Chromatography Mass Spectrometry (LC-MS/MS), Invitrogen ProtoArray Human Protein Microarrays, PCR, qRT-PCR, Western Blot, Northern Blot and Sirius red staining.
  • Methylomics: Methylation-specific PCR (MSP), Illumina Infinium 27k Human Methylation Beadchip, Agilent Human CpG Island ChIP-on-Chip Microarray 244K.
  • miRNA: Agilent Human miRNA array, Ambion mirVana miRNA Bioarray, Geniom Biochip miRNA, Exiqon miRCURY LNA microRNA Array v.11.0 -hsa, mmu & rno, Affymetrix GeneChip miRNA Array, Ohio State University Comprehensive Cancer Center miRNA-Array (OSU_CCC), TaqMan Low Density Arrays (TLDA) Human MicroRNA (Applied Biosystems), Northern blot, qRT-PCR.
  • Meta-analysis: Reanalysis of Proteomic data from Liquid chromatography-Mass spectrometry experiments, Reanalysis of Transcriptomic data from Affymetrix GeneChip Human Genome U133 Plus 2.0 and HuGeneFL Array experiments, Reanalysis of Transcriptomic data from Sanger human 10K and 45K cDNA array.
  • Genomics: Molecular Cytogenetics (MCG) Cancer Array-800, Illumina Infinium II Whole-Genome Genotyping Assay, Affymetrix GeneChip® Human Mapping 100K SNP Set, Affymetrix GeneChip Human Mapping 50K SNP array, Affymetrix Genome-Wide Human SNP Array 6.0.

Annotations

All the studies were manually processed, checked for accuracy and consistency and loaded into our relational database alongside annotations from several public resources such as Reactome, Ensembl, Gene Ontology (GO), dbSNP, multi-species comparisons, UniProt and the Human Protein Atlas. We imported the available Ensembl human genome annotations (Ensembl release 63) for genes and proteins, SNP information, sequences, gene structure and multi-species data, enabling the integration and annotation of heterogeneous pancreatic data. In order to avoid integration and annotation errors, we used the pre-established Ensembl annotations and microarray probe set mapping. Ensembl links to the Human Protein Atlas, UniProt/Swiss-Prot, RefSeq and UniProt/TrEMBL databases are made on the basis of sequence similarity. All subsequent links are inferred from these mappings. Ensembl also establishes mappings to microarray probe set identifiers by matching probe set sequences to Ensembl transcripts. We have also added the Reactome, UniProt, PRIDE, InterPro and COSMIC data to expand data mining capabilities.

Data access

  • Web interface: Access to the data will be provided through a customised version of MartView, a BioMart web-based query interface.
  • Web services: The database is available from the BioMart central server, where it is exposed to third-party software such as the Bioconductor package biomaRt (allowing easy interrogation within the open-source R statistical environment and integration into expression profiling experiments), the Galaxy framework and Cytoscape. Interoperability with ICGC and TCGA is also possible through this web services layer.
  • DAS: Our database is a DAS server that provides annotations to the wider community, so that they can be used in other resources or browsers such as Ensembl GeneView via the GeneDAS protocol.
  • Linkout: Our database is referenced as a Linkout resource providing a Linkout annotation available at NCBI EntrezGene.
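
As an illustration of the web services route, BioMart servers accept queries written as a small XML document passed to a "martservice" endpoint. The sketch below only builds such a document and the corresponding URL; it does not contact any server. The dataset name, filter name, attribute names and base URL are hypothetical placeholders, not the actual PED configuration, which should be looked up in MartView.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_biomart_query(dataset, filters, attributes, formatter="TSV"):
    """Build a BioMart XML query string for a single dataset.

    All dataset, filter and attribute names passed in are supplied by the
    caller; the ones used below are hypothetical placeholders.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,   # e.g. TSV or CSV
        "header": "1",            # include a header row in the result
        "uniqueRows": "1",        # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def martservice_url(base_url, query_xml):
    """Attach the XML query to a martservice endpoint as a URL parameter."""
    return base_url + "?" + urlencode({"query": query_xml})


# Hypothetical names throughout: replace with values shown in MartView.
example_xml = build_biomart_query(
    dataset="hypothetical_ped_dataset",
    filters={"hgnc_symbol": "ANXA1"},
    attributes=["ensembl_gene_id", "regulation"],
)
print(martservice_url("https://example.org/biomart/martservice", example_xml))
```

The same XML document is what client libraries such as biomaRt assemble behind the scenes, so building it by hand is mainly useful for scripting or debugging.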

Data query

The database can be interrogated using combined criteria from pancreatic (disease stages, regulation, differential expression, expression, platform technology, publication) and/or public data (pathways, antibodies, genomic region, gene-related accessions, ontology, expression patterns, multi-species comparisons, protein data, SNPs). Thus, our database enables connections between otherwise disparate data sources and allows relatively simple navigation between all data types and annotations. Users can display the results or download them to a file as 'HTML', 'CSV' for comma-separated values, 'TSV' for tab-separated values, 'XLS' for Excel or 'ADF' for array description format. Users can also select a compressed file output, in which case the query runs in the background and the results are downloaded later. This option requires an e-mail address, to which a notification containing the download URL for the query results is sent.

Alternatively, users can quickly extract summarized information about their gene/protein of interest from the home page. Users can provide the HGNC/Ensembl gene id, miRNA accession or SwissProt/Ensembl protein id in a dedicated search box and the results will summarize PED records related to the queried gene. Each record includes important attributes regarding the study and experiment where the gene/transcript was found. The attributes list includes information on the -omics technology, exact study, experimental platform, target and baseline specimens/samples used, regulation status, corresponding fold-change and p-value as well as the validation platform(s).
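
To make the accepted identifier types concrete, the following minimal sketch guesses the type of a quick-search term from its shape. It is not the PED implementation; the regular expressions are assumptions based only on the example identifiers given on the home page (ANXA1, hsa-miR-187, ENSG00000182718, ENSP00000387545, IGKC_HUMAN), and the type labels are hypothetical.

```python
import re

# Illustrative patterns (assumptions), checked from most to least specific.
ID_PATTERNS = [
    ("ensembl_gene", re.compile(r"^ENSG\d{11}$")),
    ("ensembl_transcript", re.compile(r"^ENST\d{11}$")),
    ("ensembl_protein", re.compile(r"^ENSP\d{11}$")),
    ("mirna", re.compile(r"^hsa-(miR|let)-[A-Za-z0-9*\-]+$", re.IGNORECASE)),
    ("swissprot", re.compile(r"^[A-Z0-9]{1,5}_HUMAN$")),
    ("hgnc_symbol", re.compile(r"^[A-Za-z][A-Za-z0-9\-]*$")),
]


def classify_identifier(term):
    """Return a label for the first pattern the term matches, else 'unknown'."""
    term = term.strip()
    for label, pattern in ID_PATTERNS:
        if pattern.match(term):
            return label
    return "unknown"


for example in ["ANXA1", "hsa-miR-187", "ENSG00000182718",
                "ENSP00000387545", "IGKC_HUMAN"]:
    print(example, classify_identifier(example))
```

The miRNA pattern is tested before the gene-symbol pattern because a term such as hsa-miR-187 would otherwise also pass as a generic symbol.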

Visualization

We have added new graphical features to the BioMart query interface to allow users to query, overlay and visualise retrieved results. A separate browser window will appear where users can view the differentially expressed genes and/or copy number altered regions in the UCSC Genome Browser under different tracks. Users can change the chromosomal view by selecting a chromosome from the drop-down list provided. A simple colour-coding scheme is used: up-regulated genes and copy-number gains/amplifications are presented in green, whereas down-regulated genes and copy-number losses/deletions are presented in red. Genes for which regulation information is not available in PED are presented in black.
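
The colour scheme described above can be reproduced in a UCSC custom track, which accepts BED lines with an itemRgb column. The sketch below shows one way to do this; it is not how PED itself generates its tracks. The status labels, exact RGB values, gene names and coordinates are invented for illustration.

```python
# Assumed status labels and RGB values mirroring the scheme in the text:
# green for up-regulation/gains, red for down-regulation/losses.
GREEN, RED, BLACK = "0,128,0", "255,0,0", "0,0,0"
COLOURS = {
    "up": GREEN, "gain": GREEN, "amplification": GREEN,
    "down": RED, "loss": RED, "deletion": RED,
}


def to_bed_track(records, track_name="PED_example"):
    """Format (chrom, start, end, name, status) tuples as a BED9 custom track.

    Records with a missing or unrecognised status are coloured black.
    """
    lines = [f'track name="{track_name}" itemRgb="On"']
    for chrom, start, end, name, status in records:
        rgb = COLOURS.get(status, BLACK)
        # BED9 columns: chrom, start, end, name, score, strand,
        # thickStart, thickEnd, itemRgb
        fields = [chrom, str(start), str(end), name, "0", ".",
                  str(start), str(end), rgb]
        lines.append("\t".join(fields))
    return "\n".join(lines)


# Invented genes and coordinates, for illustration only.
print(to_bed_track([
    ("chr1", 1000, 5000, "GENE_A", "up"),
    ("chr2", 2000, 9000, "GENE_B", "deletion"),
    ("chr3", 3000, 4000, "GENE_C", None),
]))
```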

Alternatively, users can select a whole genome view of the retrieved results using the CIRCOS viewer. The colour coding is similar to that used for UCSC browser visualization. To provide additional flexibility, users can click on a particular chromosome band in the CIRCOS image to be redirected to the UCSC Genome Browser for a detailed view of the region of interest.

Data upload

Researchers can now upload their own datasets of interest to be included in the PED. A very basic set of information is required to complete the process. The user will first provide information regarding the published article. One study corresponds to one published article. Next, the user will provide information regarding the individual experiment such as experiment title, platform technology, specimen details and a compiled result data file. The submission will not be accepted until the user uploads the result data file containing the expression/copy number data. Once submitted, the uploaded data will be checked by our team before being included in PED. It is imperative for the users to provide an email address so that we can contact them in case there are any issues/questions regarding the uploaded data.
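
The submission rules above (one study per published article, at least one experiment, a mandatory result data file and a contact e-mail) can be pictured as a simple record check. This is a hypothetical sketch only; the field names and the checks are assumptions and do not describe the actual PED upload form.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Experiment:
    # Field names are hypothetical, chosen to echo the description above.
    title: str
    platform_technology: str
    specimen_details: str
    result_file: Optional[str] = None  # path to expression/copy number data


@dataclass
class StudySubmission:
    article_citation: str  # one study corresponds to one published article
    contact_email: str
    experiments: List[Experiment] = field(default_factory=list)


def submission_problems(sub):
    """Return a list of human-readable problems; empty means ready to submit."""
    problems = []
    if not sub.article_citation.strip():
        problems.append("published article information is missing")
    if "@" not in sub.contact_email:
        problems.append("a contact e-mail address is required")
    if not sub.experiments:
        problems.append("at least one experiment is required")
    for i, exp in enumerate(sub.experiments, start=1):
        if not exp.result_file:
            problems.append(f"experiment {i}: result data file is missing")
    return problems


draft = StudySubmission(
    article_citation="Example et al. (hypothetical citation)",
    contact_email="researcher@example.org",
    experiments=[Experiment("Tumour vs normal", "hypothetical array", "tissue")],
)
print(submission_problems(draft))
```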

Interoperability

We believe that interoperability is a key factor in the utility and productive use of any current and future cancer database. It is essential to ensure the sustainability of any cancer database and to facilitate its integration with major international efforts in cancer research such as the International Cancer Genome Consortium (ICGC), supported by the BioMart technology platform, and The Cancer Genome Atlas (TCGA), supported by the cancer Biomedical Informatics Grid (caBIG™) technology platform. It will also allow the design and implementation of more sophisticated analysis portals. The cancer research community needs open-source, fully interoperable resources allowing information connectivity and data sharing. Only these types of resource can ensure that cancer data generated across different organisations are shared, thereby maximising the impact of cancer research. By using the same BioMart technology for its data management system, our platform is fully interoperable with the ICGC. Through its web service layer, it is also interoperable with TCGA via the caBIG™ data mining platform. Similarly, our bioinformatics platform is integrated with other complementary resources such as Ensembl, Reactome, UniProt, PRIDE, InterPro and COSMIC.

Copyright © 2008 Barts Cancer Institute