- Open Access
Identifying molecular subtypes related to clinicopathologic factors in pancreatic cancer
BioMedical Engineering OnLinevolume 13, Article number: S5 (2014)
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal tumors and usually presented with locally advanced and distant metastasis disease, which prevent curative resection or treatments. In this regard, we considered identifying molecular subtypes associated with clinicopathological factor as prognosis factors to stratify PDAC for appropriate treatment of patients.
In this study, we identified three molecular subtypes which were significant on survival time and metastasis. We also identified significant genes and enriched pathways represented for each molecular subtype. Considering R0 resection patients included in each subtype, metastasis and survival times are significantly associated with subtype 1 and subtype 2.
We observed three PDAC molecular subtypes and demonstrated that those subtypes were significantly related with metastasis and survival time. The study may have utility in stratifying patients for cancer treatment.
PDAC has high propensity for local invasion and early development of metastasis, resulting poor long-term survival [1–3]. Moreover, more than 80% of patients are diagnosed at advanced stages and their survival times are extremely shorter than those from other solid tumors . According to recent reports [4, 5], PDAC is the 4th most common cancer accompanied by 4th highest mortality rate among gastrointestinal tract cancers in the U.S.A. [4, 5]. In contrast with outcome of treatments improving in other solid cancers, prognosis of pancreatic cancer still remains low and unchanged for the past 15 years. At present, overall median survival of PDAC patients is 13 months, and median survival after R0 resection is 23 months .
Conventionally known PDAC factors are difficult to suggest prognostic factors of pancreatic cancer, because those factors are in the depth of invasion, lymph node metastasis, and histologic differentiation. Presently, the prognosis and treatment plan for patients are determined according to these prognostic factors and American joint committee on cancer (AJCC) tumor staging . However, patients with the same AJCC stage or other pathologic prognostic factors have various clinical courses and prognosis. In addition, their responses to chemotherapy vary widely; therefore, established treatment plan and prognosis prediction with molecular datasets should extend patients' survival time. In the same context, identification of molecular subtypes would contribute the comprehensive understanding of a genomic transition and cancer development . Unlike other solid tumor studies, identifying the molecular subtypes of PDAC has been frustrating due to lack of tumor specimens for such studies . We attempted to resolve this problem with surgically collected 106 samples from the Seoul National University Hospital. Identification of molecular subtypes provides stratification of patients by their cancer genome context.
Materials and methods
From 2009 to 2011, 106 patients underwent surgery for pancreatic ductal adenocarcinoma at Seoul National University Hospital approved by the Institutional Review Board. Clinicopathologic data were prospectively collected in electronic medical record form. The patients had a postoperative follow-up for at least 1 year. All of the patients had fresh frozen tissue and acceptable quality of DNA extracted from the tissue. After the operation, 5x5 mm sized tumor tissues were immediately collected from surgical specimens and stored in a -70°C liquid nitrogen tank until DNA extraction. Routinely processed 4- thick paraffin-embedded sections from the same lesion were stained with hematoxylin and eosin, then submitted for histologic examination. Concentration of the DNA was calculated with spectrophotometer, and the DNA purity and integrity were evaluated by optical density 260/280 ratio for quality control. We selected 96 samples, which passed quality control test.
Survival after resection is associated with many clinical factors such as stage, grade (cell differentiation), and metastasis [7, 8]. We extracted 20,219 unique genes out of 22,077 genes (from 33,297 of probe ID) for each sample. All microarray gene expression data sets were transformed to log2 scale. To identify PDAC molecular subtypes, we used consensus clustering methods  and non-negative matrix factorization methods (NMF) . Here we mainly discussed NMF clustering algorithm; NMF method is using factorizing expression profiles based on positive matrices decomposition. Main concept of NMF is using the mRNA expression matrix (: genes by : subjects) following:
where is by matrix, the size of matrix is by , and the size of is by . is the column length of and same as the row length of . The number of clusters is as well. To converge to the cost function of NMF algorithm, divergence method is used
The updating rules for matrix and are followed by,
The components of matrix and are called metagenes, which contain sample and gene information of , since those components are related with all of the gene expression levels of samples. Moreover, components of contain all gene expression information as well as samples' clustering patterns. More details are explained in . To perform NMF algorithm, we first downloaded Multi experimental view (MeV)  from http://www.tm4.org/ Then we used the following parameters and options for MeV's setting: divergence is used for cost function (eq 1) with update rules (eq 2, 3), exponential scale is used for adjusting given data, and maximum iteration is at 1000. Cophenetic correlation coefficient  provides a scalar value measuring robustness across the consensus matrix, by using microarray expression levels for each cluster. The cophenetic correlation coefficients are obtained as 0, being poorly-clustered, to 1, being well-clustered, and calculated by following equation,
Where Y ij = |Y i − Y j | is a distance between two observations; , , and is the dendrogrammatic distance of subtype distance between model and . The highest value of the cophenetic correlation coefficient determines the optimal number of clusters.
Demographics and pathologic characteristics of the ninety six pancreatic cancer patients are summarized in Table 1. The mean age of resection for the patients was 65.2 years and the ratio of male to female was 1:1.04. The median for follow up after resection was 14.3 months, and fifty eight patients had recurrence at the end of their follow-ups. R0 resection rate of the patients was 79.2%. Most patients, 94.8%, were diagnosed at Stage II.
Identifying 3 molecular subtypes of PDAC
We performed NMF with cophenetic coefficients testing size of cluster from 2 to 4. The resulting cophenetic correlation coefficient for clusters 2, 3, and 4 were 0.896, 0.994, and 0.979, respectively. Figure 1 shows the results of NMF and cophenetic correlation coefficients. Since the maximum peak of the cophenetic correlation coefficients' plot determines the optimal number of subtypes, selecting 3 cluster provides the best separations compared to the rest. Therefore, we further analyzed these 3 groups. In the case of three subtypes, the number of samples for each cluster is 43, 45, and 8 with 27, 22, and 4 censored samples, successively.
Analysis and comparison between determined subtypes
We plotted the Kaplan-Meier survival curve using IBM SPSS statistic 20 in Figure 2. The median of overall survival was 23 months, while the median survival times are 37.6, 19.2, and 13.8 months for each subtypes 1, 2, and 3, respectively. The p-value from the log-rank test comparing subtypes 1 and 2 is 0.001, comparing subtypes 1 and subtype 3 is 0.008, while the p-value between subtypes 2 and 3 is 0.374. Consistently, longer surviving patients have much less metastasis disease according to Table 2. Although subtypes 2 and 3 are clearly separated in NMF, with cophenetic correlation coefficients 0.97, Kaplan-Meier curve is not statistically significant; this might be due to the small sample size of subtype 3. Comparison results of clinicopathologic characteristics according to 3 molecular subtypes are summarized in Table 2.
The mean age of each cluster is 66.3 (± 8.1), 64.2 (± 10.3), and 64.8 (± 7.4) for subtype 1, subtype 2, and subtype 3, respectively. R0 resection rate was significantly higher in subtype 1 (p = 0.029) than in subtypes 2 and 3.
Tumor size tended to be larger (p = 0.073) and endovenous invasion rate lower (p = 0.070) in subtype 3 than in subtypes 1 and 2. Recurrence rate inclined to be lower in subtype 1 than in subtypes 2 and 3 (p = 0.091) and distant metastasis rate tended to be higher in subtype 2 than in subtypes 1 and 3 (p = 0.022).
Especially, when comparing only subtype 1 with subtype 2, the ratio of R0 was significantly higher in subtype 1 than in subtype 2 (90.7% vs. 68.9%, p = 0.016), and distant metastasis ratio to non-metastasis was significantly higher in subtype 2 than in subtype 1 (66.7% vs. 39.5%, p = 0.018). The prognosis of subtype 2 is significantly worse than subtype 1 (median 19.2 vs. 37.6 months, p = 0.001). However, average age, sex, and local invasion are not significant among subtypes with p >0.5. The result implies that these molecular subtypes are useful for poor-prognosis markers for cancer treatment, by triggering the target genes.
Analysis and comparison of subtype 1 and subtype 2 restricted to R0 resection patients
We also analyzed sub-clinicopathologic characters restricted to R0 resection between 39 subtype 1 patients and 31 subtype 2 patients. Metastasis is significant between subtype 1 (n = 15, 38.5%) and subtype 2 (n = 21, 67.7%) with p-value = 0.018. We plotted Kaplan-Meier curve with R0 resection survival time demonstrating that prognosis is significantly poor in subtype 2 (median 22.4 mo) compared to subtype 1 (median not reached) with p = 0.024 in Figure 3. Disease free survival was significantly lower in subtype 2 than that of subtype 1 (median 10.9 vs. 20.6 months, p = 0.010) in Figure 4.
Identifying enriched pathway between subtype 1 and subtype 2
For functional assessment of our subtype identification, we performed gene set enrichment analysis  to get enriched pathway information for subtype 1 and subtype 2. Since sample size of subtype 3 is much smaller than that of the other two, we excluded subtype 3 in this analysis. In this step of the analysis, we used KEGG pathway with 200 individuals downloaded from the molecular signatures data base (MSigDB)  with 1,000 permutations. The results of top nine pathways ordered by their absolute normalized enriched score in each subtype, are shown in Table 3. FDR q-value of all enriched pathways is less than 0.25.
The enriched pathways of subtype 2 were related with fatal disease pathways including pancreatic cancer, renal cell carcinoma, and chronic myeloid leukemia. On the other hand, the enriched pathways of subtype 1 were related to immune system, such as hematopoietic cell lineage, cytokine-cytokine receptor interaction, and calcium signaling pathway. The findings of enriched pathways in Table 3 are consistent with survival analysis in Figure 1.
Identifying significant biomarkers for each subtype
More importantly, we identified differentially expressed genes between subtypes using significant analysis of microarray (SAM) . Significant genes specific to each group were chosen one versus the rest, which implies one group (subtype 1) is compared to the other two groups (subtypes 2 and 3). Boxplot with Kruskal-Wallis test supported our clustering in Figure 5. We selected top 20 genes with 0 q-value ordered in fold-change for up-regulated genes in case subtypes, in Table 4. 10 bold genes were also found by Collisson  as PDAC assigner genes among 62. Interestingly, the 9 highlighted genes in subtype 3, on Korean pancreatic subtypes, are found at exocrine-like subtype assigner genes in three identified subtypes in Figure 1 (a) of previous study . However, the proportion of sample size of subtype 3 from total Korean PDAC patients are much smaller than that of Exocrine-like subtype from GSE15471 data sets nm2344-S2 in , which are 8% and 36%, respectively. This excessive difference, between the two datasets of equal subtypes, require an extended study in the future.
Validation of the results
For the validation study of our findings, we used an independent dataset GSE28735  downloaded from Gene expression omnibus. GSE28735 consists of 45 PDAC samples. We extracted all biomarkers in Table 4 from each validation sample and implemented NMF from rank 2 to rank 4. The highest cophenetic coefficient is 0.975 when rank is 3 in Figure 6. The best result of implementing Kaplan-Meier survival analysis was to compare subtype 3 versus rest, which yielded p-value 0.198. The p-values are 0.384 and 0.522 for the tests comparing subtype 1 versus rest and subtype 2 versus rest, respectively. The significant biomarkers matched 9 out of 13 for subtype 2 and 13 out of 20 for subtype 3 in Table 4.
It is an important issue to identify molecular subtypes for stratifying PDAC patients depending on clinicopathologic factors and molecular gene expression. These identified molecular subtypes can be utilized for stratifying patients into their appropriate treatment groups. In this regard, we used total of 96 PDAC samples, and identified 3 molecular subtypes which were significantly related to clinicopathologic factors such as metastasis, tumor size, residual, and survival time. The results consistently demonstrate that poor prognosis is significantly related to metastasis. We also identified enriched pathways for poor-prognosis and good-prognosis related to fatal diseases and immune system, respectively, in Table 3. Moreover, we suggested gene markers represented for each subtype to use in PDAC stratification. We also considered the restricted to R0 resection samples in each subtype. Prognosis was significantly worse in subtype 2 than in subtype 1. Disease free survival rate was significantly lower in subtype 2 compared to subtype 1. In addition, 13 out of 22 over-expressed genes of subtype 3 in our findings are also found in exocrine-like subtypes in previous study , but further study is required on the radically short survival time for Korean specific biomarker for PDAC subtypes using larger sample size.
Nevertheless, our findings have some limitation for being applied to the patients directly. Even though we selected the significant gene sets using strong machine learning tools, and successfully clustered 3 classes in validation data sets, we still need a further investigation for validation of following up patients and/or using new data sets. At this moment, such high quality gene expression data sets including R0 resection, metastasis and survival time information are not available in public.
Pancreatic ductal adenocarcinoma
Non-negative matrix factorization methods
American joint committee on cancer
Multi experimental view
Significant analysis of microarray
The molecular signatures data base.
Stathis A, Moore MJ: Advanced pancreatic carcinoma: current treatment and future challenges. Nat Rev Clin Oncol 2010,7(3):163–172. 10.1038/nrclinonc.2009.236
Collisson EA, Sadanandam A, Olson P, Gibb WJ, Truitt M, Gu S, Cooc J, Weinkle J, Kim GE, Jakkula L, et al.: Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nat Med 2011,17(4):500–503. 10.1038/nm.2344
Kang MJ, Jang JY, Lee SE, Lim CS, Lee KU, Kim SW: Comparison of the long-term outcomes of uncinate process cancer and non-uncinate process pancreas head cancer: poor prognosis accompanied by early locoregional recurrence. Langenbecks Arch Surg 2010,395(6):697–706. 10.1007/s00423-010-0593-6
Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM: Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010,127(12):2893–2917. 10.1002/ijc.25516
Bradley CJ, Yabroff KR, Dahman B, Feuer EJ, Mariotto A, Brown ML: Productivity costs of cancer mortality in the United States: 2000–2020. J Natl Cancer Inst 2008,100(24):1763–1770. 10.1093/jnci/djn384
Edge SB, Compton CC: The American Joint Committee on Cancer: the 7th edition of the AJCC cancer staging manual and the future of TNM. Ann Surg Oncol 2010,17(6):1471–1474. 10.1245/s10434-010-0985-4
Brennan MF, Kattan MW, Klimstra D, Conlon K: Prognostic nomogram for patients undergoing resection for adenocarcinoma of the pancreas. Ann Surg 2004,240(2):293–298. 10.1097/01.sla.0000133125.85489.07
Clark EJ, Taylor MA, Connor S, O'Neill R, Brennan MF, Garden OJ, Parks RW: Validation of a prognostic nomogram in patients undergoing resection for pancreatic ductal adenocarcinoma in a UK tertiary referral centre. HPB (Oxford) 2008,10(6):501–505. 10.1080/13651820802356606
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP: GenePattern 2.0. Nat Genet 2006,38(5):500–501. 10.1038/ng0506-500
Lee DD, Seung HS: Learning the parts of objects by non-negative matrix factorization. Nature 1999,401(6755):788–791. 10.1038/44565
Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, et al.: TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003,34(2):374–378.
Sokal RR: The comparion of dengrograms by objective methods. Taxon 1962,11(2):33–40. 10.2307/1217208
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005,102(43):15545–15550. 10.1073/pnas.0506580102
Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. P Natl Acad Sci USA 2001,98(9):5116–5121. 10.1073/pnas.091062498
Zhang G, Schetter A, He P, Funamizu N, Gaedcke J, Ghadimi BM, Ried T, Hassan R, Yfantis HG, Lee DH, et al.: DPEP1 inhibits tumor cell invasiveness, enhances chemosensitivity and predicts clinical outcome in pancreatic ductal adenocarcinoma. Plos One 2012,7(2):e31507. 10.1371/journal.pone.0031507
Publication of this article has been funded by Healthcare Group, Future Technology R&D Division, SK Telecom Co., by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012R1A3A2026438, 2008-0062618), and by a grant from the National R&D Program for Cancer Control (No. 1120310).
This article has been published as part of BioMedical Engineering OnLine Volume 13 Supplement 2, 2014: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013): BioMedical Engineering OnLine. The full contents of the supplement are available online at http://www.biomedical-engineering-online.com/supplements/13/S2.
The authors declare that they have no competing interests.
SK led the analysis of the data and drafted the manuscript. MJ analyzed the data, collected the data and literature searches, JJ collected the data and conceptualized the project. SH designed and performed microarray experiment. TP & SB contributed to conceptualization of the initial project and TP drafted the manuscript. SL advised the analysis. All authors read and approved the final manuscript.
Shinuk Kim, Mee Joo Kang contributed equally to this work.