Skip to main content

Identifying molecular subtypes related to clinicopathologic factors in pancreatic cancer



Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal tumors and usually presented with locally advanced and distant metastasis disease, which prevent curative resection or treatments. In this regard, we considered identifying molecular subtypes associated with clinicopathological factor as prognosis factors to stratify PDAC for appropriate treatment of patients.


In this study, we identified three molecular subtypes which were significant on survival time and metastasis. We also identified significant genes and enriched pathways represented for each molecular subtype. Considering R0 resection patients included in each subtype, metastasis and survival times are significantly associated with subtype 1 and subtype 2.


We observed three PDAC molecular subtypes and demonstrated that those subtypes were significantly related with metastasis and survival time. The study may have utility in stratifying patients for cancer treatment.


PDAC has high propensity for local invasion and early development of metastasis, resulting poor long-term survival [13]. Moreover, more than 80% of patients are diagnosed at advanced stages and their survival times are extremely shorter than those from other solid tumors [1]. According to recent reports [4, 5], PDAC is the 4th most common cancer accompanied by 4th highest mortality rate among gastrointestinal tract cancers in the U.S.A. [4, 5]. In contrast with outcome of treatments improving in other solid cancers, prognosis of pancreatic cancer still remains low and unchanged for the past 15 years. At present, overall median survival of PDAC patients is 13 months, and median survival after R0 resection is 23 months [3].

Conventionally known PDAC factors are difficult to suggest prognostic factors of pancreatic cancer, because those factors are in the depth of invasion, lymph node metastasis, and histologic differentiation. Presently, the prognosis and treatment plan for patients are determined according to these prognostic factors and American joint committee on cancer (AJCC) tumor staging [6]. However, patients with the same AJCC stage or other pathologic prognostic factors have various clinical courses and prognosis. In addition, their responses to chemotherapy vary widely; therefore, established treatment plan and prognosis prediction with molecular datasets should extend patients' survival time. In the same context, identification of molecular subtypes would contribute the comprehensive understanding of a genomic transition and cancer development [1]. Unlike other solid tumor studies, identifying the molecular subtypes of PDAC has been frustrating due to lack of tumor specimens for such studies [2]. We attempted to resolve this problem with surgically collected 106 samples from the Seoul National University Hospital. Identification of molecular subtypes provides stratification of patients by their cancer genome context.

Materials and methods


From 2009 to 2011, 106 patients underwent surgery for pancreatic ductal adenocarcinoma at Seoul National University Hospital approved by the Institutional Review Board. Clinicopathologic data were prospectively collected in electronic medical record form. The patients had a postoperative follow-up for at least 1 year. All of the patients had fresh frozen tissue and acceptable quality of DNA extracted from the tissue. After the operation, 5x5 mm sized tumor tissues were immediately collected from surgical specimens and stored in a -70°C liquid nitrogen tank until DNA extraction. Routinely processed 4-μm thick paraffin-embedded sections from the same lesion were stained with hematoxylin and eosin, then submitted for histologic examination. Concentration of the DNA was calculated with spectrophotometer, and the DNA purity and integrity were evaluated by optical density 260/280 ratio for quality control. We selected 96 samples, which passed quality control test.


Survival after resection is associated with many clinical factors such as stage, grade (cell differentiation), and metastasis [7, 8]. We extracted 20,219 unique genes out of 22,077 genes (from 33,297 of probe ID) for each sample. All microarray gene expression data sets were transformed to log2 scale. To identify PDAC molecular subtypes, we used consensus clustering methods [9] and non-negative matrix factorization methods (NMF) [10]. Here we mainly discussed NMF clustering algorithm; NMF method is using factorizing expression profiles based on positive matrices decomposition. Main concept of NMF is using the mRNA expression matrix A(n: genes by m: subjects) following:


where A is n by m matrix, the size of matrix W is n by k, and the size of H is k by m. k is the column length of W and same as the row length of H. The number of clusters is k as well. To converge to the cost function of NMF algorithm, divergence method is used

D ( A | | W H ) = i j ( A i j log A i j ( W H ) i j - A i j + ( W H ) i j )

The updating rules for matrix W and H are followed by,

W i a < - W i a u H a u A i u / ( W H ) i u v H a v
H a u < - H a u i W i a A i u / ( W H ) i u k W k a

The components of matrix W and H are called metagenes, which contain sample and gene information of A, since those components are related with all of the gene expression levels of samples. Moreover, components of H contain all gene expression information as well as samples' clustering patterns. More details are explained in [10]. To perform NMF algorithm, we first downloaded Multi experimental view (MeV) [11] from Then we used the following parameters and options for MeV's setting: divergence is used for cost function (eq 1) with update rules (eq 2, 3), exponential scale is used for adjusting given data, and maximum iteration is at 1000. Cophenetic correlation coefficient [12]c provides a scalar value measuring robustness across the consensus matrix, by using microarray expression levels for each cluster. The cophenetic correlation coefficients are obtained as 0, being poorly-clustered, to 1, being well-clustered, and calculated by following equation,

c = i < j ( Y i j - y ) ( Z i j - z ) i < j ( Y i j - y ) 2 i < j ( Z i j - z ) 2

Where Y ij = |Y i Y j | is a distance between two observations; i, j, and Z i j is the dendrogrammatic distance of subtype distance between model Z i and Z j . The highest value of the cophenetic correlation coefficient determines the optimal number of clusters.


Demographics and pathologic characteristics of the ninety six pancreatic cancer patients are summarized in Table 1. The mean age of resection for the patients was 65.2 years and the ratio of male to female was 1:1.04. The median for follow up after resection was 14.3 months, and fifty eight patients had recurrence at the end of their follow-ups. R0 resection rate of the patients was 79.2%. Most patients, 94.8%, were diagnosed at Stage II.

Table 1 Demographics and pathologic characteristics of the study subjects.

Identifying 3 molecular subtypes of PDAC

We performed NMF with cophenetic coefficients testing size of cluster from 2 to 4. The resulting cophenetic correlation coefficient for clusters 2, 3, and 4 were 0.896, 0.994, and 0.979, respectively. Figure 1 shows the results of NMF and cophenetic correlation coefficients. Since the maximum peak of the cophenetic correlation coefficients' plot determines the optimal number of subtypes, selecting 3 cluster provides the best separations compared to the rest. Therefore, we further analyzed these 3 groups. In the case of three subtypes, the number of samples for each cluster is 43, 45, and 8 with 27, 22, and 4 censored samples, successively.

Figure 1
figure 1

Plot of NMF performances and Cophenetic coefficients correlation. (a) k=2 (b) k=3 and (c) k=4 where k is number of clusters. (d) Illustration of Cophenetic coefficients for number of clusters.

Analysis and comparison between determined subtypes

We plotted the Kaplan-Meier survival curve using IBM SPSS statistic 20 in Figure 2. The median of overall survival was 23 months, while the median survival times are 37.6, 19.2, and 13.8 months for each subtypes 1, 2, and 3, respectively. The p-value from the log-rank test comparing subtypes 1 and 2 is 0.001, comparing subtypes 1 and subtype 3 is 0.008, while the p-value between subtypes 2 and 3 is 0.374. Consistently, longer surviving patients have much less metastasis disease according to Table 2. Although subtypes 2 and 3 are clearly separated in NMF, with cophenetic correlation coefficients 0.97, Kaplan-Meier curve is not statistically significant; this might be due to the small sample size of subtype 3. Comparison results of clinicopathologic characteristics according to 3 molecular subtypes are summarized in Table 2.

Figure 2
figure 2

Kaplan-Meier survival curve. Kaplan-Meier survival curve comparing survival of individuals with subtype 1 (blue), subtype 2 (green), and subtype 3 (orange) with 0.001 p-value by log-rank statistics test.

Table 2 Clinicopathologic characteristics according to 3 molecular subtypes.

The mean age of each cluster is 66.3 (± 8.1), 64.2 (± 10.3), and 64.8 (± 7.4) for subtype 1, subtype 2, and subtype 3, respectively. R0 resection rate was significantly higher in subtype 1 (p = 0.029) than in subtypes 2 and 3.

Tumor size tended to be larger (p = 0.073) and endovenous invasion rate lower (p = 0.070) in subtype 3 than in subtypes 1 and 2. Recurrence rate inclined to be lower in subtype 1 than in subtypes 2 and 3 (p = 0.091) and distant metastasis rate tended to be higher in subtype 2 than in subtypes 1 and 3 (p = 0.022).

Especially, when comparing only subtype 1 with subtype 2, the ratio of R0 was significantly higher in subtype 1 than in subtype 2 (90.7% vs. 68.9%, p = 0.016), and distant metastasis ratio to non-metastasis was significantly higher in subtype 2 than in subtype 1 (66.7% vs. 39.5%, p = 0.018). The prognosis of subtype 2 is significantly worse than subtype 1 (median 19.2 vs. 37.6 months, p = 0.001). However, average age, sex, and local invasion are not significant among subtypes with p >0.5. The result implies that these molecular subtypes are useful for poor-prognosis markers for cancer treatment, by triggering the target genes.

Analysis and comparison of subtype 1 and subtype 2 restricted to R0 resection patients

We also analyzed sub-clinicopathologic characters restricted to R0 resection between 39 subtype 1 patients and 31 subtype 2 patients. Metastasis is significant between subtype 1 (n = 15, 38.5%) and subtype 2 (n = 21, 67.7%) with p-value = 0.018. We plotted Kaplan-Meier curve with R0 resection survival time demonstrating that prognosis is significantly poor in subtype 2 (median 22.4 mo) compared to subtype 1 (median not reached) with p = 0.024 in Figure 3. Disease free survival was significantly lower in subtype 2 than that of subtype 1 (median 10.9 vs. 20.6 months, p = 0.010) in Figure 4.

Figure 3
figure 3

Kaplan-Meier survival curve for R0 resection with survival.

Figure 4
figure 4

Kaplan-Meier survival curve for R0 resection with disease free survival.

Identifying enriched pathway between subtype 1 and subtype 2

For functional assessment of our subtype identification, we performed gene set enrichment analysis [13] to get enriched pathway information for subtype 1 and subtype 2. Since sample size of subtype 3 is much smaller than that of the other two, we excluded subtype 3 in this analysis. In this step of the analysis, we used KEGG pathway with 200 individuals downloaded from the molecular signatures data base (MSigDB) [13] with 1,000 permutations. The results of top nine pathways ordered by their absolute normalized enriched score in each subtype, are shown in Table 3. FDR q-value of all enriched pathways is less than 0.25.

Table 3 Enriched pathways of subtype 1 and subtype 2.

The enriched pathways of subtype 2 were related with fatal disease pathways including pancreatic cancer, renal cell carcinoma, and chronic myeloid leukemia. On the other hand, the enriched pathways of subtype 1 were related to immune system, such as hematopoietic cell lineage, cytokine-cytokine receptor interaction, and calcium signaling pathway. The findings of enriched pathways in Table 3 are consistent with survival analysis in Figure 1.

Identifying significant biomarkers for each subtype

More importantly, we identified differentially expressed genes between subtypes using significant analysis of microarray (SAM) [14]. Significant genes specific to each group were chosen one versus the rest, which implies one group (subtype 1) is compared to the other two groups (subtypes 2 and 3). Boxplot with Kruskal-Wallis test supported our clustering in Figure 5. We selected top 20 genes with 0 q-value ordered in fold-change for up-regulated genes in case subtypes, in Table 4. 10 bold genes were also found by Collisson [2] as PDAC assigner genes among 62. Interestingly, the 9 highlighted genes in subtype 3, on Korean pancreatic subtypes, are found at exocrine-like subtype assigner genes in three identified subtypes in Figure 1 (a) of previous study [2]. However, the proportion of sample size of subtype 3 from total Korean PDAC patients are much smaller than that of Exocrine-like subtype from GSE15471 data sets nm2344-S2 in [2], which are 8% and 36%, respectively. This excessive difference, between the two datasets of equal subtypes, require an extended study in the future.

Figure 5
figure 5

Boxplots of Kruskal-Wallis test using overexpressed genes in each subtype. Boxplots of Kruskal-Wallis test for comparing 3 subtypes (a) using overexpressed genes of subtype 1, (b) using overexpressed genes of subtype 2, and (c) using overexpressed genes of subtype 3.

Table 4 Significant genes for each subtype, 10 bold genes were also identified by Collisson [2].

Validation of the results

For the validation study of our findings, we used an independent dataset GSE28735 [15] downloaded from Gene expression omnibus. GSE28735 consists of 45 PDAC samples. We extracted all biomarkers in Table 4 from each validation sample and implemented NMF from rank 2 to rank 4. The highest cophenetic coefficient is 0.975 when rank is 3 in Figure 6. The best result of implementing Kaplan-Meier survival analysis was to compare subtype 3 versus rest, which yielded p-value 0.198. The p-values are 0.384 and 0.522 for the tests comparing subtype 1 versus rest and subtype 2 versus rest, respectively. The significant biomarkers matched 9 out of 13 for subtype 2 and 13 out of 20 for subtype 3 in Table 4.

Figure 6
figure 6

Plot of NMF performance k = 3 (a), and cophenetic coefficients of validation data sets (b).


It is an important issue to identify molecular subtypes for stratifying PDAC patients depending on clinicopathologic factors and molecular gene expression. These identified molecular subtypes can be utilized for stratifying patients into their appropriate treatment groups. In this regard, we used total of 96 PDAC samples, and identified 3 molecular subtypes which were significantly related to clinicopathologic factors such as metastasis, tumor size, residual, and survival time. The results consistently demonstrate that poor prognosis is significantly related to metastasis. We also identified enriched pathways for poor-prognosis and good-prognosis related to fatal diseases and immune system, respectively, in Table 3. Moreover, we suggested gene markers represented for each subtype to use in PDAC stratification. We also considered the restricted to R0 resection samples in each subtype. Prognosis was significantly worse in subtype 2 than in subtype 1. Disease free survival rate was significantly lower in subtype 2 compared to subtype 1. In addition, 13 out of 22 over-expressed genes of subtype 3 in our findings are also found in exocrine-like subtypes in previous study [2], but further study is required on the radically short survival time for Korean specific biomarker for PDAC subtypes using larger sample size.

Nevertheless, our findings have some limitation for being applied to the patients directly. Even though we selected the significant gene sets using strong machine learning tools, and successfully clustered 3 classes in validation data sets, we still need a further investigation for validation of following up patients and/or using new data sets. At this moment, such high quality gene expression data sets including R0 resection, metastasis and survival time information are not available in public.



Pancreatic ductal adenocarcinoma


Non-negative matrix factorization methods


American joint committee on cancer


Multi experimental view


Significant analysis of microarray


The molecular signatures data base.


  1. Stathis A, Moore MJ: Advanced pancreatic carcinoma: current treatment and future challenges. Nat Rev Clin Oncol 2010,7(3):163–172. 10.1038/nrclinonc.2009.236

    Article  Google Scholar 

  2. Collisson EA, Sadanandam A, Olson P, Gibb WJ, Truitt M, Gu S, Cooc J, Weinkle J, Kim GE, Jakkula L, et al.: Subtypes of pancreatic ductal adenocarcinoma and their differing responses to therapy. Nat Med 2011,17(4):500–503. 10.1038/nm.2344

    Article  Google Scholar 

  3. Kang MJ, Jang JY, Lee SE, Lim CS, Lee KU, Kim SW: Comparison of the long-term outcomes of uncinate process cancer and non-uncinate process pancreas head cancer: poor prognosis accompanied by early locoregional recurrence. Langenbecks Arch Surg 2010,395(6):697–706. 10.1007/s00423-010-0593-6

    Article  Google Scholar 

  4. Ferlay J, Shin HR, Bray F, Forman D, Mathers C, Parkin DM: Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 2010,127(12):2893–2917. 10.1002/ijc.25516

    Article  Google Scholar 

  5. Bradley CJ, Yabroff KR, Dahman B, Feuer EJ, Mariotto A, Brown ML: Productivity costs of cancer mortality in the United States: 2000–2020. J Natl Cancer Inst 2008,100(24):1763–1770. 10.1093/jnci/djn384

    Article  Google Scholar 

  6. Edge SB, Compton CC: The American Joint Committee on Cancer: the 7th edition of the AJCC cancer staging manual and the future of TNM. Ann Surg Oncol 2010,17(6):1471–1474. 10.1245/s10434-010-0985-4

    Article  Google Scholar 

  7. Brennan MF, Kattan MW, Klimstra D, Conlon K: Prognostic nomogram for patients undergoing resection for adenocarcinoma of the pancreas. Ann Surg 2004,240(2):293–298. 10.1097/01.sla.0000133125.85489.07

    Article  Google Scholar 

  8. Clark EJ, Taylor MA, Connor S, O'Neill R, Brennan MF, Garden OJ, Parks RW: Validation of a prognostic nomogram in patients undergoing resection for pancreatic ductal adenocarcinoma in a UK tertiary referral centre. HPB (Oxford) 2008,10(6):501–505. 10.1080/13651820802356606

    Article  Google Scholar 

  9. Reich M, Liefeld T, Gould J, Lerner J, Tamayo P, Mesirov JP: GenePattern 2.0. Nat Genet 2006,38(5):500–501. 10.1038/ng0506-500

    Article  Google Scholar 

  10. Lee DD, Seung HS: Learning the parts of objects by non-negative matrix factorization. Nature 1999,401(6755):788–791. 10.1038/44565

    Article  Google Scholar 

  11. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, et al.: TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003,34(2):374–378.

    Google Scholar 

  12. Sokal RR: The comparion of dengrograms by objective methods. Taxon 1962,11(2):33–40. 10.2307/1217208

    Article  Google Scholar 

  13. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al.: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005,102(43):15545–15550. 10.1073/pnas.0506580102

    Article  Google Scholar 

  14. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. P Natl Acad Sci USA 2001,98(9):5116–5121. 10.1073/pnas.091062498

    Article  Google Scholar 

  15. Zhang G, Schetter A, He P, Funamizu N, Gaedcke J, Ghadimi BM, Ried T, Hassan R, Yfantis HG, Lee DH, et al.: DPEP1 inhibits tumor cell invasiveness, enhances chemosensitivity and predicts clinical outcome in pancreatic ductal adenocarcinoma. Plos One 2012,7(2):e31507. 10.1371/journal.pone.0031507

    Article  Google Scholar 

Download references


Publication of this article has been funded by Healthcare Group, Future Technology R&D Division, SK Telecom Co., by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2012R1A3A2026438, 2008-0062618), and by a grant from the National R&D Program for Cancer Control (No. 1120310).

This article has been published as part of BioMedical Engineering OnLine Volume 13 Supplement 2, 2014: Selected articles from the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2013): BioMedical Engineering OnLine. The full contents of the supplement are available online at

Author information

Authors and Affiliations


Corresponding authors

Correspondence to Jin-Young Jang or Taesung Park.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SK led the analysis of the data and drafted the manuscript. MJ analyzed the data, collected the data and literature searches, JJ collected the data and conceptualized the project. SH designed and performed microarray experiment. TP & SB contributed to conceptualization of the initial project and TP drafted the manuscript. SL advised the analysis. All authors read and approved the final manuscript.

Shinuk Kim, Mee Joo Kang contributed equally to this work.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit

The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kim, S., Kang, M.J., Lee, S. et al. Identifying molecular subtypes related to clinicopathologic factors in pancreatic cancer. BioMed Eng OnLine 13 (Suppl 2), S5 (2014).

Download citation

  • Published:

  • DOI: