Classification of the urinary metabolome using machine learning and potential applications to diagnosing interstitial cystitis

Feng Tong; Muhammad Shahid; Peng Jin; Sungyong Jung; Won Hwa Kim; Jayoung Kim

doi:10.14440/bladder.2020.815

INTRODUCTION

Interstitial cystitis (IC), also known as painful bladder syndrome or bladder pain syndrome, is a chronic visceral pain syndrome of unknown etiology that presents as a constellation of symptoms, including bladder pain, urinary frequency, urgency, and small voided volumes, in the absence of other identifiable diseases [1-3]. Because urine is in direct contact with the bladder epithelial cells that may give rise to IC, metabolites released from these cells may be enriched in urine [4].

Our group previously investigated the urinary metabolome for potential IC diagnostic biomarkers [5-7]. In that work, we attempted to identify IC-associated metabolites in urine specimens from IC patients and controls using nuclear magnetic resonance (NMR), and our findings provided preliminary evidence that metabolomic analysis of urine can potentially segregate IC patients from controls. We sought to capture the most differentially detected NMR peaks and to determine whether their distribution differed significantly between IC and control specimens. In multivariate statistical analysis, principal component analysis (PCA) suggested that the urinary metabolomes of IC patients and controls were clearly different, and 140 NMR peaks were significantly altered in IC patients compared to controls (FDR < 0.05) [5].

Machine learning (ML), originally described as a program that learns to perform a task or make decisions based on data, is a valuable and increasingly necessary tool for modern healthcare [8]. However, this definition is broad and could cover nearly any form of data-driven need. ML is not a magical approach that can turn data into immediate benefits, even though many news outlets imply that it can; rather, it is a natural extension of traditional statistical approaches. In the present study, we used ML and automated performance metrics to evaluate the clinical value of the 140 NMR peaks we identified, applying ML algorithms to examine the relationship between metabolic expression and disease. We applied logistic regression (LR) [9] and support vector machine (SVM) [10,11], which are traditionally known to work well even with small sample sizes, to our metabolomic signatures and used these data together with patient clinicopathological features to diagnose IC. We used our dataset of 59 cases to train, test, and validate the models. The results showed that our ML-based algorithms successfully distinguished IC patients from healthy subjects.

This study aimed to address the question: “Does utilizing metabolic data in ML play a role in diagnosing IC?” ML is a form of artificial intelligence (AI) that learns from past data in order to make predictions about new data. Our NMR-based ML algorithms were able to collectively distinguish the urinary profiles of IC patients from those of controls.

MATERIALS AND METHODS

Ethics statement

This study used a publicly deposited dataset derived from previously published data; because the data were publicly deposited, IRB approval was not required.

Dataset

The IC dataset contains 59 samples in total. To identify IC-associated metabolites, urine samples were collected from 43 IC patients and 16 healthy controls. Each urine specimen was analyzed using NMR, and candidate biomarkers were identified as 140 NMR peaks. In this paper, these 140 NMR peaks were used as the features for classifying IC patients with ML algorithms [5].

Method

Because of the limited sample size, we adopted two traditional machine learning algorithms, SVM [10,11] and LR [9], which work well even with a small number of samples. Both are supervised learning algorithms: each data sample is represented by a set of features and comes with a label indicating the group to which the sample belongs.

When the data are represented as points scattered in a feature space that consists of two clusters representing individual groups, SVM finds a decision boundary (either linear or non-linear) that separates the groups. Training an SVM optimizes the decision boundary to maximize the margin between the clusters; to learn a non-linear decision boundary, i.e., a non-linear classifier, a kernel function is required to train a kernel SVM [12]. The model also has a user parameter (C in sklearn) that penalizes the slack variables, i.e., margin violations by individual samples, and thereby controls the trade-off between a wide margin and training errors.
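
For reference, the standard soft-margin SVM of Cortes and Vapnik [10] can be written as below, with labels coded as -1 and +1. Here φ is the feature map implied by the kernel, ξ_i are the per-sample slack variables, and C is the penalty on them. The kernel forms follow the conventions of sklearn's SVC; the symbols γ, r, and d are our notation rather than the paper's.

```latex
\begin{aligned}
&\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
&\text{subject to} \quad y_i\bigl(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n \\[4pt]
&K_{\text{linear}}(\mathbf{x}, \mathbf{x}') = \mathbf{x}^{\top}\mathbf{x}', \qquad
K_{\text{poly}}(\mathbf{x}, \mathbf{x}') = \bigl(\gamma\, \mathbf{x}^{\top}\mathbf{x}' + r\bigr)^{d}, \qquad
K_{\text{RBF}}(\mathbf{x}, \mathbf{x}') = \exp\bigl(-\gamma \lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\bigr)
\end{aligned}
```

A larger C penalizes margin violations more heavily and therefore tends to give a narrower margin; in the polynomial-kernel experiments, d corresponds to the kernel degree.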

LR is also a classifier that learns a linear model. Given a set of training samples with a number of features, it learns a weight for each feature. When a data sample is input into an LR model, the weights and the data are combined linearly, and a sigmoid function maps the combined value to a probability between 0 and 1. The predicted label is assigned according to this probability, and the weights are learned by minimizing the classification error (usually formulated as cross-entropy) on the training dataset. Additional regularization terms, such as the l₁- or l₂-norm of the weights, can be added to the model: the l₁-norm promotes sparsity of the weights [13], which selects the most important features, while the l₂-norm shrinks the weights smoothly toward zero to make the model more robust [13,14]. Both SVM and LR were implemented using the sklearn package in Python.
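
The LR model described above can be written out in a few lines of NumPy. This is a sketch, not the authors' code: it spells out the sigmoid, the cross-entropy, and the two penalties using a hypothetical strength `lam`. sklearn parameterizes the same trade-off through C, the inverse of the regularization strength (larger C means weaker regularization), and scales the terms differently, so the numbers are not interchangeable.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """P(label = 1 | x): linear combination of weights and features, passed through a sigmoid."""
    return sigmoid(X @ w + b)

def regularized_loss(X, y, w, b, penalty="l1", lam=0.1):
    """Mean cross-entropy on the training data plus an l1 or l2 penalty on w (bias b is not penalized)."""
    p = np.clip(predict_proba(X, w, b), 1e-12, 1 - 1e-12)  # avoid log(0)
    cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    if penalty == "l1":
        reg = lam * np.sum(np.abs(w))        # minimizing this tends to set some weights exactly to zero
    elif penalty == "l2":
        reg = 0.5 * lam * np.sum(w ** 2)     # shrinks all weights smoothly toward zero
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return cross_entropy + reg

# Usage: with all-zero weights every sample gets probability 0.5, so the loss is log(2)
X = np.array([[0.0, 0.0], [1.0, -1.0]])
y = np.array([1, 0])
print(regularized_loss(X, y, np.zeros(2), 0.0))  # ≈ 0.6931
```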

Training

Because the sample size was very small, we used leave-one-out cross-validation (CV) [15] to make full use of the dataset and to obtain a nearly unbiased estimate of classifier performance. In leave-one-out CV, one sample is held out as the test set while the remaining samples are used to train the model, which is then tested on the held-out sample. This process was repeated for every sample in the dataset. Figure 1 illustrates the leave-one-out CV workflow.
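
A minimal sketch of the leave-one-out loop is shown below, using sklearn's LeaveOneOut splitter. The helper name `leave_one_out_predict` is ours, and the random matrix only mimics the dataset's shape (59 samples by 140 features), not the real NMR values. Collecting a decision score for each held-out sample, in addition to its label, is what allows ROC and PR curves to be drawn from the pooled predictions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def leave_one_out_predict(model, X, y):
    """Hold out each sample once, train on the remaining n - 1, and predict the held-out one.

    Returns one predicted label and one decision score per sample.
    """
    y_pred = np.empty_like(y)
    scores = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        fitted = clone(model).fit(X[train_idx], y[train_idx])  # fresh, unfitted copy each fold
        y_pred[test_idx] = fitted.predict(X[test_idx])
        scores[test_idx] = fitted.decision_function(X[test_idx])
    return y_pred, scores

# Usage on random placeholder data with the IC dataset's shape (not the real NMR peaks)
rng = np.random.default_rng(0)
X = rng.normal(size=(59, 140))
y = np.array([1] * 43 + [0] * 16)   # 1 = IC, 0 = control
y_pred, scores = leave_one_out_predict(SVC(kernel="poly", degree=3, C=1.0), X, y)
```

sklearn's `cross_val_predict(model, X, y, cv=LeaveOneOut())` yields the same labels in a single call; the explicit loop simply makes each train/test step visible.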

For SVM, we performed a set of experiments with a linear kernel, a radial basis function (RBF) kernel, and polynomial kernels of degree 3, 5, and 7. The slack penalty parameter C was set to 1 in all cases. For LR, we tried l₁ and l₂ penalties with different strengths, setting the inverse of the regularization strength, C, to 1, 5, and 10.
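
The configurations listed above can be expressed as a sklearn model grid. This is a sketch rather than the authors' code. The text does not report the kernel parameters gamma and coef0, so they are left at sklearn's defaults. The choice of `solver="liblinear"` is also our assumption: sklearn's default lbfgs solver does not support the l₁ penalty. The function name `build_model_grid` is hypothetical.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def build_model_grid():
    """Return the SVM and LR configurations described in the text, keyed by a readable name."""
    models = {"SVM linear": SVC(kernel="linear", C=1.0)}
    for degree in (3, 5, 7):
        # gamma and coef0 stay at sklearn defaults; the text does not report them
        models[f"SVM poly, degree={degree}"] = SVC(kernel="poly", degree=degree, C=1.0)
    models["SVM RBF"] = SVC(kernel="rbf", C=1.0)
    for penalty in ("l1", "l2"):
        for C in (1, 5, 10):
            # liblinear handles both penalties for binary problems (assumed solver)
            models[f"LR {penalty}, C={C}"] = LogisticRegression(penalty=penalty, C=C, solver="liblinear")
    return models

for name, model in build_model_grid().items():
    print(name, model.get_params()["C"])
```

Each entry can then be evaluated with leave-one-out CV, as described in the Training section.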

Evaluation

After the model was trained and tested 59 times with leave-one-out CV, each sample had been assigned a predicted label. By comparing these 59 predicted labels with the true labels, we constructed a confusion matrix by counting the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). From these counts, accuracy, precision, and recall were calculated to evaluate model performance. Receiver operating characteristic (ROC) curves and precision-recall (PR) curves were plotted, and their areas under the curve (AUC) are reported in the Results section. The AUC of the PR curve is especially suitable when the distribution of labels in the dataset is skewed, as it is here (43 IC vs. 16 control samples), because it accounts for the imbalance.
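
The metrics can be sketched as follows with sklearn.metrics. As a sanity check, the counts from the degree-3 polynomial SVM row of Table 1 reproduce its reported accuracy, precision, and recall. Using `average_precision_score` as the PR-curve area is our assumption: the text does not say whether the PR AUC was computed this way or by trapezoidal integration, and the two methods can give slightly different values. The helper names are hypothetical.

```python
import numpy as np
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score

def metrics_from_counts(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion-matrix counts (IC = positive class)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),   # fraction of predicted IC that are truly IC
        "recall": tp / (tp + fn),      # fraction of true IC that were detected
    }

def summarize(y_true, y_pred, scores):
    """Counts, threshold metrics, and the two AUCs from pooled leave-one-out outputs."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    summary = {"TP": tp, "TN": tn, "FP": fp, "FN": fn, **metrics_from_counts(tp, tn, fp, fn)}
    # Average precision is a common step-wise estimate of the area under the PR curve
    summary["AUC of PR"] = average_precision_score(y_true, scores)
    summary["AUC of ROC"] = roc_auc_score(y_true, scores)
    return summary

# Check against the degree-3 polynomial SVM row of Table 1 (TP=39, TN=11, FP=5, FN=4)
print({k: round(v, 3) for k, v in metrics_from_counts(39, 11, 5, 4).items()})
# {'accuracy': 0.847, 'precision': 0.886, 'recall': 0.907}

# Toy leave-one-out outputs (hypothetical), just to exercise summarize()
y_true = np.array([1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0])
scores = np.array([0.9, 0.4, 0.3, 0.1])
print(summarize(y_true, y_pred, scores))
```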

RESULTS

Classification of IC samples with SVM

SVM was applied to the IC dataset with the leave-one-out CV scheme to classify IC samples versus controls. The results varied depending on the user parameters (i.e., kernel type and kernel parameters), as shown in Figure 2 and Table 1. SVM with a polynomial kernel performed best when the degree of the polynomial kernel was 3, with 84.7% accuracy, 0.88 AUC of the PR curve, and 0.85 AUC of the ROC curve. Although accuracy was highest (86.4%) when the degree was 5, the AUC of the ROC curve was highest with degree 3, and the AUC of the PR curve was equal for degrees 3 and 5 (0.88). Moreover, a degree of 3 has less chance of overfitting than a degree of 5.

Figure 1. IC classification experimental scheme with leave-one-out cross validation.

Figure 2. Classification result evaluation curves using SVM. A. Precision-Recall curve. B. ROC curve. The values of AUC are calculated for each curve and larger values indicate better performance.

The linear kernel did not perform well here, possibly because the data were not linearly separable or simply because the sample size (N = 59) was too small compared to the dimension of the data (i.e., 140 features). The RBF kernel also performed poorly: its accuracy in Table 1 (72.9%) equals the proportion of IC samples in the dataset (43 IC subjects out of 59), and its recall was 1. This means that the classifier simply predicted that all samples belonged to the IC group and could not handle the class imbalance.
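
The RBF row can be reproduced with a majority-class baseline. The sketch below is not part of the original analysis. It runs sklearn's DummyClassifier through the same leave-one-out scheme, using all-zero placeholder features because this baseline ignores them. IC samples dominate every fold's training set, so every held-out sample is labeled IC.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# 43 IC (label 1) and 16 controls (label 0); the features play no role in this baseline
y = np.array([1] * 43 + [0] * 16)
X = np.zeros((len(y), 140))

y_pred = cross_val_predict(DummyClassifier(strategy="most_frequent"), X, y, cv=LeaveOneOut())
accuracy = np.mean(y_pred == y)
recall = np.mean(y_pred[y == 1] == 1)
print(f"accuracy={accuracy:.3f}, recall={recall:.3f}")  # accuracy=0.729, recall=1.000
```

Any classifier whose leave-one-out accuracy is close to 0.729 while its recall is 1 should therefore be compared against this trivial baseline before it is credited with learning anything.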

Classification of IC samples with LR

In addition to SVM, LR was used to classify IC samples; the results with different user parameter settings are shown in Figure 3 and Table 2. LR with an l₁-penalty yielded the best performance when the inverse regularization strength C was set to 10, with 84.7% accuracy, 0.91 AUC of the PR curve, and 0.86 AUC of the ROC curve, slightly better than the best SVM results. Because training involves randomness in the initial weights, these numbers are the best among several trials; the results from the other trials did not differ much from those reported in Figure 3 and Table 2.

Table 1. Comparison of SVM results with different sets of parameters.

Parameters	TP	TN	FP	FN	Accuracy	Precision	Recall	AUC of PR	AUC of ROC
Kernel = linear	36	9	7	7	0.763	0.837	0.837	0.82	0.76
Kernel = poly, degree = 3	39	11	5	4	0.847	0.886	0.907	0.88	0.85
Kernel = poly, degree = 5	39	12	4	4	0.864	0.907	0.907	0.88	0.84
Kernel = poly, degree = 7	39	11	5	4	0.847	0.886	0.907	0.87	0.83
Kernel = RBF	43	0	16	0	0.729	0.729	1.000	0.36	0.00

Table 2. Comparison of LR results with different sets of parameters.

Parameters	TP	TN	FP	FN	Accuracy	Precision	Recall	AUC of PR	AUC of ROC
Penalty = l₁, C = 1	39	9	7	4	0.814	0.848	0.907	0.82	0.75
Penalty = l₁, C = 5	39	10	6	4	0.831	0.867	0.907	0.88	0.84
Penalty = l₁, C = 10	38	12	4	5	0.847	0.905	0.884	0.91	0.86
Penalty = l₂, C = 5	38	7	9	5	0.763	0.809	0.884	0.82	0.75
Penalty = l₂, C = 10	38	7	9	5	0.763	0.809	0.884	0.82	0.75

LR performed well despite being a linear model, whereas the linear SVM performed poorly (Table 1). We attribute this difference to the l₁-norm penalty applied to the LR weights, which imposes sparsity and behaves as a natural feature selector. When we inspected the trained feature weights, most of them converged to 0 (i.e., their absolute values, averaged across the leave-one-out folds, were very small). With the inverse regularization strength C set to 10, the average absolute weight was less than or equal to 0.1 for 133 of the 140 features. This means that only a few critical features are needed to predict the correct label. In our experiment, the features with ID = 73, 4, 129, and 35 were the most dominant, with the highest weights regardless of the random initialization; in other words, they were the four most useful NMR features. We performed further statistical group analysis on these four NMR peaks using two-sample t-tests, which yielded P-values of 0.003, 0.001, 0.057, and 0.036, respectively. Interestingly, many other NMR peaks had even lower P-values, and peak ID = 129 had a P-value greater than 0.05. Whereas these statistical tests examine each peak independently, our classification results were derived by analyzing all peaks at the same time, which demonstrates that a linear combination of the features can be more powerful for distinguishing IC from controls.
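
The weight analysis can be sketched as follows. The procedure fits an l₁-penalized LR (C = 10) in every leave-one-out fold, averages the absolute weights, counts the features at or below 0.1, and ranks the remaining ones. It then runs a separate two-sample t-test on each top-ranked feature. The function names, the liblinear solver, and the synthetic data (four columns shifted to be mildly informative) are our assumptions. The returned indices are 0-based column positions, which may not match the paper's feature IDs.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

def mean_abs_weights_loo(X, y, C=10.0):
    """Average |weight| of an l1-penalized LR over all leave-one-out training sets."""
    total = np.zeros(X.shape[1])
    n_folds = 0
    for train_idx, _ in LeaveOneOut().split(X):
        model = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        model.fit(X[train_idx], y[train_idx])
        total += np.abs(model.coef_[0])
        n_folds += 1
    return total / n_folds

def summarize_features(X, y, C=10.0, threshold=0.1, top_k=4):
    """Count near-zero features, rank the rest, and t-test the top-ranked ones individually."""
    w = mean_abs_weights_loo(X, y, C)
    n_small = int(np.sum(w <= threshold))      # features the l1 penalty effectively switched off
    top = np.argsort(w)[::-1][:top_k]          # 0-based column indices, largest average weight first
    # Two-sample t-test per feature: each peak is tested on its own, ignoring all the others
    _, p_values = ttest_ind(X[y == 1][:, top], X[y == 0][:, top])
    return w, n_small, top, p_values

# Synthetic placeholder data with the IC dataset's shape
rng = np.random.default_rng(0)
X = rng.normal(size=(59, 140))
y = np.array([1] * 43 + [0] * 16)
X[y == 1, :4] += 1.0    # make four placeholder features weakly informative
w, n_small, top, p_values = summarize_features(X, y)
print("features with mean |w| <= 0.1:", n_small, "| top features:", top, "| t-test P:", p_values)
```

The t-test looks at one column at a time, so its P-values need not agree with the multivariate ranking. This is the contrast drawn in the paragraph above.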

The l₂-norm penalty contributed little in these experiments, likely because the model can operate robustly without an l₂-norm regularizer, which typically trades some performance for robustness. In particular, once the l₁-norm regularizer had substantially lowered the effective dimension of the data (with 133 redundant features), the sample size (N = 59) was sufficient to make robust and correct predictions for IC samples.

DISCUSSION

It comes as no surprise that medicine is awash with claims that applying ML to big healthcare data will create extraordinary revolutions [8,16,17]. Recent examples have demonstrated how big data and ML can create algorithms that perform on par with human physicians. ML is one approach within AI. Various AI techniques already exist, and their successful application to metabolomics analysis has been reported in previous studies [18-20]. In such studies, conventional statistical analysis and AI-based methods have been used to assess the discrimination capability of quantified metabolites, including multiple logistic regression (MLR), alternative decision trees (ADTree), neurofuzzy modelling (NFM), artificial neural networks (ANN), and SVM [21,22].

Modern advancements in computational and data science, whose most popular implementation is ML, have facilitated novel, complex, data-driven research approaches. Combined with biostatistics, ML aims to learn from data, which it accomplishes by optimizing the performance of algorithms based on previously observed data. ML can be applied in either a supervised or an unsupervised fashion. In supervised learning, the algorithm is trained on labeled examples to learn the correct class assignment from a set of input features, such as how to make a correct diagnosis from clinical and laboratory information [18].

Current biomarkers for IC diagnosis and prognosis are not robust enough for clinical practice. Here, we instead applied AI to IC-related metabolites in an NMR metabolomics dataset from our previous study [5], and the resulting models were able to collectively distinguish IC patient urinary profiles from those of healthy controls. The development of diagnostic tools using ML may help identify IC patients more accurately. AI has the potential to manage the imprecision and uncertainty that are common in clinical and biological data. AI- or ML-based algorithms can take several different forms. The icons in the presented figures in this paper represent typical ML methods. These include multilayer neural networks; decision tree-based algorithms; SVM and related algorithms that separate classes by placing hyperplanes between them; and prototype-based algorithms, such as k-nearest neighbors, that compare the feature vector of a case with those of other cases and assign classes based on similarity. ML-based algorithms have not yet been actively applied to IC research. Such applications could lead to a better understanding and deeper knowledge of metabolomics data, which would then provide insights into biomarker discovery.

Figure 3. Classification result evaluation curves using LR. A. Precision-Recall curve. B. ROC curve. The values of AUC are calculated for each curve and larger values indicate better performance.

Although beyond the scope of this study, AI algorithms could also be used to predict IC progression or therapeutic responses [23,24]. Patient clinicopathological features are commonly used to train AI algorithms to predict patient outcomes in other diseases, such as cancer [25-27]. For instance, Wong et al. developed patient-specific ML algorithms based on clinicopathological data to predict early biochemical recurrence after prostatectomy in prostate cancer [28]. The resulting three ML algorithms were trained using 338 patients and achieved an accuracy of 95%–98% and an AUC of 0.9–0.94; compared to traditional Cox regression analysis, the three ML algorithms had superior prediction performance. This study demonstrated how AI algorithms, trained with clinicopathological data, imaging radiomic features, and genomic profiling, outperformed the prediction accuracy of D’Amico risk stratification, single clinicopathological features, and multiple discriminant analysis, a type of conventional multivariate statistics [28]. There is also a role for AI in selecting effective drugs for cancer treatment [29]. Using an ML-based algorithm, Saeed et al. quantified the phenotypes of castration-resistant prostate cancer cells and tested their response to over 300 emerging and established clinical cancer drugs [30].

We are aware that one of the limitations of this study is the novelty of using crowdsourcing in medical biomarker development; to our knowledge, there is no previous reference for comparison. Additionally, this study was limited to participants in South Korea and to a single collection time point. A major problem associated with medical datasets is small sample size [5]. Sufficiently large datasets are important when creating classification schemes for disease modeling, because they can be partitioned into adequately sized training and testing sets for reasonable validation; conversely, a small training dataset can lead to misclassifications and may result in unstable or biased models. Small sample size was a major problem for our study, because collecting a larger amount of medical research data takes an immense amount of time, effort, and cost. Furthermore, medical research data are often inconsistent, incomplete, or noisy, which reduces usable sample sizes even more. A small sample size for high-dimensional data often leads to the “curse of dimensionality”, i.e., failing to properly estimate the necessary parameters because of a lack of samples, which we also faced with only 59 samples for 140 NMR features. For the SVM used in this study, when its objective function is cast in dual form using Lagrange multipliers, the optimization problem has one variable per sample (59) rather than one per feature (140), and it seeks a sparse solution that depends only on a few “support vectors”, which greatly reduces the effective size of the problem. For LR, we used two different regularizers on the parameters to be estimated, the l₁- and l₂-norms, to avoid the curse of dimensionality and obtain feasible solutions. As the results demonstrate, the l₁-norm constraint behaved as a data-driven feature selector that reduced the dimension of the problem, allowing the classifier to avoid the curse of dimensionality.
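
The dual view can be illustrated with sklearn's SVC. Its user guide states that the fitted decision function is a sum over support vectors of `dual_coef_` (the products y_i α_i) times the kernel, plus `intercept_`. The sketch fits a linear-kernel SVM to random placeholder data with the IC dataset's shape and recomputes the decision values from the support vectors alone. The helper name `decision_from_support_vectors` is ours. How many samples become support vectors depends on the data (with random features, most of them may), so the printed counts say nothing about the real NMR peaks.

```python
import numpy as np
from sklearn.svm import SVC

def decision_from_support_vectors(model, X):
    """Recompute a linear-kernel SVC decision function using only the dual solution."""
    K = X @ model.support_vectors_.T                 # linear kernel between samples and support vectors
    return K @ model.dual_coef_.ravel() + model.intercept_[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(59, 140))        # placeholder values with the IC dataset's shape
y = np.array([1] * 43 + [0] * 16)
model = SVC(kernel="linear", C=1.0).fit(X, y)

print("dual variables (one per sample):", X.shape[0])
print("support vectors with nonzero weight:", model.support_.size)
print("matches decision_function:", np.allclose(model.decision_function(X), decision_from_support_vectors(model, X)))
```
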
Although we were able to avoid the curse of dimensionality in this study, poor analysis may lead to overfitting and irreproducible results. When the sample number is limited, ML-based algorithms may be misled by datasets containing dominant but irrelevant features. Also, AI cannot be used as an end-all solution to every question; there are instances where traditional statistics has outperformed AI or where additional AI does not improve results.

In summary, we have found that ML-based algorithms can be applied to developing diagnostic models for IC. In the current clinical setting, urologists generally depend on cystoscopy and questionnaire-based decisions to diagnose IC because objective molecular biomarkers are lacking. The purpose of this study was to develop machine learning methods for diagnosing IC and to assess their performance using metabolomics data. Considering that ML techniques for analyzing omics data can play a role in predicting the diagnosis and prognosis of diseases, future studies should use larger, multidimensional, and heterogeneous datasets, more rigorous validation, and different techniques for classification and feature selection to pave a promising way toward clinical applications.

References

1. Hanno P, Keay S, Moldwin R, Van Ophoven A (2005) International Consultation on IC - Rome, September 2004/Forging an International Consensus: progress in painful bladder syndrome/interstitial cystitis. Report and abstracts. Int Urogynecol J Pelvic Floor Dysfunct 16 Suppl 1: S2-S34.
2. Nordling J, Anjum FH, Bade JJ, Bouchelouche K, Bouchelouche P, et al. (2004) Primary evaluation of patients suspected of having interstitial cystitis (IC). Eur Urol 45: 662-669. doi: https://doi.org/10.1016/j.eururo.2003.11.021
3. Hanno PM, Burks DA, Clemens JQ, Dmochowski RR, Erickson D, et al. (2011) AUA guideline for the diagnosis and treatment of interstitial cystitis/bladder pain syndrome. J Urol 185: 2162-2170. doi: https://doi.org/10.1016/j.juro.2011.03.064
4. Urinology Think Tank Writing Group (2018) Urine: Waste product or biologically active tissue. Neurourol Urodyn 37: 1162-1168. doi: https://doi.org/10.1002/nau.23414
5. Wen H, Lee T, You S, Park S, Song H, et al. (2014) Urinary metabolite profiling combined with computational analysis predicts interstitial cystitis-associated candidate biomarkers. J Proteome Res 14: 541-548. doi: https://doi.org/10.1021/pr5007729
6. Kind T, Cho E, Park TD, Deng N, Liu Z, et al. (2016) Interstitial cystitis-associated urinary metabolites identified by mass-spectrometry based metabolomics analysis. Sci Rep 6: 39227. doi: https://doi.org/10.1038/srep39227
7. Shahid M, Lee MY, Yeon A, Cho E, Sairam V, et al. (2018) Menthol, a unique urinary volatile compound, is associated with chronic inflammation in interstitial cystitis. Sci Rep 8: 10859. doi: https://doi.org/10.1038/s41598-018-29085-3
8. Cahan EM, Hernandez-Boussard T, Thadaney-Israni S, Rubin DL (2019) Putting the data before the algorithm in big data addressing personalized healthcare. NPJ Digit Med 2: 78. doi: https://doi.org/10.1038/s41746-019-0157-2
9. Tolles J, Meurer WJ (2016) Logistic regression: relating patient characteristics to outcomes. JAMA 316: 533-534.
10. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20: 273-297. doi: https://doi.org/10.1007/BF00994018
11. Platt JC (1998) Sequential minimal optimization: A fast algorithm for training support vector machines [Internet]. Microsoft Research. Available from: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf
12. Lin Y, Lee Y, Wahba G (2002) Support vector machines for classification in nonstandard situations. Mach Learn 46: 191-202.
13. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J Roy Stat Soc B 58: 267-288. doi: https://doi.org/10.1111/j.2517-6161.1996.tb02080.x
14. Ng AY (2004) Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the twenty-first international conference on Machine learning. New York, NY: Association for Computing Machinery. doi: https://doi.org/10.1145/1015330.1015435
15. Vehtari A, Gelman A, Gabry J (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing 27: 1413-1432. doi: https://doi.org/10.1007/s11222-016-9696-4
16. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, et al. (2019) A guide to deep learning in healthcare. Nat Med 25: 24-29. doi: https://doi.org/10.1038/s41591-018-0316-z
17. Miotto R, Wang F, Wang S, Jiang X, Dudley JT (2018) Deep learning for healthcare: review, opportunities and challenges. Brief Bioinform 19: 1236-1246. doi: https://doi.org/10.1093/bib/bbx044
18. Zampieri G, Vijayakumar S, Yaneske E, Angione C (2019) Machine and deep learning meet genome-scale metabolic modeling. PLoS Comput Biol 15. doi: https://doi.org/10.1371/journal.pcbi.1007084
19. Bordbar A, Monk JM, King ZA, Palsson BO (2014) Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet 15: 107-120. doi: https://doi.org/10.1038/nrg3643
20. Cuperlovic-Culf M (2018) Machine Learning Methods for Analysis of Metabolic Data and Metabolic Pathway Modeling. Metabolites 8. doi: https://doi.org/10.3390/metabo8010004
21. Angermueller C, Pärnamaa T, Parts L, Stegle O (2016) Deep learning for computational biology. Mol Syst Biol 12: 878. doi: https://doi.org/10.15252/msb.20156651
22. Min S, Lee B, Yoon S (2017) Deep learning in bioinformatics. Brief Bioinform 18: 851-869. doi: https://doi.org/10.1093/bib/bbw068
23. Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, et al. (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18: 463-477. doi: https://doi.org/10.1038/s41573-019-0024-5
24. Jing Y, Bian Y, Hu Z, Wang L, Xie XS (2018) Deep learning for drug design: an artificial intelligence paradigm for drug discovery in the big data era. AAPS J 20: 58. doi: https://doi.org/10.1208/s12248-018-0210-0
25. Klauschen F, Muller KR, Binder A, Bockmayr M, Hagele M, et al. (2018) Scoring of tumor-infiltrating lymphocytes: From visual estimation to machine learning. Semin Cancer Biol 52: 151-157. doi: https://doi.org/10.1016/j.semcancer.2018.07.001
26. Baptista D, Ferreira PG, Rocha M (2020) Deep learning for drug response prediction in cancer. Brief Bioinform: pii: bbz171. doi: https://doi.org/10.1093/bib/bbz171
27. Tolios A, De Las Rivas J, Hovig E, Trouillas P, Scorilas A, et al. (2020) Computational approaches in cancer multidrug resistance research: Identification of potential biomarkers, drug targets and drug-target interactions. Drug Resistance Updates 48: 100662. doi: https://doi.org/10.1016/j.drup.2019.100662
28. Wong NC, Lam C, Patterson L, Shayegan B (2018) Use of machine learning to predict early biochemical recurrence after robot-assisted prostatectomy. BJU Int 123: 51-57. doi: https://doi.org/10.1111/bju.14477
29. Madhukar NS, Khade PK, Huang L, Gayvert K, Galletti G, et al. (2019) A Bayesian machine learning approach for drug target identification using diverse data types. Nat Commun 10: 5221. doi: https://doi.org/10.1038/s41467-019-12928-6
30. Saeed K, Rahkama V, Eldfors S, Bychkov D, Mpindi JP, et al. (2017) Comprehensive drug testing of patient-derived conditionally reprogrammed Cells from Castration-resistant prostate cancer. Eur Urol 71: 319-327. doi: https://doi.org/10.1016/j.eururo.2016.04.019


Bladder, Electronic ISSN: 2327-2120 Print ISSN: TBA, Published by POL Scientific
