Cardiovascular risk prediction - a systems medicine approach
; ; et al E-print/Working paper (2023)

Background: Guidelines for the prevention of cardiovascular disease (CVD) have recommended the assessment of the total CVD risk by risk scores. Current risk algorithms are low in sensitivity and specificity and they have not incorporated emerging risk markers for CVD. We suggest that CVD risk assessment can still be improved. We have developed a long-term risk prediction model of cardiovascular mortality in patients with stable coronary artery disease (CAD) based on newly available machine learning and on an extended dataset of new biomarkers.

Methods: 2953 participants of the Ludwigshafen Risk and Cardiovascular Health (LURIC) study were included. 184 laboratory and 21 demographic markers were ranked according to their contribution to risk of cardiovascular (CV) mortality using different data mining approaches. A self-learning bioinformatics workflow, including seven different machine learning algorithms, was developed for CV risk prediction. The study population was stratified into patients with and without significant CAD. Significant CAD was defined as a lumen narrowing of 50% or more in at least one of the coronary segments or a history of definite myocardial infarction. The machine learning models in both subpopulations were compared with established CV risk assessment tools.

Results: After a follow-up of 10 years, 603 (20.4%) patients died of cardiovascular causes. 95% patients without CAD deceased within ten years and 247 (13.2%) patients with CAD within 5 years.
Overall and in patients without CAD, NT-proBNP (N-terminal pro B-type natriuretic peptide), TnT (troponin T), estimated cystatin C-based GFR (glomerular filtration rate) and age were the highest ranked predictors, while in patients with CAD, NT-proBNP, GFR, CT-proAVP (C-terminal pro arginine vasopressin) and TnT were the most predictive. In the comparison with the FRS, PROCAM and ESC risk scores, the machine learning workflow produced more accurate and robust CV mortality prediction in patients without CAD. Equivalent CV risk prediction was obtained in the CAD subpopulation in comparison with the Marschner risk score. Overall, the existing algorithms tend to assign more patients to the medium risk groups, while the machine learning algorithms tend to have a clearer risk/no risk assignment. The framework is available upon request.

Conclusion: We have developed a fully automated and self-validating computational framework of machine learning techniques using an extensive database of clinical, routinely and non-routinely measured laboratory data. Our framework predicts long-term CV mortality at least as accurately as existing CVD risk scores. A combination of four highly ranked biomarkers and the random forest approach showed the best predictive results. Moreover, a dynamic computational model has several advantages over static CVD risk prediction tools: it is freeware, transparent, variable, transferable and expandable to any population, types of events and time frames.

Degree Adjusted Large-Scale Network Analysis Reveals Novel Putative Metabolic Disease Genes.
Badkas, Apurva; Nguyen, Thanh-Phuong; et al in Biology (2021), 10(2)
A large percentage of the global population is currently afflicted by metabolic diseases (MD), and the incidence is likely to double in the next decades. MD-associated co-morbidities such as non-alcoholic fatty liver disease (NAFLD) and cardiomyopathy contribute significantly to impaired health. MD are complex and polygenic, with many genes involved in their aetiology. A popular approach to investigate genetic contributions to disease aetiology is biological network analysis. However, data dependence introduces a bias (noise, false positives, over-publication) in the outcome. While several approaches have been proposed to overcome these biases, many of them have constraints, including data integration issues, dependence on arbitrary parameters, database-dependent outcomes, and computational complexity. Network topology is also a critical factor affecting the outcomes. Here, we propose a simple, parameter-free method that takes into account database dependence and network topology to identify central genes in the MD network. Among them, we infer novel candidates that have not yet been annotated as MD genes and show their relevance by highlighting their differential expression in public datasets and carefully examining the literature. The method contributes to uncovering connections in the MD mechanisms and highlights several candidates for in-depth study of their contribution to MD and its co-morbidities.

An efficient machine learning-based approach for screening individuals at risk of hereditary haemochromatosis.
Martins Conde, Patricia; Sauter, Thomas; Nguyen, Thanh-Phuong in Scientific reports (2020), 10(1), 20613
Hereditary haemochromatosis (HH) is an autosomal recessive disease, where HFE C282Y homozygosity accounts for 80-85% of clinical cases among the Caucasian population. HH is characterised by the accumulation of iron, which, if untreated, can lead to the development of liver cirrhosis and liver cancer. Since iron overload is preventable and treatable if diagnosed early, high-risk individuals can be identified through effective screening employing artificial intelligence-based approaches. However, such tools expose novel challenges associated with the handling and integration of large heterogeneous datasets. We have developed an efficient computational model to screen individuals for HH using the family study data of the Hemochromatosis and Iron Overload Screening (HEIRS) cohort. This dataset, consisting of 254 cases and 701 controls, contains variables extracted from questionnaires and laboratory blood tests. The final model was trained on an extreme gradient boosting classifier using the most relevant risk factors: HFE C282Y homozygosity, age, mean corpuscular volume, iron level, serum ferritin level, transferrin saturation, and unsaturated iron-binding capacity. Hyperparameter optimisation was carried out with multiple runs, resulting in 0.94 ± 0.02 area under the receiver operating characteristic curve (AUCROC) for tenfold stratified cross-validation, outperforming the iron overload screening (IRON) tool.

Cross-disease analysis of Alzheimer’s disease and type-2 Diabetes highlights the role of autophagy in the pathophysiology of two highly comorbid diseases
; Nguyen, Thanh-Phuong; et al in Scientific Reports (2019), 9(1), 3965

An Efficient Machine Learning Method to Solve Imbalanced Data in Metabolic Disease Prediction
Cecchini, Vania Filipa; Nguyen, Thanh-Phuong; Pfau, Thomas et al in Cecchini, Vania Filipa (Ed.)
An Efficient Machine Learning Method to Solve Imbalanced Data in Metabolic Disease Prediction (2019)

A Fast Multiple Kernel Learning Framework with Dimensionality Reduction
; Nguyen, Thanh-Phuong; et al in Giang; Nguyen; Tran, Quoc Vinh (Eds.) et al Integrated Uncertainty in Knowledge Modelling and Decision Making (2018)

Stratifying cancer patients based on multiple kernel learning and dimensionality reduction
; Nguyen, Thanh-Phuong; in Giang, Thanh Trung; Nguyen, Thanh-Phuong; Tran (Eds.) 9th International Conference on Knowledge and Systems Engineering (KSE) (2017)

Computational Systems Biology: Inference and Modelling
; ; et al Book published by Elsevier (2016)

Computational Systems Biology: Inference and Modelling provides an introduction to, and overview of, network analysis inference approaches which form the backbone of the model of the complex behavior of biological systems. This book addresses the challenge of integrating highly diverse quantitative approaches into a unified framework by highlighting the relationships existing among network analysis, inference, and modeling. The chapters are light in jargon and technical detail so as to make them accessible to the non-specialist reader. The book is addressed to a heterogeneous audience of modelers, biologists, and computer scientists.

Systems biology integration of proteomic data in rodent models of depression reveals involvement of the immune response and glutamatergic signalling
; Nguyen, Thanh-Phuong; in Proteomics. Clinical Applications (2016)

Purpose: The pathophysiological basis of major depression is incompletely understood. Recently, numerous proteomic studies have been performed in rodent models of depression to investigate the molecular underpinnings of depressive-like behaviours with an unbiased approach. The objective of the study was to integrate the results of these proteomic studies in depression models to shed light on the most relevant molecular pathways involved in the disease.

Experimental design: Network analysis was performed integrating pre-existing proteomic data from rodent models of depression. The IntAct mouse and the HPRD were used as reference protein-protein interaction databases. The functionality analyses of the networks were then performed by testing over-represented GO biological process terms and pathways.

Results: Functional enrichment analyses of the networks revealed an association with molecular processes related to depression in humans, such as those involved in the immune response. Pathways impacted by clinically effective antidepressants were modulated, including glutamatergic signalling and neurotrophic responses. Moreover, dysregulations of proteins regulating energy metabolism and circadian rhythms were implicated. The comparison with protein pathways modulated in depressive patients revealed significant overlap.

Conclusions and clinical relevance: This systems biology study supports the notion that animal models could contribute to the research into the biology and therapeutics of depression.

Diversity of key players in the microbial ecosystems of the human body.
; ; et al in Scientific reports (2015), 5

Coexisting bacteria form various microbial communities in human body parts. In these ecosystems they interact in various ways and the properties of the interaction network can be related to the stability and functional diversity of the local bacterial community. In this study, we analyze the interaction network among bacterial OTUs in 11 locations of the human body. These belong to two major groups. One is the digestive system and the other is the female genital tract. In each local ecosystem we determine the key species, both the ones being in key positions in the interaction network and the ones that dominate by frequency. Beyond identifying the key players and discussing their biological relevance, we also quantify and compare the properties of the 11 networks. The interaction networks of the female genital system and the digestive system show totally different architecture. Both the topological properties and the identity of the key groups differ. Key groups represent four phyla of prokaryotes. Some groups appear in key positions in several locations, while others are assigned only to a single body part. The key groups of the digestive and the genital tracts are totally different.

Cell type-selective disease-association of genes under high regulatory load
Galhardo, Mafalda Sofia; ; Nguyen, Thanh-Phuong et al in Nucleic Acids Research (2015), 43(18), 8839-8855

We previously showed that disease-linked metabolic genes are often under combinatorial regulation.
Using the genome-wide ChIP-Seq binding profiles for 93 transcription factors in nine different cell lines, we show that genes under high regulatory load are significantly enriched for disease-association across cell types. We find that transcription factor load correlates with the enhancer load of the genes and thereby allows the identification of genes under high regulatory load by epigenomic mapping of active enhancers. Identification of the high enhancer load genes across 139 samples from 96 different cell and tissue types reveals a consistent enrichment for disease-associated genes in a cell type-selective manner. The underlying genes are not limited to super-enhancer genes and show several types of disease-association evidence beyond genetic variation (such as biomarkers). Interestingly, the high regulatory load genes are involved in more KEGG pathways than expected by chance, exhibit increased betweenness centrality in the interaction network of liver disease genes, and carry longer 3'UTRs with more microRNA (miRNA) binding sites than genes on average, suggesting a role as hubs integrating signals within regulatory networks. In summary, epigenetic mapping of active enhancers presents a promising and unbiased approach for identification of novel disease genes in a cell type-selective manner.

Novel drug target identification for the treatment of dementia using multi-relational association mining
Nguyen, Thanh-Phuong; ; in Scientific Reports (2015), 5

Dementia is a neurodegenerative condition of the brain in which there is a progressive and permanent loss of cognitive and mental performance.
Despite the fact that the number of people with dementia worldwide is steadily increasing and regardless of the advances in the molecular characterization of the disease, current medical treatments for dementia are purely symptomatic and hardly effective. We present a novel multi-relational association mining method that integrates the huge amount of scientific data accumulated in recent years to predict potential novel targets for innovative therapeutic treatment of dementia. Owing to its ability to process large volumes of heterogeneous data, our method achieves high performance and predicts numerous drug targets, including several serine/threonine kinases and a G-protein coupled receptor. The predicted drug targets are mainly functionally related to metabolism, cell surface receptor signaling pathways, immune response, apoptosis, and long-term memory. Among the highly represented kinase family and among the G-protein coupled receptors, DLG4 (PSD-95) and the bradykinin receptor 2 are highlighted also for their proposed role in memory and cognition, as described in previous studies. These novel putative targets hold promise for the development of novel therapeutic approaches for the treatment of dementia.

A systems biology investigation of neurodegenerative dementia reveals a pivotal role of autophagy
; Nguyen, Thanh-Phuong in BMC Systems Biology (2014), 8(1), 65

Network analysis of neurodegenerative disease highlights a role of toll-like receptor signaling
Nguyen, Thanh-Phuong; ; et al in BioMed Research International (2014), 2014
Despite significant advances in the study of the molecular mechanisms altered in the development and progression of neurodegenerative diseases (NDs), the etiology is still enigmatic and the distinctions between diseases are not always entirely clear. We present an efficient computational method based on the protein-protein interaction (PPI) network to model the functional network of NDs. The aim of this work is fourfold: (i) reconstruction of a PPI network relating to the NDs, (ii) construction of an association network between diseases based on proximity in the disease PPI network, (iii) quantification of disease associations, and (iv) inference of potential molecular mechanisms involved in the diseases. The functional links of diseases not only showed overlap with the traditional classification in clinical settings, but also offered new insight into connections between diseases with limited clinical overlap. To gain an expanded view of the molecular mechanisms involved in NDs, both direct and indirect connector proteins were investigated. The method uncovered molecular relationships that are common to apparently distinct diseases and provided important insight into the molecular networks implicated in disease pathogenesis. In particular, the current analysis highlighted the Toll-like receptor signaling pathway as a potential candidate pathway to be targeted by therapy in neurodegeneration.

Inference of Autism-Related Genes by Integrating Protein-Protein Interactions and miRNA-Target Interactions
; Nguyen, Thanh-Phuong; et al in Knowledge and Systems Engineering (2014)

The central role of AMP-kinase and energy homeostasis impairment in Alzheimer’s disease: a multifactor network analysis
; ; Nguyen, Thanh-Phuong et al in PLoS ONE (2013)

Detecting disease genes based on semi-supervised learning and protein-protein interaction networks
Nguyen, Thanh-Phuong; in Artificial Intelligence in Medicine (2012), 54(1), 63-71

Objective: Predicting or prioritizing the human genes that cause disease, or “disease genes”, is one of the emerging tasks in biomedicine informatics. Research on network-based approaches to this problem is carried out upon the key assumption that “the network-neighbour of a disease gene is likely to cause the same or a similar disease”, and mostly employs data regarding well-known disease genes, using supervised learning methods. This work aims to find an effective method to exploit the disease gene neighbourhood and the integration of several useful omics data sources, which potentially enhance disease gene predictions.

Methods: We have presented a novel method to effectively predict disease genes by exploiting, in the semi-supervised learning (SSL) scheme, data regarding both disease genes and disease gene neighbours via the protein-protein interaction network. Multiple proteomic and genomic data were integrated from six biological databases, including Universal Protein Resource, Interologous Interaction Database, Reactome, Gene Ontology, Pfam, and InterDom, and a gene expression dataset.
Results: In a 10 times stratified 10-fold cross validation, the SSL method performs better than the k-nearest neighbour method and the support vector machines method, with a sensitivity of 85%, specificity of 79%, precision of 81%, accuracy of 82%, and a balanced F-function of 83%. The other comparative experimental evaluations demonstrate advantages of the proposed method given a small amount of labeled data, with accuracy of 78%. We have applied the proposed method to detect 572 putative disease genes, which are biologically validated in some indirect ways.

Conclusion: Semi-supervised learning improved the ability to study disease genes, especially for a specific disease, where the known disease genes (as labeled data) are very often limited. In addition to the computational improvement, the analysis of predicted disease proteins indicates that the findings are beneficial in deciphering the pathogenic mechanisms.

Studying protein-protein interaction networks: a systems view on diseases
; Nguyen, Thanh-Phuong; in Briefings in Functional Genomics (2012), 11(6), 497-504

Mining multiple biological data for reconstructing signal transduction networks
Nguyen, Thanh-Phuong; in Data Mining: Foundations and Intelligent Paradigms (2012)

Inferring pleiotropy by network analysis: linked diseases in the human PPI network
Nguyen, Thanh-Phuong; ; in BMC Systems Biology (2011), 5(1), 179

Background: Earlier, we identified proteins connecting different disease proteins in the human protein-protein interaction network and quantified their mediator role.
An analysis of the networks of these mediators shows that proteins connecting heart disease and diabetes largely overlap with the ones connecting heart disease and obesity.

Results: We quantified their overlap, and based on the identified topological patterns, we inferred the structural disease-relatedness of several proteins. Literature data provide a functional view of them, supporting our findings well. For example, the inferred structurally important role of the PDZ domain-containing protein GIPC1 in diabetes is supported despite the lack of this information in the Online Mendelian Inheritance in Man database. Several key mediator proteins identified here clearly have pleiotropic effects, supported by ample evidence for their general but always of only secondary importance.

Conclusions: We suggest that studying central nodes in mediator networks may contribute to better understanding and quantifying pleiotropy. Network analysis provides potentially useful tools here, and also helps in improving databases.