Welter, Danielle
Poster (2020, November 27)
When the COVID-19 pandemic hit in early 2020, many research efforts were quickly redirected towards studies on SARS-CoV-2 and COVID-19, from the sequencing and assembly of viral genomes to the elaboration of robust testing methodologies and the development of treatment and vaccination strategies. At the same time, a flurry of scientific publications around SARS-CoV-2 and COVID-19 began to appear, making it increasingly difficult for researchers to stay up-to-date with the latest trends and developments in this rapidly evolving field. The BioKB platform is a pipeline which, by exploiting text mining and semantic technologies, helps researchers easily access the semantic content of thousands of abstracts and full-text articles. The content of the articles is analysed, and concepts from a range of contexts, including proteins, species, chemicals, diseases and biological processes, are tagged based on existing dictionaries of controlled terms. Co-occurring concepts are classified based on their asserted relationship, and the resulting subject-relation-object triples are stored in a publicly accessible human- and machine-readable knowledge base. All concepts in the BioKB dictionaries are linked to stable, persistent identifiers, either a resource accession such as an Ensembl, UniProt or PubChem ID for genes, proteins and chemicals, or an ontology term ID for diseases, phenotypes and other ontology terms.
In order to improve COVID-19-related text mining, we extended the underlying dictionaries to include many additional viral species (via NCBI Taxonomy identifiers), phenotypes from the Human Phenotype Ontology (HPO), COVID-related concepts including clinical and laboratory tests from the COVID-19 ontology, as well as additional diseases from the Disease Ontology (DO) and biological processes from the Gene Ontology (GO). We also added all viral proteins found in UniProt and gene entries from EntrezGene to increase the sensitivity of the text mining pipeline to viral data. To date, BioKB has indexed over 270’000 sentences from 21’935 publications relating to coronavirus infections, with publications dating from 1963 to 2021, 3’863 of which were published this year. We are currently working to further refine the text mining pipeline by training it on the extraction of increasingly complex relations such as protein-phenotype relationships. We are also regularly adding new terms to our dictionaries for areas where coverage is currently low, such as clinical and laboratory tests and procedures and novel drug treatments.

Biryukov, Maria
Scientific Conference (2020, July)

Andersen, Eva
in Journal of Data Mining and Digital Humanities (2020)
Historians are confronted with an overabundance of sources that require new perspectives and tools to make use of large-scale corpora. Based on a use case from the history of psychiatry, this paper describes the work of an interdisciplinary team to tackle these challenges by combining different NLP tools with new visual interfaces that foster the exploration of the corpus.
The paper highlights several research challenges in the preparation and processing of the corpus and sketches new insights for historical research that were gained through the use of the tools.

Boussaad, Ibrahim
in Science Translational Medicine (2020), 12(560)
Parkinson's disease (PD) is a heterogeneous neurodegenerative disorder with monogenic forms representing prototypes of the underlying molecular pathology and reproducing to variable degrees the sporadic forms of the disease. Using a patient-based in vitro model of PARK7-linked PD, we identified a U1-dependent splicing defect causing a drastic reduction in DJ-1 protein and, consequently, mitochondrial dysfunction. Targeting defective exon skipping with genetically engineered U1-snRNA recovered DJ-1 protein expression in neuronal precursor cells and differentiated neurons. After prioritization of candidate drugs, we identified and validated a combinatorial treatment with the small-molecule compounds rectifier of aberrant splicing (RECTAS) and phenylbutyric acid, which restored DJ-1 protein and mitochondrial dysfunction in patient-derived fibroblasts as well as dopaminergic neuronal cell loss in mutant midbrain organoids. Our analysis of a large number of exomes revealed that U1 splice-site mutations were enriched in sporadic PD patients. Therefore, our study suggests an alternative strategy to restore cellular abnormalities in in vitro models of PD and provides a proof of concept for neuroprotection based on precision medicine strategies in PD.
Biryukov, Maria
Scientific Conference (2019, September 12)

Alex Namasivayam, Aishwarya
in Gene Regulation and Systems Biology (2016), 10
Biological network models offer a framework for understanding disease by describing the relationships between the mechanisms involved in the regulation of biological processes. Crowdsourcing can efficiently gather feedback from a wide audience with varying expertise. In the Network Verification Challenge, scientists verified and enhanced a set of 46 biological networks relevant to lung and chronic obstructive pulmonary disease. The networks were built using Biological Expression Language and contain detailed information for each node and edge, including supporting evidence from the literature. Network scoring of public transcriptomics data inferred perturbation of a subset of mechanisms and networks that matched the measured outcomes. These results, based on a computable network approach, can be used to identify novel mechanisms activated in disease, quantitatively compare different treatments and time points, and allow for assessment of data with low signal. These networks are periodically verified by the crowd to maintain an up-to-date suite of networks for toxicology and drug discovery applications.

Biryukov, Maria
in Lausen, Berthold; Krolak-Schwerdt, Sabine; Böhmer, Matthias (Eds.) Data Science, Learning by Latent Structures, and Knowledge Discovery (2015, February 20)
Cell lines are widely used in translational biomedical research to study the genetic basis of diseases.
A major approach to experimental disease modeling is the use of genetic perturbation experiments that aim to trigger selected cellular disease states. In this type of experiment, it is crucial to ensure that the targeted disease-related genes and pathways are intact in the cell line used. In this work, we are developing a framework which integrates genetic sequence information and disease-specific network analysis for evaluating disease-specific cell line suitability.

Krishna, Abhimanyu
in BMC Genomics (2014), 15(1154)
Background: The human neuroblastoma cell line, SH-SY5Y, is a commonly used cell line in studies related to neurotoxicity, oxidative stress, and neurodegenerative diseases. Although this cell line is often used as a cellular model for Parkinson’s disease (PD), the relevance of this cellular model in the context of PD and other neurodegenerative diseases has not yet been systematically evaluated.
Results: We have used a systems genomics approach to characterize the SH-SY5Y cell line using whole-genome sequencing to determine the genetic content of the cell line and used transcriptomics and proteomics data to determine molecular correlations. Further, we integrated genomic variants using a network analysis approach to evaluate the suitability of the SH-SY5Y cell line for perturbation experiments in the context of neurodegenerative diseases, including PD.
Conclusions: The systems genomics approach showed consistency across different biological levels (DNA, RNA and protein concentrations). Most of the genes belonging to the major Parkinson’s disease pathways and modules were intact in the SH-SY5Y genome. Specifically, each analysed gene related to PD has at least one intact copy in SH-SY5Y. For loss-of-function perturbation experiments, the disease-specific network analysis approach ranked the genetic integrity of SH-SY5Y as higher for PD than for Alzheimer’s disease but lower than for Huntington’s disease and Amyotrophic Lateral Sclerosis.

Ostaszewski, Marek
Poster (2013, March 09)
Objectives: The pathogenesis of Parkinson's Disease (PD) is multi-factorial and age-related, implicating various genetic and environmental factors. It becomes increasingly important to develop new approaches to organize and explore the exploding knowledge of this field.
Methods: The published knowledge on pathways implicated in PD, such as synaptic and mitochondrial dysfunction, alpha-synuclein pathobiology, failure of protein degradation systems and neuroinflammation, has been organized and represented using CellDesigner. This repository has been linked to a framework of bioinformatics tools including text mining, database annotation, large-scale data integration and network analysis. The interface for online curation of the repository has been established using the Payao tool.
Results: We present the PD map, a computer-based knowledge repository, which includes molecular mechanisms of PD in a visually structured and standardized way. A bioinformatics framework that facilitates in-depth knowledge exploration, extraction and curation supports the map.
We discuss the insights gained from PD map-driven text mining of a corpus of over 50 thousand full-text PD-related papers, integration and visualization of gene expression in post mortem brain tissue of PD patients with the map, as well as results of network analysis.
Conclusions: The knowledge repository of disease-related mechanisms provides a global insight into relationships between different pathways and allows considering a given pathology in a broad context. Enrichment with available text and bioinformatics databases as well as integration of experimental data supports better understanding of complex mechanisms of PD and formulation of novel research hypotheses.

; Glaab, Enrico
in Prokop, Ales; Csukás, Bela (Eds.) Systems Biology: Integrative Biology and Simulation Tools (2013)
This chapter introduces systems biology, its context, aims, concepts and strategies. It then describes approaches and methods used for collection of high-dimensional structural and functional genomics data, including epigenomics, transcriptomics, proteomics, metabolomics and lipidomics, and how recent technological advances in these fields have moved the bottleneck from data production to data analysis and bioinformatics. Finally, the most advanced mathematical and computational methods used for clustering, feature selection, prediction analysis, text mining and pathway analysis in functional genomics and systems biology are reviewed and discussed in the context of use cases.

Biryukov, Maria
Doctoral thesis (2010)
Due to intensive growth of the electronically available publications, bibliographic databases have become widespread.
They cover a large variety of knowledge fields and provide fast access to a wide variety of data. At the same time, they contain a wealth of hidden knowledge that requires extra processing steps to infer. In this work we focus on the extraction of such meta knowledge from research bibliographic databases by looking at them from sociolinguistic, text mining and bibliometric perspectives. We choose the Digital Library and Bibliographic Database as a testbed for our experiments. In the framework of the sociolinguistic analysis, we build a statistical system for the language identification of personal names. We also show that extending a purely statistical model with the co-author network boosts the system's performance. In the text mining scenario, we perform a number of experiments that focus on topic identification and ranking. While our topic detection approach remains generic and can be used for any kind of textual data, the topic ranking metrics are built upon the information provided by the bibliographic databases. The goal of our bibliometric study is to create a researcher's profile on DBLP and analyze some of the research communities defined by the different conferences in terms of publication activity, interdisciplinarity of research, collaboration trends and population stability. We also aim to explore to what extent these aspects correlate with the conference rank. Each of the above topics constitutes a method of meta information extraction from bibliographic databases and other similarly structured data sources.