Machine Learning to Support the Presentation of Complex Pathway Graphs
Nielsen, Sune Steinbjorn; Ostaszewski, Marek; et al.
In IEEE/ACM Transactions on Computational Biology and Bioinformatics (2019)
Visualization of biological mechanisms by means of pathway graphs is necessary to better understand the often complex underlying system. Manual layout of such pathways or maps of knowledge is a difficult and time-consuming process. Node duplication is a technique that makes layouts with improved readability possible by reducing edge crossings and shortening edge lengths in drawn diagrams. In this article we propose an approach using Machine Learning (ML) to facilitate parts of this task by training a Support Vector Machine (SVM) with actions taken during manual biocuration. Our training input is a series of incremental snapshots of a diagram describing mechanisms of a disease, progressively curated by a human expert employing node duplication in the process. As a test, the trained SVM models are applied to a single large instance and 25 medium-sized instances of hand-curated biological pathways. Finally, in a user validation study, we compare the model predictions to the outcome of a node duplication questionnaire answered by users of biological pathways with varying experience. We successfully predicted nodes for duplication and emulated human choices, demonstrating that our approach can effectively learn human-like node duplication preferences to support curation of pathway diagrams in various contexts.

Diversity Preserving Genetic Algorithms - Application to the Inverted Folding Problem and Analogous Formulated Benchmarks
Nielsen, Sune Steinbjorn
Doctoral thesis (2016)
Protein structure prediction is an essential step in understanding the molecular mechanisms of living cells, with widespread applications in biotechnology and health. Among the open problems in the field, the Inverse Folding Problem (IFP), which consists in finding sequences that fold into a defined structure, is in itself an important research problem at the heart of most rational protein design approaches. In brief, solutions to the IFP are protein sequences that will fold into a given protein structure, contrary to conventional structure prediction, where the solution consists of the structure into which a given sequence folds. This inverse approach is viewed as a simplification because the near-infinite number of structure conformations of a protein can be disregarded, and only sequence-to-structure compatibility needs to be determined. Additional emphasis has been put on the generation of many sequences dissimilar from the known reference sequence instead of finding only one solution. To solve the IFP computationally, a novel formulation of the problem was proposed in which possible problem solutions are evaluated in terms of their predicted secondary structure match. In addition, two specialised Genetic Algorithms (GAs) were developed specifically for solving the IFP and compared with existing algorithms in terms of performance. Experimental results outlined the superior performance of the developed algorithms, both in terms of model score and diversity of the generated sets of problem solutions, i.e. new protein sequences. A number of landscape analysis experiments were conducted on the IFP model, enabling the development of an original benchmark suite of analogous problems. These benchmarks were shown to share many characteristics with their IFP model counterparts, but are executable in a fraction of the time. To validate the IFP model and the algorithm output, a subset of the generated solutions was selected for further inspection through full tertiary structure prediction and comparison to the original protein structure. Congruence was then assessed by superpositioning and secondary structure annotation statistics. The results demonstrated that an optimisation process relying on a fast secondary structure approximation, such as the IFP model, makes it possible to obtain meaningful sequences.

Tackling the IFP Problem with the Preference-Based Genetic Algorithm
Nielsen, Sune Steinbjorn; Ferreira Torres, Christof; Danoy, Grégoire; et al.
In Proceedings of the Genetic and Evolutionary Computation Conference 2016 (2016)

Preference-Based Genetic Algorithm for Solving the Bio-Inspired NK Landscape Benchmark
Ferreira Torres, Christof; Nielsen, Sune Steinbjorn; Danoy, Grégoire; et al.
In 7th European Symposium on Computational Intelligence and Mathematics (ESCIM) (2015, October)

A Novel Multi-objectivisation Approach for Optimising the Protein Inverse Folding Problem
Nielsen, Sune Steinbjorn; Danoy, Grégoire; et al.
In Applications of Evolutionary Computation: 18th European Conference, EvoApplications 2015, Copenhagen, Denmark, April 8-10, 2015, Proceedings (2015)
In biology, the subject of protein structure prediction is of continued interest, not only to chart the molecular map of the living cell, but also to design proteins with new functions. The Inverse Folding Problem (IFP) is in itself an important research problem, but also at the heart of most rational protein design approaches. In brief, the IFP consists in finding sequences that will fold into a given structure, rather than determining the structure for a given sequence, as in conventional structure prediction. In this work we present a Multi-Objective Genetic Algorithm (MOGA) using the diversity-as-objective (DAO) variant of multi-objectivisation to optimise secondary structure similarity and sequence diversity at the same time, hence pushing the search farther into widespread areas of the sequence solution space. To control the high diversity generated by the DAO approach, we add a novel Quantile Constraint (QC) mechanism to discard an adjustable worst quantile of the population. This DAO-QC approach can efficiently emphasise exploitation rather than exploration to a selectable degree, achieving a trade-off that produces both better and more diverse sequences than the standard Genetic Algorithm (GA). To validate the final results, a subset of the best sequences was selected for tertiary structure prediction. The superpositioning with the original protein structure demonstrated that meaningful sequences are generated, underlining the potential of this work.
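The Quantile Constraint described in this abstract, discarding an adjustable worst quantile of the population, could be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function name `apply_quantile_constraint`, the `score` callback, and the choice to rank by a single score where higher is better are all assumptions, since the abstract does not say which objective the quantile is computed on.

```python
def apply_quantile_constraint(population, score, quantile):
    """Return the population without its worst `quantile` fraction.

    Assumption: individuals are ranked by one scalar `score` (higher is
    better); the abstract does not specify the ranking criterion.
    """
    if not 0.0 <= quantile < 1.0:
        raise ValueError("quantile must be in [0, 1)")
    ranked = sorted(population, key=score, reverse=True)  # best first
    n_discard = int(len(ranked) * quantile)  # size of the worst quantile
    return ranked[:len(ranked) - n_discard]


# Hypothetical usage: integers stand in for sequences, identity as score.
survivors = apply_quantile_constraint(list(range(10)), score=lambda x: x, quantile=0.3)
print(len(survivors))  # 7 of 10 individuals remain
```

Under these assumptions, raising `quantile` removes more low-scoring individuals per generation, which would shift the search towards exploitation, in line with the adjustable trade-off the abstract describes.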

An NK Landscape Based Model Mimicking the Protein Inverse Folding Problem
Nielsen, Sune Steinbjorn; Danoy, Grégoire; et al.
In 27th European Conference on Operational Research (EURO) (2015)

NK Landscape Instances Mimicking the Protein Inverse Folding Problem Towards Future Benchmarks
Nielsen, Sune Steinbjorn; Danoy, Grégoire; Bouvry, Pascal; et al.
In GECCO Companion '15 Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation (2015)
This paper introduces two new nominal NK Landscape model instances designed to mimic the properties of one challenging optimisation problem from biology: the Inverse Folding Problem (IFP), here focusing on a simpler secondary structure version. Through landscape analysis tests, numerous problem properties are identified and used to parameterise and validate model instances in terms of epistatic links and adaptive- and random-walk characteristics. The performance of different Genetic Algorithms (GAs) is then compared on both the new NK models and the original IFP, in terms of population diversity, solution quality and convergence characteristics. It is demonstrated that very similar properties are captured in all presented tests, with a significantly faster evaluation time compared to the real IFP. The future purpose of such a model is to provide a generic benchmark for algorithms targeting protein sequence optimisation, specifically in protein design. It may also provide the foundation for more in-depth studies of the size, shape and characteristics of the space of good solutions to the IFP.
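For readers unfamiliar with the model, a generic NK landscape over a nominal (non-binary) alphabet can be sketched as below. This is the textbook construction with random epistatic links, not the parameterised instances from the paper; the function name `make_nk_landscape`, the seeding scheme and the alphabet size of 20 in the usage lines are assumptions made for illustration.

```python
import random


def make_nk_landscape(n, k, alphabet_size, seed=0):
    """Build a fitness function for a generic NK landscape.

    Each of the n positions contributes a pseudo-random value in [0, 1)
    that depends on its own symbol and the symbols at k linked positions.
    Assumption: links are drawn uniformly at random; the paper instead
    derives its links from landscape analysis of the IFP.
    """
    rng = random.Random(seed)
    links = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        links.append([i] + rng.sample(others, k))  # epistatic links of position i

    def fitness(genome):
        if len(genome) != n or any(not 0 <= s < alphabet_size for s in genome):
            raise ValueError("genome does not match the landscape definition")
        total = 0.0
        for i in range(n):
            key = tuple(genome[j] for j in links[i])
            # Deterministic contribution derived from (seed, position, symbols),
            # so no lookup table has to be stored.
            total += random.Random(f"{seed}:{i}:{key}").random()
        return total / n  # mean contribution, in [0, 1)

    return fitness


# Hypothetical usage: 20 symbols, loosely mirroring the amino acid alphabet.
evaluate = make_nk_landscape(n=8, k=2, alphabet_size=20, seed=1)
print(evaluate([3] * 8))
```

Because each evaluation touches only n small symbol tuples, such a surrogate is cheap to evaluate, which fits the faster evaluation time the abstract reports relative to the real IFP.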

Evolutionary Multi-Objective Optimisation with Quantile Constraint for the Protein Structure Similarity Problem
Nielsen, Sune Steinbjorn; Jurkowski, Wiktor; Danoy, Grégoire; et al.
Scientific Conference (2014)

Evolutionary multi objective optimisation with diversity as objective for the protein structure similarity problem
Nielsen, Sune Steinbjorn; Jurkowski, Wiktor; Danoy, Grégoire; et al.
Scientific Conference (2014)

Cooperative Selection: Improving Tournament Selection via Altruism
Jimenez Laredo, Juan Luis; Nielsen, Sune Steinbjorn; Danoy, Grégoire; et al.
In The 14th European Conference on Evolutionary Computation in Combinatorial Optimisation (2014)
This paper analyzes the dynamics of a new selection scheme based on altruistic cooperation between individuals. The scheme, which we refer to as cooperative selection, extends tournament selection and imposes a stringent restriction on the mating chances of an individual during its lifespan: winning a tournament entails a depreciation of its fitness value. We show that altruism minimizes the loss of genetic diversity while increasing the selection frequency of the fittest individuals. An additional contribution of this paper is the formulation of a new combinatorial problem for maximizing the similarity of proteins based on their secondary structure. We conduct experiments on this problem in order to validate cooperative selection. The new selection scheme outperforms tournament selection for any setting of the parameters and offers the best trade-off, maximizing genetic diversity and minimizing computational effort.
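The core idea stated in this abstract, a tournament whose winner has its fitness depreciated, might look roughly like the sketch below. The abstract does not give the depreciation rule, so the multiplicative `depreciation` factor used here is a hypothetical stand-in, as are the function name and the assumption of positive fitness values that are maximised.

```python
import random


def cooperative_selection(fitness, tournament_size, rng, depreciation=0.5):
    """Select one parent index by tournament, then depreciate the winner.

    `fitness` is a mutable list of selection fitness values (assumed
    positive, higher is better). Multiplying by `depreciation` is an
    assumed rule; the paper's exact rule is not stated in the abstract.
    """
    contestants = rng.sample(range(len(fitness)), tournament_size)
    winner = max(contestants, key=lambda i: fitness[i])
    fitness[winner] *= depreciation  # winning lowers future mating chances
    return winner


# Hypothetical usage: with every individual in each tournament, the fittest
# one wins until its depreciated fitness drops below the runner-up's.
rng = random.Random(0)
selection_fitness = [1.0, 2.0, 10.0]
winners = [cooperative_selection(selection_fitness, 3, rng) for _ in range(4)]
print(winners)  # [2, 2, 2, 1]
```

In this sketch the depreciation acts on a separate list of selection fitness values, so the true objective value of each individual can be kept elsewhere unchanged.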

Vehicular mobility model optimization using cooperative coevolutionary genetic algorithms
Nielsen, Sune Steinbjorn; Danoy, Grégoire; Bouvry, Pascal
In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO '13) (2013)

Novel efficient asynchronous cooperative co-evolutionary multi-objective algorithms
Nielsen, Sune Steinbjorn; Dorronsoro, Bernabé; Danoy, Grégoire; et al.
In Congress on Evolutionary Computation (CEC) (2012)