Algorithmique de graphes pour l’aide à la décision dans la construction moléculaire
Axis : DataSense - Making Sense of Complex and heterogeneous data
Subject : Algorithmique de graphes pour l’aide à la décision dans la construction
Directors : Dominique Barth, PRISM et Marc Antoine Wesser, LRI
Institution : David, LRI
Administrator laboratory : David
PhD sTUDENT: Stefi Nouleho
Début : 2016
Scientific Production : Awaiting validation
Numerous reaction databases are available to chemists and biochemists. They can be considered as graphs or hypergraphs in which the vertices are molecules and the links (or hyperlinks) represent the reactions allowing these molecules. Given a target molecule, which is not found in the base or bases considered, the objective is to identify the reaction path that could govern its synthesis from existing elements. The objective of this thesis topic in computer science is therefore to design decision support algorithms for the prediction of reaction paths, algorithms based on adapted notions of graph similarities. It will therefore be a question of defining the most relevant notions of similarities (a Master 2 course funded by Labex CHARMMMAT will initiate this study), of studying the theoretical and complexity aspects, of proposing evaluation algorithms and finally of proposing decision support algorithms designed in conjunction with partner chemists. Positioning in relation to the state of the art and national and international competition Different computer approaches are currently proposed to chemists (especially organic chemists) for the prediction of molecular reactions, with two main points of view, one based on the modelling and simulation of the physico-chemical behaviour of reactions, the other on the search for maximum common subgraphs between the target molecule and the molecules stored in the database. In terms of graphs, the solutions that are considered in the preparatory work for this project could consider for this example the generic skeleton of the figure, and the selected molecules insisted on molecules whose skeletons were sufficiently similar (but not necessarily identical) to this generic skeleton target.
Scientific objective :
The final objective of this project is the design of decision support software for molecular construction. To do this, we consider graphs, called reaction graphs, obtained by union of different reaction chains recorded in databases (public or commercial) as Reaxys. These are actually "graphs of graphs" (or "graph hypergraphs"), i.e. graphs in which the vertices are molecules (whose structures are also modeled by graphs) and the arcs of reactions between these molecules. From an algorithmic point of view, the overall scientific project consists, given a new target molecule, in predicting the (or possible place(s) in an existing reaction graph. The starting paradigm is that if two molecules are close (especially linked) in this graph, then they have common or similar graph substructures. We can hear here of course the notion of common substructures in the sense of the isomorphism of partial graphs, but also notions already envisaged within the first research projects carried out in CHARMMMAT as the common or close miners, similarities of the decomposition in cycles, or properties of the groups of automorphisms. One of the specificities of this thesis topic is that these notions of similarities are to be considered at graph granularity levels (from the complete structure to a generic skeleton, see figure above). The chemist's expertise is that this level of granularity depends on the functional interest or not of each element of the molecule, and that the level of granularity envisaged will not be homogeneous over the whole molecule. Finally, each reaction will often result in a minimal cut (or part of a cut) in the graph considered, which also induces choices in the skeleton finally chosen. The first objective of the thesis will be to evaluate, on target molecules in the target reaction graphs, the relationship between the distance between two molecules in this graph and their similarity according to these different criteria, to identify (or even redefine) the best adapted, with the granularity levels of most relevant graphs. It will then be a question of studying the structural properties linked to these parameters, the complexity of their calculation (taking into account the characteristics of the graphs considered) and finally of proposing evaluation algorithms which will be at the heart of the prediction solutions which will be proposed. These solutions will operate either directly on the graphs when the databases allow it, or using access functions provided with commercial databases (like Reaxys). The validation of the proposed approaches will be made throughout the thesis by a close collaboration with a team of chemists from the ILV of Versailles, with whom we prepared this thesis proposal,
as part of CHARMMMAT.