Organic/Inorganic/Materials Seminar – Renana Gershoni-Poranne, February 7th

Department of Chemistry and Biochemistry
Organic/Inorganic/Materials Seminar Series

Professor Renana Gershoni-Poranne, Technion
February 7, 2025
3:00 pm, WIL 110
Hosted by Mike Haley

Data are a Girl’s Best Friend: From High-Throughput Computation to Generative Deep Learning

Chemical databases are an essential tool for the data-driven investigation of structure-property relationships and for the design of novel functional compounds. They are also the crucial foundation for machine- and deep-learning techniques, which efficiently map chemical space and enable the discovery of new molecular motifs for molecules and materials with various applications. However, suitable databases of polycyclic aromatic systems (PASs) have been lacking.

To enable the application of such techniques to the design of novel functional PASs, we established the COMPAS Project, a COMputational database of Polycyclic Aromatic Systems. This new database already contains over 500k molecules in three datasets: cata-condensed polybenzenoid hydrocarbons (COMPAS-1),¹ cata-condensed hetero-PASs (COMPAS-2),² and peri-condensed polybenzenoid hydrocarbons (COMPAS-3).³

With these new data in hand, we demonstrate the first examples of interpretable learning models in the chemical space of PASs. To this end, we developed two types of molecular representation that allow machine- and deep-learning models to train efficiently and effectively on the new data: a) a text-based representation⁴ and b) a graph-based representation.⁵ These dedicated representations not only achieve higher predictive ability with less data but are also amenable to interpretation, allowing chemical insight to be extracted from the models.

Using the COMPAS database and our dedicated representations, we implemented GaUDI,⁶ the first guided diffusion-based model for the inverse design of PASs. Our model generates new PASs with defined target properties. In addition to offering a flexible target function and high validity scores, GaUDI can design molecules with properties beyond the distribution of the training data.


References

(1)  Wahab, A.; Pfuderer, L.; Paenurk, E.; Gershoni-Poranne, R. The COMPAS Project: A Computational Database of Polycyclic Aromatic Systems. Phase 1: Cata-Condensed Polybenzenoid Hydrocarbons. J. Chem. Inf. Model. 2022, 62 (16), 3704.

(2)  Mayo Yanes, E.; Chakraborty, S.; Gershoni-Poranne, R. COMPAS-2: A Dataset of Cata-Condensed Hetero-Polycyclic Aromatic Systems. Sci. Data 2024, 11 (1), 97.

(3)  Wahab, A.; Gershoni-Poranne, R. COMPAS-3: A Data Set of Peri-Condensed Polybenzenoid Hydrocarbons. ChemRxiv February 26, 2024.

(4)  Fite, S.; Wahab, A.; Paenurk, E.; Gross, Z.; Gershoni-Poranne, R. Text-Based Representations with Interpretable Machine Learning Reveal Structure-Property Relationships of Polybenzenoid Hydrocarbons. J. Phys. Org. Chem. 2022, e4458.

(5)  Weiss, T.; Wahab, A.; Bronstein, A. M.; Gershoni-Poranne, R. Interpretable Deep-Learning Unveils Structure–Property Relationships in Polybenzenoid Hydrocarbons. J. Org. Chem. 2023, 88 (14), 9645–9656. https://doi.org/10.1021/acs.joc.2c02381.

(6)  Weiss, T.; Mayo Yanes, E.; Chakraborty, S.; Cosmo, L.; Bronstein, A. M.; Gershoni-Poranne, R. Guided Diffusion for Inverse Molecular Design. Nat. Comput. Sci. 2023, 3 (10), 873–882. https://doi.org/10.1038/s43588-023-00532-0.

