Dhaval Soni and 7 more

Drug design is changing as artificial intelligence is integrated with molecular biology. A central challenge in this work is the scarcity of datasets containing active compounds for emerging target proteins. To address it, we combine the Generative Pre-trained Transformer (GPT) architecture with Long Short-Term Memory (LSTM) networks to generate Simplified Molecular Input Line Entry System (SMILES) strings for candidate therapeutic molecules. We also employ a Bidirectional Encoder Representations from Transformers (BERT) pretraining strategy so that the model learns from broad molecular data, including amino acid sequences and molecular SMILES datasets. We then fine-tune the model on a curated protein-ligand complex dataset, achieving conditional generation via autoregressive supervised learning. In addition, we introduce a method for assessing molecular affinity, validated against established proteins, where docking experiments show binding affinities superior to those of certain FDA-approved drugs. By extending generative algorithms and providing a framework for evaluating molecular affinity, this work advances de novo drug design, suggests promising therapeutic candidates, and enables broader exploration of chemical space.
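Conditional generation via autoregressive supervised learning can be pictured as next-token prediction over a sequence in which the protein comes first and the ligand SMILES follows. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the special tokens, the character-level tokenization, and the function name `build_example` are all hypothetical, and a real tokenizer would have to handle multi-character SMILES atoms such as `Cl` or `Br`.

```python
# Hypothetical special tokens; the source does not specify a vocabulary.
BOS, SEP, EOS = "<bos>", "<sep>", "<eos>"


def build_example(protein_seq: str, smiles: str):
    """Build one next-token-prediction example (illustrative only).

    The amino acid sequence acts as the condition and the SMILES string
    is the part the model is trained to generate.
    """
    # Assumption: one token per character, chosen purely for brevity.
    tokens = [BOS, *protein_seq, SEP, *smiles, EOS]

    # Autoregressive shift: at position i the model sees inputs[:i + 1]
    # and is asked to predict targets[i].
    inputs = tokens[:-1]
    targets = tokens[1:]

    # Score only the ligand tokens (and the end token), so the loss
    # reflects generation of the molecule given the protein.
    sep_index = inputs.index(SEP)
    loss_mask = [1 if i >= sep_index else 0 for i in range(len(targets))]
    return inputs, targets, loss_mask


# Toy inputs: a three-residue fragment and ethanol ("CCO").
inputs, targets, loss_mask = build_example("MKT", "CCO")
scored = [t for t, m in zip(targets, loss_mask) if m]
print(scored)
```

Masking the protein positions is one common design choice for conditional language models; whether the described model masks them or also trains on the protein tokens is not stated in the abstract.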