Efficiently predicting drug response remains a challenge in drug discovery. To address this issue, we propose four novel model architectures that combine graph transformers with multiheaded self-attention mechanisms placed at varying positions. By leveraging two types of multiomics data, transcriptomics and genomics, we create a comprehensive representation of target cells and enable drug response prediction for personalized medicine. Most of our architectures use two transformer models, one with a graph attention mechanism and the other with a multiheaded self-attention mechanism, to generate latent representations of the drug and the omics data, respectively. Unlike previous approaches that apply attention to only one data type, either drug or genomics, our architectures apply it to both, with the goal of obtaining more comprehensive latent representations. The latent representations are then concatenated and passed to a fully connected network to predict the IC50 value, a measure of cellular drug response. We evaluate all four architectures. The model without the multiheaded self-attention mechanism appears to give the most accurate results on our holdout set. Our study contributes to drug discovery and precision medicine by aiming to reduce the time and improve the accuracy of drug response prediction, and by using multiomics data for a personalized approach to the problem.
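The fusion step described above, in which the drug and omics latent representations are concatenated and passed to a fully connected network, can be illustrated with a minimal sketch. It is not the implementation used in this work: the latent vectors are random stand-ins for the transformer outputs, the weights are untrained, and the dimensions (DRUG_DIM, OMICS_DIM, HIDDEN) and function names are hypothetical.

```python
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights; a trained model would learn these instead."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def predict_ic50(drug_latent, omics_latent, layers):
    """Concatenate both latent vectors and run them through the MLP head."""
    x = list(drug_latent) + list(omics_latent)  # fusion by concatenation
    for i, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if i < len(layers) - 1:  # no activation on the regression output
            x = relu(x)
    return x[0]

rng = random.Random(0)
DRUG_DIM, OMICS_DIM, HIDDEN = 8, 12, 16  # hypothetical sizes
layers = [
    init_layer(DRUG_DIM + OMICS_DIM, HIDDEN, rng),
    init_layer(HIDDEN, 1, rng),
]
# Random stand-ins for the graph-transformer and self-attention outputs.
drug_latent = [rng.gauss(0, 1) for _ in range(DRUG_DIM)]
omics_latent = [rng.gauss(0, 1) for _ in range(OMICS_DIM)]
score = predict_ic50(drug_latent, omics_latent, layers)
```

Because the output layer has a single unit and no activation, the head behaves as a regressor, which matches the task of predicting a continuous IC50 value.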
Neural networks have been widely studied in bioinformatics, particularly for drug discovery. Although many promising steps have been taken in this direction, much research remains to be done. In this paper, we design a novel architecture that aims to generate novel molecules for treating hormone-receptor-positive breast cancer. These molecules are intended to inhibit the aromatase, CDK4, CDK6, PI3K, and mTOR proteins. To do this, we use a variational autoencoder (VAE) based on natural language processing. Our model is trained on the open-source ZINC dataset, chosen for its library of 250k drug molecules. To generate our molecules, we compiled a test set of about 68 molecules already proven to bind to these target proteins. To measure the initial viability of our generated molecules, we used RDKit’s quantitative estimate of drug-likeness (QED) score. Supplementary models predicted other properties of the generated molecules, specifically solubility, synthetic accessibility, and toxicity, to further strengthen our screening process. We used the AutoDock Vina framework to predict the Gibbs free energy of binding between each molecule and the desired target enzymes. Our experimentation expanded and improved upon a previous solubility prediction architecture to obtain more accurate results for both the solubility and the synthetic accessibility of molecules. The goal of our research is to develop a novel high-throughput process to generate and screen for hormone-receptor-positive breast cancer drug molecules that are feasible in the real world. Since the drug discovery space is so large (approximately 10^60 molecules), neural networks are a valuable tool for cutting down the time and cost of finding these molecules.
Through our experimentation, we added a novel improvement to a working VAE framework by refining certain layers of the network’s decoder, leading to the generation of three molecules that passed our screening process and show high potential for suppressing hormone-receptor-positive breast cancer tumor growth.
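The screening process summarized above combines drug-likeness, solubility, synthetic accessibility, toxicity, and docking results. The following sketch shows one way such a filter could be organized. It is an illustration only: the thresholds, field names, and candidate values are hypothetical and are not taken from this work, and the property values are assumed to have been computed beforehand by the respective tools and models.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    label: str             # placeholder identifier for a generated molecule
    qed: float             # drug-likeness in [0, 1], higher is better
    log_s: float           # predicted aqueous solubility (log mol/L)
    sa_score: float        # synthetic accessibility, 1 (easy) to 10 (hard)
    toxic: bool            # output of a toxicity classifier
    binding_energy: float  # docking score in kcal/mol, lower is better

# Hypothetical cut-offs, chosen only for illustration.
MIN_QED = 0.5
MIN_LOG_S = -5.0
MAX_SA_SCORE = 5.0
MAX_BINDING_ENERGY = -7.0

def passes_screen(c: Candidate) -> bool:
    """A candidate must clear every property filter to be kept."""
    return (
        c.qed >= MIN_QED
        and c.log_s >= MIN_LOG_S
        and c.sa_score <= MAX_SA_SCORE
        and not c.toxic
        and c.binding_energy <= MAX_BINDING_ENERGY
    )

def screen(candidates):
    """Keep passing candidates, strongest predicted binders first."""
    kept = [c for c in candidates if passes_screen(c)]
    return sorted(kept, key=lambda c: c.binding_energy)

# Made-up values, not results from this work.
candidates = [
    Candidate("cand_a", 0.72, -3.1, 2.9, False, -8.4),
    Candidate("cand_b", 0.41, -2.0, 2.1, False, -9.0),  # fails QED
    Candidate("cand_c", 0.66, -4.2, 3.5, True, -9.3),   # flagged toxic
    Candidate("cand_d", 0.80, -3.8, 4.0, False, -9.1),
]
shortlist = screen(candidates)
print([c.label for c in shortlist])
```

Applying the inexpensive property filters before ranking by docking score reflects the high-throughput goal: most candidates can be discarded without a costly docking run.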