Devarakonda Himaja

and 1 more

Experimental characterization of amino acid functions in Domains of Unknown Function (DUF) proteins is expensive and time-consuming, and can be complemented by computational methods. Cysteine, being the second most reactive amino acid at the catalytic sites of enzymes, was selected for functional annotation and characterization in DUF proteins. Earlier, we reported functional annotation of Cysteine in DUF proteins belonging to the COX-II family. However, to the best of our knowledge, a holistic characterization of Cysteine functions in DUF proteins has not been reported. Here, we annotated and characterized Cysteine post-translational modifications (PTMs) based on biochemical pathways, diseases, taxonomy, and protein microenvironment. The information on uncharacterized DUF proteins was initially obtained from the literature, and the sequence, structure, pathway, taxonomy, and disease information was retrieved from the SCOP database using DUF IDs. Protein microenvironments (MENV) around Cysteine were computed from protein structures. The Cysteine PTMs were predicted using the in-house Cysteine-function prediction server, DeepCys (https://deepcys.bits-hyderabad.ac.in). The information was consolidated in a database ([http://cysduf.bits-hyderabad.ac.in/](http://cysduf.bits-hyderabad.ac.in/)), retrievable in downloadable formats (CSV, JSON, or TXT) using a DUF ID, PFAM ID, or PDB ID as input. For the first time, we annotated Cysteine PTMs in DUF proteins belonging to seven different biochemical pathways and elucidated Cysteine PTMs in DUF proteins from viruses, namely SARS-CoV-2. The MENV around Cysteine in DUF proteins (reported for the first time) was mainly buried and hydrophobic; however, in viruses, a significant number of Cysteine residues were embedded in exposed and hydrophilic microenvironments.
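The abstract above names three kinds of query identifier (DUF ID, PFAM ID, PDB ID). As a minimal sketch of how a client script might tell them apart before querying, the snippet below classifies a string by its shape. The patterns are assumptions based on commonly used identifier conventions ("DUF" plus digits, "PF" plus five digits, four characters starting with a digit); the function name `identifier_type` is hypothetical, and nothing here is confirmed to match what the database itself accepts.

```python
import re

# Assumed identifier shapes (common conventions, not taken from the database).
PATTERNS = {
    "DUF ID": re.compile(r"DUF\d+"),               # e.g. DUF1093
    "PFAM ID": re.compile(r"PF\d{5}"),             # e.g. PF00001
    "PDB ID": re.compile(r"[0-9][A-Za-z0-9]{3}"),  # e.g. 1ABC
}


def identifier_type(query: str) -> str:
    """Return which of the three identifier kinds a query string looks like."""
    text = query.strip().upper()
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return name
    raise ValueError(f"unrecognised identifier: {query!r}")


if __name__ == "__main__":
    for example in ("DUF1093", "PF00001", "1abc"):
        print(example, "->", identifier_type(example))
```

Checking the identifier shape up front lets a script fail early on a mistyped query instead of sending it to the server.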

Syeda Lubna

and 1 more

Histidine (His) is the most reactive amino acid at enzyme active sites. Multiple post-translational modifications (functions) are reported for His side chains. High-throughput sequencing techniques produce a large number of protein sequences without functional annotations at the amino acid level. Experimental characterization of His functions in proteins is laborious and time-consuming, and computational characterization based on protein sequences may complement it. Only a handful of Histidine function prediction tools are available, and those annotate only a single function. Here we curated a dataset of active Histidine residues with known functions from protein sequences in the UniProt database (sample size n = 1584) and used it to train four machine learning methods. The convolutional neural network (CNN) model (“Hist-i-fy”) performed the best, with 75% overall accuracy. External validation of Hist-i-fy on phosphorylated histidine data (sample size n = 34) showed 94.1% prediction accuracy. For the first time, we report prediction of multiple His functions from protein sequences using deep neural networks. The inputs to the model are i) a protein sequence containing His, and ii) the His residue number. The model predicts one of eight histidine functions, namely, acetylation, ribosylation, glycosylation, hydroxylation, methylation, oxidation, phosphorylation, and protein splicing. The novelty of the work is that it predicts the maximum number of histidine functions at a time with optimal performance. There is scope for improving the model when a larger dataset becomes available. The model is available as a web application ([https://histify.streamlit.app/](https://histify.streamlit.app/)) and as stand-alone code ([https://github.com/dibyansu24-maker/Histify](https://github.com/dibyansu24-maker/Histify)).
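The two model inputs described above (a protein sequence and a His residue number) are typically turned into a fixed-length window around the residue before being passed to a CNN. The sketch below illustrates that general idea only: the flank size, the padding character, the one-hot encoding, and the function names `his_window` and `one_hot` are all assumptions for illustration, not the preprocessing actually used by Hist-i-fy.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# The eight output classes listed in the abstract.
HIS_FUNCTIONS = [
    "acetylation", "ribosylation", "glycosylation", "hydroxylation",
    "methylation", "oxidation", "phosphorylation", "protein splicing",
]


def his_window(sequence: str, residue_number: int,
               flank: int = 10, pad: str = "X") -> str:
    """Return the His residue plus `flank` residues on each side.

    `residue_number` is 1-based; termini are padded with `pad`
    (hypothetical choices, see the lead-in).
    """
    index = residue_number - 1
    if not 0 <= index < len(sequence):
        raise ValueError("residue number is outside the sequence")
    if sequence[index] != "H":
        raise ValueError(
            f"residue {residue_number} is {sequence[index]}, not His")
    padded = pad * flank + sequence + pad * flank
    # After padding, the His sits at position index + flank.
    return padded[index:index + 2 * flank + 1]


def one_hot(window: str) -> List[List[int]]:
    """Encode each residue as a 20-element vector; padding maps to all zeros."""
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in window]


if __name__ == "__main__":
    window = his_window("MKHLA", 3, flank=3)  # toy sequence, not real data
    print(window, len(one_hot(window)))
```

Validating that the numbered residue really is a His catches off-by-one errors between 0-based and 1-based numbering before any prediction is attempted.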

Lubna Syeda

and 5 more

Routine influenza (H1N1) surveillance in India started almost a decade ago. The fluctuation in the number of cases and deaths in different Indian states over the last decade presumably indicates changes in the viral sequence and in the immune response of the host. To track these changes, we chose the NS1 protein, which evades the host antiviral immune response. The objective of this study was to identify recent mutations in the NS1 protein from Indian isolates. The sequences of NS1 proteins from H1N1 strains isolated in India over a decade were obtained from publicly available databases. Multiple sequence alignment, phylogeny, and surface hydrophilicity analyses were performed to confirm the consistent mutations in the NS1 protein that evolved chronologically in India. A total of eight mutations were identified: two in the RNA-binding domain (RBD), five in the effector domain (ED), and one in the linker region. Three mutations, at sequence positions 2, 80, and 155, are reported for the first time in this study; these evolved in either 2017 or 2019. These recent mutations were associated with conservative substitutions in the alternative domains of the NS1 protein, namely, i) D2E and E55D, ii) T80A and A155T, and iii) E55K and K131E. A gradual shift of NS1 antigenic regions (surface hydrophilicity) from the ED to the RBD was observed over time. The possible consequences of these mutations for host-pathogen interactions were hypothesized based on the sequence positions of the NS1 mutations within various cellular-binding sub-domains. The hypothesis is subject to further experimental and computational verification.
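Substitution labels such as D2E or T80A are read off a sequence alignment by comparing each isolate with a reference, position by position. The sketch below shows that step for one pair of already-aligned sequences. The function name `call_substitutions` and the toy sequences are hypothetical, and this is a generic illustration rather than the pipeline used in the study.

```python
from typing import List


def call_substitutions(reference: str, isolate: str) -> List[str]:
    """List substitutions between two aligned sequences of equal length.

    Positions are numbered along the ungapped reference (1-based), and
    alignment gaps ('-') in either sequence are not reported as substitutions.
    """
    if len(reference) != len(isolate):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    position = 0
    for ref_aa, iso_aa in zip(reference, isolate):
        if ref_aa == "-":
            continue  # insertion in the isolate: no reference position
        position += 1
        if iso_aa != "-" and iso_aa != ref_aa:
            substitutions.append(f"{ref_aa}{position}{iso_aa}")
    return substitutions


if __name__ == "__main__":
    # Toy aligned fragments, not real NS1 sequences.
    print(call_substitutions("MDSNT", "MESNT"))
```

Numbering along the ungapped reference keeps position labels stable even when an isolate carries an insertion earlier in the alignment.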

Cysteine (Cys) is the most reactive amino acid, participating in a wide range of biological functions. In silico predictions complement experiments in meeting the need for functional characterization. Algorithms that predict multiple Cys functions are scarce, in contrast to algorithms that predict a specific function. Here we present a deep neural network-based method for multiple Cys function prediction, available as a web server, DeepCys (https://deepcys.herokuapp.com/). The DeepCys model was trained and tested on two independent datasets curated from protein crystal structures. The method requires three inputs, namely the PDB identifier (ID), chain ID, and residue ID of a given Cys; it outputs the probabilities of four cysteine functions, namely disulphide, metal-binding, thioether, and sulphenylation, and predicts the most probable Cys function. The algorithm exploits local and global protein properties, such as sequence and secondary structure motifs, buried fractions, microenvironments, and protein/enzyme class. DeepCys outperformed most of the multiple and specific Cys function algorithms. This method can predict the maximum number of cysteine functions and, for the first time, explicitly predicts the thioether function. This tool was used to elucidate the cysteine functions in domains of unknown function (DUFs) belonging to cytochrome C oxidase subunit-II (COX2)-like transmembrane domains. Apart from the web server, a standalone program is also available on GitHub (https://github.com/vam-sin/deepcys).
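The final step described above, going from four class probabilities to the most probable Cys function, amounts to taking the highest-scoring label. A minimal sketch follows; the function name `most_probable_function` and the example probabilities are hypothetical and do not reproduce DeepCys output.

```python
from typing import Dict, Tuple

# The four classes named in the abstract.
CYS_FUNCTIONS = ("disulphide", "metal-binding", "thioether", "sulphenylation")


def most_probable_function(probabilities: Dict[str, float]) -> Tuple[str, float]:
    """Return the label with the highest probability and that probability."""
    missing = set(CYS_FUNCTIONS) - set(probabilities)
    if missing:
        raise ValueError(f"missing probabilities for: {sorted(missing)}")
    label = max(CYS_FUNCTIONS, key=lambda name: probabilities[name])
    return label, probabilities[label]


if __name__ == "__main__":
    # Made-up probabilities for illustration only.
    example = {
        "disulphide": 0.10,
        "metal-binding": 0.70,
        "thioether": 0.15,
        "sulphenylation": 0.05,
    }
    print(most_probable_function(example))
```

Returning the probability alongside the label lets a caller see how confident the top prediction is instead of treating every call as equally reliable.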