Annotation and curation of hypothetical proteins: prioritizing targets for experimental study

Muhammad Naveed, Zoma Chaudhry, Zeeshan Ali, Mahnoor Amjad, Fizza Zulfiqar, Ali Numan


Completely sequenced genomes encode some proteins that remain uncharacterized. These proteins are predicted through in-silico approaches, but their biological activities have not been confirmed by experimental evidence; they are known as hypothetical proteins (HPs). HPs are important because of their extensive involvement in different cellular and signaling pathways. Structural and functional characterization of HPs reveals crucial roles in microorganisms, especially in pathogens related to human diseases. Here, we discuss the available in-silico analysis tools and other recently reported methods for HP characterization, together with biomedical applications including drug and vaccine development. Different methodologies, including meta-proteomics, have been used to study protein expression through the identification of HPs, and comparative genomics has also received attention with the emergence of evolutionary studies across different organisms. Structural characterization of proteins serves as a basis for their functional prediction, novel drug target identification for disease treatment, vaccine production and sero-diagnosis. HPs play major roles in different phenomena vital for life, including host adaptation, wound healing and chemotaxis. In the current era of drug and antibiotic resistance, HPs can be novel targets for treating related diseases. The identification and characterization of most HPs is still in progress, and genomic and bioinformatics techniques applied to HPs are expected to be among the most promising approaches for structure-based drug design and vaccine production in the future.




