Mathematical Understanding of Sequence Alignment and Phylogenetic Algorithms: A Comprehensive Review of Computation of Different Methods

Rashid Saif, Sadia Nadeem, Alishba Khaliq, Saeeda Zia, Ali Iftekhar


Pairwise sequence alignment is one of the ways to position two biological sequences to identify regions of similarity that may suggest the functional, structural and evolutionary relationship among proteins and nucleic acids. There are two strategies in pairwise alignment: local sequence alignment (Smith Waterman algorithm) and global sequence alignment (Needleman Wunsch algorithm). In the prior approach, two sequences that may or may not be related, are aligned to find regions of local similarities in large sequences, whereas in the later one, two sequences of same length are aligned to identify their conserved regions. Moreover, similarities and divergence between biological sequences also has to be rationalized and visualized in the form of phylogenetic trees, so the dendrogram construction approaches were developed and divided into distance-based and character-based methods. In this review article, different algorithms of sequence alignment and phylogenetic tree construction were meditated with examples and compared by looking into the background computation for the better understanding of the algorithms, which will be helpful for molecular biology, computational sciences and mathematics/statistics novices. Phylogenetic trees are constructed through various methods, some are computationally robust but does not provide precise evolutionary insight, whereas some provide accurate evolutionary understandings, but computationally exhaustive and cumbersome. So, there is a need to understand the implicit mathematics and intricate computation behind the dendrogram construction for improving the existing algorithms and developing new methods.  

Keywords: Local sequence alignment; Global sequence alignment; UPGMA; Neighbour joining; Fitch Margoliash; Maximum-Parsimony; Maximum-Likelihood   

Full Text:



Xiong J Essential bioinformatics. Chapter: Book Name. 2006 of publication; Cambridge University Press.

Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of molecular biology, (1970); 48(3): 443-453.

Smith TF, Waterman MS. Comparison of biosequences. Advances in applied mathematics, (1981); 2(4): 482-489.

Hanmandlu M, Sani A, Gaur D. Modified k-Tuple method for the construction of phylogenetic trees. Trends in Bioinformatics, (2015); 8(3): 75.

Huang X. On global sequence alignment. Bioinformatics, (1994); 10(3): 227-235.

Rognes T (2011) Determination of optimal local sequence alignment similarity score. Google Patents.

Hu Y-C, Tiwari S, Mishra KK, Trivedi MC, Munjal G, et al. Phylogenetics Algorithms and Applications. Ambient Communications and Computer SystemsRACCCS-2018, (2018); 904187-194.

Burr T. Phylogenetic trees in bioinformatics. Current Bioinformatics, (2010); 5(1): 40-52.

Bruno WJ, Socci ND, Halpern AL. Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Molecular biology and evolution, (2000); 17(1): 189-197.

Stefan Van Dongen T, Winnepenninckx B. Multiple UPGMA and neighbor-joining trees and the performance of some computer packages. Mol Biol Evol, (1996); 13(2): 309-313.

Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, (1987); 4(4): 406-425.

Moret BM, Warnow T (2002) Reconstructing optimal phylogenetic trees: A challenge in experimental algorithmics. Experimental Algorithmics: Springer. pp. 163-180.

Hillis DM. Approaches for assessing phylogenetic accuracy. Systematic Biology, (1995); 44(1): 3-16.

Saitou N, Imanishi T. Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree. (1989).

Peng C. Distance based methods in phylogenetic tree construction. NEURAL PARALLEL AND SCIENTIFIC COMPUTATIIONS, (2007); 15(4): 547.

Desper R, Gascuel O. Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Molecular Biology and Evolution, (2004); 21(3): 587-598.

Alon N, Chor B, Pardi F, Rapoport A. Approximate maximum parsimony and ancestral maximum likelihood. IEEE/ACM Transactions on Computational Biology and Bioinformatics, (2008); 7(1): 183-187.

Durbin R, Eddy SR, Krogh A, Mitchison G Biological sequence analysis: probabilistic models of proteins and nucleic acids. Chapter: Book Name. 1998 of publication; Cambridge university press.

Hall BG Phylogenetic trees made easy: A how to manual. Chapter: Book Name. 2011 of publication; Sinauer.

Rizzo J, Rouchka EC. Review of phylogenetic tree construction. University of Louisville Bioinformatics Laboratory Technical Report Series, (2007); 2-7.

Sharma A, Jaloree S, Thakur RS. Review of Clustering Methods: Toward Phylogenetic Tree Constructions; 2018. Springer. pp. 475-480.

Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of molecular evolution, (1981); 17(6): 368-376.

Dorigo M, Di Caro G. Ant colony optimization: a new meta-heuristic; 1999. IEEE. pp. 1470-1477.


  • There are currently no refbacks.