Introduction to Galaxy Platform for NGS Variant Calling Pipeline

Rashid Saif, Aniqa Ejaz, Tania Mehmood, Fatima Asif, Suliman Mohammad Alghanem, Talha Saleem Ahmad

Abstract


Background: The web-based Galaxy platform for Next Generation Sequencing (NGS) data analysis provides unprecedented opportunities to characterize, analyze and computationally visualize genomic landscapes with limited resources. This study explores an NGS data-analysis pipeline on the Galaxy platform, chosen for its relative accessibility, reproducibility, transparency and scalability.

Methods: Variant calling and associated workflows were executed on pooled-sequencing (pool-seq) NGS data from 12 Pakistani Teddy goats. The pipeline used FastQC for quality checks, Trimmomatic for read trimming, SAM/BAM tools for file-format conversion, Picard tools for marking duplicates, VCFtools/FreeBayes for genomic variant detection and SnpSift for variant annotation.
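The quality-trimming step can be illustrated with a minimal sketch of sliding-window trimming, the kind of operation Trimmomatic's SLIDINGWINDOW step performs. It assumes Phred+33 (Sanger) quality encoding and cuts the read at the start of the first window whose mean quality drops below the threshold; Trimmomatic's exact cut position may differ. The function names and the window and threshold defaults are illustrative assumptions, not the settings used in this study, which the abstract does not report.

```python
def phred33(quality: str) -> list[int]:
    """Decode a Phred+33 (Sanger) FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, qual: str, window: int = 4,
                        threshold: float = 15.0) -> tuple[str, str]:
    """Simplified sliding-window trimming (illustrative defaults, not the study's).

    Scans windows from the 5' end and cuts the read at the start of the
    first window whose mean quality is below `threshold`. Reads shorter
    than `window` are returned unchanged.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred33(qual)
    for start in range(len(scores) - window + 1):
        mean_q = sum(scores[start:start + window]) / window
        if mean_q < threshold:
            # Keep only the bases before the failing window.
            return seq[:start], qual[:start]
    return seq, qual


if __name__ == "__main__":
    # Hypothetical read: four Q40 bases ('I') followed by four Q2 bases ('#').
    print(sliding_window_trim("ACGTACGT", "IIII####"))
```

Cutting at the start of the failing window is a deliberately conservative simplification: it may discard a few good bases, but it keeps the logic easy to follow.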

Results: Filtering yielded 43,712 highly associated, functionally non-trivial loci carrying 87,510 alleles. In addition, 1,548 variants (1,134 SNPs, 23 mixed variants, 76 MNPs, 183 insertions and 132 deletions) were observed in the Teddy breed relative to the San Clemente ARS1 reference genome. Furthermore, 1,283 homozygous and 265 heterozygous variants were identified out of 43,447 loci. These variants are likely to underlie the general phenotypic traits of the Teddy goat, such as its small body size, tender meat and agility, along with other breed-specific traits.
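The variant categories and zygosity classes reported above can be illustrated with a minimal sketch that classifies a VCF REF/ALT pair after trimming shared leading and trailing bases, and classifies a genotype (GT) value by zygosity. The classification rule, the function names and the demo records are assumptions for illustration; the rules applied by the Galaxy tools used in the study may differ, for example in how multi-allelic sites are handled.

```python
import re
from collections import Counter


def classify_variant(ref: str, alt: str) -> str:
    """Assign a REF/ALT pair to SNP, MNP, INS, DEL or MIXED (simplified rule)."""
    # Trim bases shared at the start (e.g. the VCF padding base of an indel).
    i = 0
    while i < min(len(ref), len(alt)) and ref[i] == alt[i]:
        i += 1
    r, a = ref[i:], alt[i:]
    # Trim bases shared at the end.
    j = 0
    while j < min(len(r), len(a)) and r[-1 - j] == a[-1 - j]:
        j += 1
    r, a = r[:len(r) - j], a[:len(a) - j]
    if not r and not a:
        raise ValueError("REF and ALT are identical")
    if not r:
        return "INS"
    if not a:
        return "DEL"
    if len(r) == len(a):
        return "SNP" if len(r) == 1 else "MNP"
    return "MIXED"  # length change combined with a substitution


def zygosity(gt: str) -> str:
    """Classify a VCF GT value such as '0/1' or '1|1'."""
    alleles = re.split(r"[/|]", gt)
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def tally(records):
    """Count variant types and zygosities over (ref, alt, gt) tuples."""
    records = list(records)  # allow a generator to be consumed twice
    types = Counter(classify_variant(ref, alt) for ref, alt, _ in records)
    zyg = Counter(zygosity(gt) for _, _, gt in records)
    return types, zyg


if __name__ == "__main__":
    # Hypothetical records, not data from the study.
    demo = [("A", "G", "1/1"), ("AT", "GC", "0/1"), ("A", "AT", "1/1"),
            ("AT", "A", "0/1"), ("AT", "GTC", "1/1")]
    type_counts, zyg_counts = tally(demo)
    print(type_counts)
    print(zyg_counts)
```

As a consistency check on the reported figures, the five type counts sum to 1,134 + 23 + 76 + 183 + 132 = 1,548, the same total as the zygosity split (1,283 + 265 = 1,548).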

Conclusion: Galaxy fulfills its core aims of reproducibility and easy accessibility by bridging the gap between large-scale data analysis and its interpretation. This variant calling pipeline reveals genomic differences underlying Teddy-specific characteristics compared with the ARS1 reference genome.

Keywords: Galaxy platform; NGS data; Teddy goat; Variant calling; Bioinformatics




