Introduction to Galaxy Platform for NGS Variant Calling Pipeline

Rashid Saif, Aniqa Ejaz, Tania Mehmood, Fatima Asif, Suliman Mohammad Alghanem, Talha Saleem Ahmad


Background: The web-based Galaxy platform for next-generation sequencing (NGS) data analysis makes it possible to characterize, analyze and computationally visualize genomic landscapes with limited resources. This study explores an NGS variant calling pipeline on the Galaxy platform, chosen for its relative accessibility, reproducibility, transparency and scalability.

Methods: Variant calling and associated workflows were executed on pooled NGS (Pool-seq) data from 12 Pakistani Teddy goats. The pipeline used FastQC for quality checks, Trimmomatic for read trimming, SAM/BAM tools for file format conversion, Picard tools for marking duplicates, VCFtools/FreeBayes for genomic variant detection and SnpSift for variant annotation.

Results: Filtering retained 43,712 highly associated, functionally non-trivial loci carrying 87,510 alleles. In addition, 1,548 variants were observed in the Teddy breed relative to the San Clemente ARS1 reference genome: 1,134 SNPs, 23 mixed variants, 76 MNPs, 183 insertions and 132 deletions. Furthermore, 1,283 homozygous and 265 heterozygous variants were identified out of 43,447 loci. These variants are likely to underlie the general phenotypic traits of the Teddy breed, including smaller body size, tender meat quality and agility, along with other breed-specific traits.
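
The reported counts can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the authors' Galaxy workflow; the variable and function names are assumptions introduced here. It confirms that the per-type counts and the homozygous/heterozygous counts each sum to the stated 1,548 variants, and computes each category's share of the total.

```python
# Counts copied from the Results paragraph; names are illustrative.
by_type = {
    "SNP": 1134,
    "mixed": 23,
    "MNP": 76,
    "insertion": 183,
    "deletion": 132,
}
by_zygosity = {"homozygous": 1283, "heterozygous": 265}
reported_total = 1548


def shares(counts):
    """Return each category's percentage of the category total."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


type_total = sum(by_type.values())
zygosity_total = sum(by_zygosity.values())

print("variant types sum to", type_total)
print("zygosity classes sum to", zygosity_total)
print("share by type (%):", shares(by_type))
print("share by zygosity (%):", shares(by_zygosity))
```

Both sums agree with the reported total, so the two breakdowns describe the same set of 1,548 variants.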

Conclusion: Galaxy delivers on reproducibility and easy accessibility, bridging the gap between large-scale data analysis and its interpretation. This variant calling pipeline reveals genomic differences underlying Teddy-specific characteristics compared with the ARS1 reference genome.

Keywords: Galaxy platform; NGS data; Teddy goat; Variant calling; Bioinformatics
