Comparative Analysis of Haplotype Assembly Algorithms to Identify and Propose Optimal Methods

Bagher, Melina | 2022

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 56057 (05)
  4. University: Sharif University of Technology
  5. Department: Electrical Engineering
  6. Advisor(s): Jahed, Mehran; Hossein Khalaj, Babak
  7. Abstract: Humans, as a diploid species, carry two copies of each homologous chromosome, one inherited from the mother and the other from the father. The Single Individual Haplotype (SIH) assembly problem is the reconstruction of these two distinct nucleotide sequences of a chromosome from sequencing reads. It is currently considered one of the most important problems in computational genomics and plays an essential role in solving various genetic and medical problems. Direct experimental methods are now disfavored because of their high cost and labor intensity, and they are limited to certain regions of the genome; computational methods have therefore been proposed to assemble haplotypes from sequencing data. The large volume of high-throughput sequencing data, together with the length of the human genome, makes the speed of the algorithms important. Introducing a method based on an “optimal trade-off” between speed and accuracy would therefore be a step forward. In this project, a two-phase heuristic algorithm is proposed. In the first phase, initial clusters of the sequencing reads are determined by maximizing a new criterion over several iterations. When necessary, these clusters are refined in a follow-up phase to achieve higher accuracy, and the final haplotypes are obtained from the two resulting subsets of fragments. The algorithm was run on simulated data with varying read length, coverage, and error rate, and was compared with several state-of-the-art algorithms based on fragment partitioning. The results show that the proposed method is faster than previous methods on various types of data, particularly high-coverage sequencing data, while reconstructing the haplotypes with acceptable accuracy in comparison to previous methods.
  8. Keywords: Haplotype ; Genome Data ; Genome Sequencing ; Chromosomes ; Single Nucleotide Polymorphism (SNP) ; Haplotype Assembly ; Genotype
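To make the fragment partitioning idea mentioned in the abstract concrete, the following is a minimal sketch of a generic two-cluster heuristic. It is not the algorithm proposed in the thesis: the thesis's "new criterion" is not specified in this record, so the sketch substitutes the widely used minimum error correction (MEC) distance. The read encoding (`0`/`1` alleles per SNP site, `-` for uncovered sites), the seeding rule, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

GAP = "-"  # assumed marker for a SNP site the read does not cover


def consensus(reads: List[str], n_sites: int) -> str:
    """Majority-vote allele per site; ties and uncovered sites default to '0'."""
    out = []
    for j in range(n_sites):
        ones = sum(1 for r in reads if r[j] == "1")
        zeros = sum(1 for r in reads if r[j] == "0")
        out.append("1" if ones > zeros else "0")
    return "".join(out)


def distance(read: str, hap: str) -> int:
    """Number of covered sites where the read disagrees with the haplotype."""
    return sum(1 for a, b in zip(read, hap) if a != GAP and a != b)


def complement(hap: str) -> str:
    """Flip every allele (0 <-> 1)."""
    return "".join("1" if c == "0" else "0" for c in hap)


def assemble(reads: List[str], max_iter: int = 20) -> Tuple[str, str, int]:
    """Partition reads into two groups and return (h1, h2, MEC score)."""
    n_sites = len(reads[0])
    # Seed (assumption): first read with gaps filled as '0', and its complement.
    h1 = reads[0].replace(GAP, "0")
    h2 = complement(h1)
    assignment = None
    for _ in range(max_iter):
        # Assign each read to the nearer haplotype (ties go to group 0).
        new_assignment = [
            0 if distance(r, h1) <= distance(r, h2) else 1 for r in reads
        ]
        if new_assignment == assignment:
            break  # partition is stable
        assignment = new_assignment
        # Recompute each haplotype as the consensus of its group.
        h1 = consensus([r for r, a in zip(reads, assignment) if a == 0], n_sites)
        h2 = consensus([r for r, a in zip(reads, assignment) if a == 1], n_sites)
    mec = sum(min(distance(r, h1), distance(r, h2)) for r in reads)
    return h1, h2, mec
```

For example, `assemble(["010-", "-101", "1010", "10-0", "0111"])` yields the haplotype pair `("0101", "1010")` with an MEC score of 1, the single mismatch coming from the last read. A practical method would need a more careful seeding strategy and handling of reads that span disconnected blocks; this sketch only shows the assign-and-recompute loop common to partition-based approaches.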
