The advent of next generation sequencing technologies is providing new insight into HIV-1 diversity and evolution, which has created the need for bioinformatics tools that can be applied to the characterization of viral quasispecies. Within an infected individual, HIV-1 populations can show an enormous level of genetic diversity, which presents major hurdles for the sustained control of viral replication by host immune responses and antiretroviral treatment.1 Until recently, the molecular tools for characterizing viral quasispecies were extremely laborious and costly.2 Next generation sequencing (NGS) systems, with their expanded sampling depth and capacity for automation,3 are providing fresh insight into viral diversity and evolution.4 The main experimental approaches of NGS have been whole genome sequencing,5 whole gene sequencing,6 and targeted deep sequencing (TDS).7–9 The latter examines a defined subgenomic region of interest at great sampling and sequencing depth to determine the frequency of the different variants. As the capacity to obtain longer reads has improved over the past years, it is now possible to accurately determine, rather than merely infer, the linkage among observed polymorphisms. The quantity and quality of the data generated in TDS experiments present major difficulties for traditional analysis tools. Unfortunately, most existing NGS bioinformatics tools10,11 were developed for the analysis of haploid or diploid organisms, which prevents their seamless application to HIV-1 populations. Here we present Nautilus, a bioinformatics package for the analysis of HIV-1 TDS data. The program consists of a graphical user interface (GUI) with two modules: DeepHaplo and Motifs.
Using as input an alignment file in the SAM format,10 DeepHaplo computes the nucleotide base frequency and read depth at each position, and presents the results in tabular and graphical form (Fig. 1a–f). To facilitate visualization of the different facets of the data, results can be displayed including or omitting alignment gaps, and on linear or logarithmic scales. A novel feature of DeepHaplo is the implementation of a hash algorithm (Supplementary Fig. S1; Supplementary Data are available online at www.liebertpub.com/aid) to efficiently compute the frequency of haplotypes (i.e., polymorphisms that are present in the same NGS read). Positions of interest are either entered by the user or identified by the software based on a user-defined threshold for minor allele frequency (MAF) (Fig. 1g).

FIG. 1. Read depth and frequencies of single nucleotide variants and haplotypes can be computed with the DeepHaplo module. (a) Histogram of the distribution of sequencing depth at each position. (b) Scatterplot of the sequencing depth at each position acknowledging …

DeepHaplo uses the mapping orientation information provided in the bitwise FLAG value of the SAM file10 to compute the frequencies of nucleotide bases at each position, and of haplotypes, in each orientation. This feature, combined with the analysis in the Motifs module, allows the validation of polymorphisms and haplotypes when strand bias is suspected. In Motifs, interrogated positions are identified through a user-defined threshold for MAF, and the frequency of variants at each position is computed separately for the forward and reverse orientations.
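To make the per-position computation concrete, the following is a minimal sketch, not the Nautilus code, of how base frequencies, read depth, strand of origin and MAF-based selection of positions of interest could be derived from SAM text. It assumes a single reference sequence, ignores base qualities, treats deletions as alignment gaps ("-"), and defines the minor allele as the second most frequent base; the function names (`pileup`, `frequencies`, `positions_of_interest`) are hypothetical. The FLAG bits 0x4 (unmapped) and 0x10 (reverse complemented) are those defined in the SAM specification.

```python
import re
from collections import Counter, defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped
FLAG_REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def aligned_bases(pos, cigar, seq):
    """Yield (1-based reference position, base) pairs for one alignment.

    Deletions yield '-' (an alignment gap); insertions and soft clips are skipped.
    """
    ref, query = pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":  # consumes query and reference
            for i in range(n):
                yield ref + i, seq[query + i]
            ref += n
            query += n
        elif op == "D":  # consumes reference only: report a gap
            for i in range(n):
                yield ref + i, "-"
            ref += n
        elif op == "N":  # skipped reference region, not counted as a gap
            ref += n
        elif op in "IS":  # consumes query only
            query += n
        # H and P consume neither query nor reference


def pileup(sam_lines):
    """Return {position: {"fwd": Counter, "rev": Counter}} of base counts."""
    counts = defaultdict(lambda: {"fwd": Counter(), "rev": Counter()})
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar, seq = int(fields[1]), int(fields[3]), fields[5], fields[9]
        if flag & FLAG_UNMAPPED or cigar == "*" or seq == "*":
            continue
        strand = "rev" if flag & FLAG_REVERSE else "fwd"
        for ref_pos, base in aligned_bases(pos, cigar, seq):
            counts[ref_pos][strand][base] += 1
    return counts


def frequencies(strand_counts, include_gaps=True):
    """Pool both orientations and return (read depth, {base: frequency})."""
    pooled = strand_counts["fwd"] + strand_counts["rev"]
    if not include_gaps:
        pooled.pop("-", None)
    depth = sum(pooled.values())
    if depth == 0:
        return 0, {}
    return depth, {base: n / depth for base, n in pooled.items()}


def positions_of_interest(counts, maf_threshold, include_gaps=False):
    """Positions whose second most frequent allele reaches maf_threshold."""
    selected = []
    for ref_pos in sorted(counts):
        _, freqs = frequencies(counts[ref_pos], include_gaps)
        ranked = sorted(freqs.values(), reverse=True)
        if len(ranked) > 1 and ranked[1] >= maf_threshold:
            selected.append(ref_pos)
    return selected
```

Keeping forward and reverse counts separate from the start means the same table can serve both the pooled frequencies and an orientation-specific view; `include_gaps` mirrors the option of displaying results with or without alignment gaps.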
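The haplotype computation described above can be illustrated with an ordinary hash table: each read that spans all positions of interest contributes one key, the string of bases it carries at those positions, and a single pass over the reads yields the haplotype counts in each orientation. This is a sketch under stated assumptions, not the algorithm of Supplementary Fig. S1: reads are supplied as hypothetical `(flag, pos, cigar, seq)` tuples already extracted from SAM records, deletions count as a "-" allele, and reads that do not cover every selected position are skipped.

```python
import re
from collections import Counter

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def read_columns(pos, cigar, seq):
    """Map 1-based reference positions to the read's base ('-' for deletions)."""
    columns, ref, query = {}, pos, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            for i in range(n):
                columns[ref + i] = seq[query + i]
            ref += n
            query += n
        elif op == "D":
            for i in range(n):
                columns[ref + i] = "-"
            ref += n
        elif op == "N":
            ref += n
        elif op in "IS":
            query += n
    return columns


def haplotype_counts(reads, positions):
    """Count haplotypes per orientation in a single pass over the reads.

    Each haplotype is the string of bases a read carries at `positions`;
    the Counter (a hash table) gives average O(1) lookup and update per read.
    """
    positions = sorted(positions)
    counts = {"fwd": Counter(), "rev": Counter()}
    for flag, pos, cigar, seq in reads:
        columns = read_columns(pos, cigar, seq)
        if not all(p in columns for p in positions):
            continue  # read does not span every selected position
        haplotype = "".join(columns[p] for p in positions)
        counts["rev" if flag & FLAG_REVERSE else "fwd"][haplotype] += 1
    return counts


def haplotype_frequencies(counts):
    """Pool both orientations and convert haplotype counts to frequencies."""
    pooled = counts["fwd"] + counts["rev"]
    total = sum(pooled.values())
    return {h: n / total for h, n in pooled.items()} if total else {}
```

Because the linkage comes directly from bases observed in the same read, no phasing model is needed; the cost is that only reads long enough to span all selected positions are informative.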
Motifs also calculates the number of forward and reverse reads supporting a given variant in the setting of the sequence context surrounding the candidate variant, as this context (e.g., homopolymers) has been shown to strongly influence strand bias.12 Figure 2a shows a real case of a polymorphic position where the variants are equally supported by reads in both orientations (compare the blue and red bars), whereas Fig. 2b shows that the A variant is observed only in reads in the reverse orientation, likely reflecting a sequencing artifact.

FIG. 2. The Motifs module provides information about the frequency of single nucleotide variants based on mapping orientation and the sequence context surrounding the putatively polymorphic position. (a) Profile of a true polymorphic site. The identified variants …

In summary, Nautilus is a new suite of bioinformatics tools that supports the analysis of TDS data, in order to facilitate the application of NGS to the characterization of HIV-1 populations and evolution. Nautilus runs on Mac.
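The Motifs check of strand support in sequence context can be sketched as follows. This is an illustration, not the module's implementation: strand counts are assumed to be plain `{"fwd": {base: n}, "rev": {base: n}}` dictionaries, the context window size `flank` is a hypothetical parameter, the longest homopolymer run in the window stands in for "sequence context", and flagging a variant seen in only one orientation (as in Fig. 2b) is a simple heuristic chosen here rather than a published criterion.

```python
def sequence_context(reference, pos, flank=3):
    """Reference bases within `flank` positions of 1-based `pos`."""
    i = pos - 1
    return reference[max(0, i - flank):i + flank + 1]


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base in `seq`."""
    best, run, prev = 0, 0, None
    for base in seq:
        run = run + 1 if base == prev else 1
        best = max(best, run)
        prev = base
    return best


def motif_report(reference, pos, strand_counts, variant, flank=3):
    """Summarize forward and reverse support for `variant` at `pos`."""
    fwd = strand_counts["fwd"].get(variant, 0)
    rev = strand_counts["rev"].get(variant, 0)
    fwd_depth = sum(strand_counts["fwd"].values())
    rev_depth = sum(strand_counts["rev"].values())
    context = sequence_context(reference, pos, flank)
    return {
        "context": context,
        "homopolymer": longest_homopolymer(context),
        "fwd_reads": fwd,
        "rev_reads": rev,
        "fwd_freq": fwd / fwd_depth if fwd_depth else 0.0,
        "rev_freq": rev / rev_depth if rev_depth else 0.0,
        # Heuristic: supported in exactly one orientation (cf. Fig. 2b).
        "single_orientation": (fwd == 0) != (rev == 0),
    }
```

Reporting the context alongside the per-orientation counts lets a reviewer see whether a one-sided variant sits next to a homopolymer, where orientation-specific errors are known to be more common.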