Pan-India SARS-CoV-2 genome sequencing reveals important insights into the outbreak

The ongoing COVID-19 pandemic, caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), has emerged as a global health problem and has affected countries worldwide. The novel coronavirus is spreading rapidly and poses a serious threat to humankind; it has been estimated that it can spread twice as fast as the 1918 Spanish flu virus. Whole-genome sequencing of pathogens, especially viruses, is a powerful tool for generating rapid information during outbreaks. Its results help researchers understand how an infection was introduced, its transmission dynamics and its contact-tracing networks, and they can inform outbreak control decisions. The technique proved effective in earlier outbreaks such as that of the Ebola virus.

A collaborative study was conducted by researchers from NIBMG Kalyani, ILS Bhubaneswar, CDFD Hyderabad, NCBS Bengaluru, InStem Bengaluru, NCCS Pune and the ICMR-Regional Centre for Medical Research, Bhubaneswar, with the initial goal of sequencing 1,000 SARS-CoV-2 genomes. Nasal and oral swabs were collected from individuals who tested positive for COVID-19 in 10 states covering different zones of India. A total of 1,052 sequences were used for phylodynamic analysis, analysis of temporal and geographic mutation patterns, and haplotype network analysis. The study will contribute to understanding how the virus is spreading, ultimately helping to restrict transmission and prevent new infections, and will inform research on how to intervene in the spread of infection.

Preliminary results indicated that multiple lineages of SARS-CoV-2 are circulating in India, which may have been introduced through travel from Europe, the USA and East Asia. In particular, the D614G mutation predominates and is emerging in almost all regions of the country. The scientists were able to estimate the likely source countries of the different virus variants introduced into India through travel. The virus has also mutated, and one of the mutations has reached the highest frequency across most of the states. Two lineages, 20A and 20B, predominate across the country: 20A is most abundant in northern and eastern India, while 20B is abundant in southern and western India. The ancestral haplotypes 19A and 19B were found mostly in northern and eastern India, with 19B being the most abundant in the latter region.

The analysis indicated that haplotype diversity, both across India and within each region, continued to increase until May 2020, after which it fell sharply with the emergence of the A2a haplotypes, which had overtaken other lineages by June 2020. Such interpretations may improve understanding of the virus and thereby inform health decisions. From the haplotype network, the researchers observed that sequences from Maharashtra and Karnataka formed three distinct haplotype nodes, while sequences from Odisha, West Bengal and Uttarakhand were scattered across different haplotype nodes. They also observed a haplotype node containing mostly genomes from West Bengal and Odisha, along with a small percentage of samples from Uttarakhand.
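The article does not say which estimator the researchers used for haplotype diversity. A common choice is Nei's unbiased haplotype (gene) diversity, H = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i among n sequences. The minimal Python sketch below assumes that estimator and treats clade labels as haplotypes; the function name and the monthly samples are hypothetical, not study data. It illustrates how diversity drops once a single haplotype sweeps through the population, as described above.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: H = n/(n-1) * (1 - sum(p_i^2)).

    Assumption: each element is one sequence's haplotype label.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0  # diversity is undefined for fewer than two sequences
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical monthly samples of clade labels (illustrative only, NOT study data).
samples_by_month = {
    "2020-04": ["19A", "19B", "20A", "20B", "19A", "20A"],
    "2020-06": ["20A", "20A", "20A", "20B", "20A", "20A"],
}

for month, labels in samples_by_month.items():
    print(month, round(haplotype_diversity(labels), 3))
```

With these toy labels, the mixed April sample gives about 0.867, while the June sample, dominated by one haplotype, gives about 0.333.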

Analysis of the probable countries of origin of these SARS-CoV-2 sequences revealed that they had probably been introduced into India through travel from multiple countries across the globe. Haplotypes 20A, 20B and 20C were introduced from multiple countries in Europe and the Americas. Interestingly, 20A is predicted to have been introduced through travel from Italy, Saudi Arabia, the United Kingdom and Switzerland. Similarly, 20B was introduced from the United Kingdom, Brazil, Italy and Greece. In contrast, 19A was introduced from China alone, while 19B was introduced through travel from China, Oman and Saudi Arabia.

The number of COVID-19 cases in India has increased sharply over time. Although most states have devised their own strategic lockdowns to control the outbreak, these strategies could be more effective if they incorporated information on geographical transmission patterns. In the current study, the scientists explored the transmission of the infection among different states of India. More genomic datasets are needed to obtain a clearer picture.


Analysis of SARS-CoV-2 genomes from western India reveals unique linked mutations

Transmission electron micrograph of SARS-CoV-2 (Wikipedia)

COVID-19 is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), a betacoronavirus. The virus mainly causes respiratory illness, whose severity varies between individuals. The COVID-19 pandemic is affecting the whole world, and India is one of the worst-hit nations. Western India is badly affected: the state of Maharashtra is a major hotspot, accounting for around one-fifth of all reported infections in India.

A collaborative study by researchers from NCCS Pune, B. J. Government Medical College, Pune and Armed Forces Medical College, Pune presents the first comprehensive analysis of the genomes and mutation patterns of SARS-CoV-2 from western India. The researchers investigated the molecular, phylogenomic and evolutionary dynamics of SARS-CoV-2 in three different regions of Maharashtra, a state in western India. A total of 90 genomes were sequenced. The analysis revealed a unique set of three linked mutations shared by many of the sequences studied. These may act as molecular markers to track the spread of SARS-CoV-2 to different areas.

Nasopharyngeal/throat swabs were collected from suspected COVID-19 patients, and samples confirmed to be infected with SARS-CoV-2 were used for the study. The patients ranged in age from 2 to 78 years, with 80% aged 30-60 years. Samples with cycle threshold (Ct) values for the E gene within a particular range were selected for genome sequencing. The FastQC tool and BWA (Burrows-Wheeler Aligner) were used for data analysis, and the neighbor-joining method was used for phylogenomic analysis. Structural and bioinformatic analyses of the SARS-CoV-2 variants were performed, along with a comparative study of the Indian samples. The observed mutation patterns were further analyzed for any relationship with gender, age and symptoms.
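The article names the neighbor-joining method but not the software used to run it. As an illustration only, the Python sketch below implements the Saitou-Nei neighbor-joining algorithm on pairwise Hamming distances between already-aligned sequences and returns a Newick tree. The function names and the four toy sequences are hypothetical; real whole-genome analyses would typically rely on dedicated phylogenetics software and corrected evolutionary distances rather than raw mismatch counts.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def neighbor_joining(seqs: dict[str, str]) -> str:
    """Saitou-Nei neighbor joining on Hamming distances; returns a Newick string."""
    names = list(seqs)
    if len(names) < 3:
        raise ValueError("need at least three sequences")
    newick = {i: name for i, name in enumerate(names)}  # node id -> Newick subtree
    d = {frozenset((i, j)): float(hamming(seqs[names[i]], seqs[names[j]]))
         for i, j in combinations(range(len(names)), 2)}
    nodes = list(range(len(names)))
    next_id = len(names)  # ids for new internal nodes

    def dist(a: int, b: int) -> float:
        return d[frozenset((a, b))]

    while len(nodes) > 3:
        r = len(nodes)
        row = {i: sum(dist(i, k) for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimising (r - 2) * d(i, j) - row(i) - row(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (r - 2) * dist(*p) - row[p[0]] - row[p[1]])
        dij = dist(i, j)
        li = 0.5 * dij + (row[i] - row[j]) / (2 * (r - 2))  # branch length to i
        lj = dij - li                                        # branch length to j
        u = next_id
        next_id += 1
        newick[u] = f"({newick[i]}:{li:g},{newick[j]}:{lj:g})"
        # Distance from the new node u to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (dist(i, k) + dist(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]

    # Three nodes left: join them at a single central node.
    a, b, c = nodes
    la = 0.5 * (dist(a, b) + dist(a, c) - dist(b, c))
    lb = 0.5 * (dist(a, b) + dist(b, c) - dist(a, c))
    lc = 0.5 * (dist(a, c) + dist(b, c) - dist(a, b))
    return f"({newick[a]}:{la:g},{newick[b]}:{lb:g},{newick[c]}:{lc:g});"


# Hypothetical aligned toy sequences (NOT SARS-CoV-2 data).
toy_alignment = {
    "S1": "ACGTACGTAA",
    "S2": "ACGTACGTAT",
    "S3": "ACGAACCTAT",
    "S4": "TCGAACCTGT",
}
print(neighbor_joining(toy_alignment))
```

Because these toy distances happen to be additive, the recovered branch lengths reproduce every pairwise distance exactly; for example, the path from S1 to S4 sums to 1 + 2 + 2 = 5 mismatches.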

Phylogenetic analysis of the genomes revealed that the mutations C313T, C5700A and G28881A form a unique pattern observed in 45% of samples, indicating a newly emerging pattern of linked mutations. Viral strains from Satara district showed mutations primarily at the 3′ end of the genome, while strains from Nashik district displayed mutations at the 5′ end. Characterization of the Pune strains showed that a novel variant had overtaken the other strains. The researchers also examined the frequency of the three mutations C313T, C5700A and G28881A in symptomatic versus asymptomatic patients. The mutations were prevalent in symptomatic cases and more common in females. They were present in more than 30% of the studied samples from the 10-25 age group but, interestingly, were not detected in the older 61-80 age group.

Region-wise analysis of the viral sequences indicated that a specific mutation pattern was prevalent in all districts. When the mutation patterns were examined against age, gender and symptoms, a distinct age-wise pattern emerged, with some mutations prevalent in the 10-25 age group. The three mutations C313T, C5700A and G28881A were found at a relatively higher proportion (~80%) in samples from symptomatic patients than in those from asymptomatic patients (40-50%). In addition, the mutation C241T, located in the 5′ UTR, was found in 90% of all sequences and predominantly in severely affected patients. However, the role of this mutation has not yet been studied.
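A comparison like the symptomatic-versus-asymptomatic one above reduces to counting, within each group, the samples that carry all three linked mutations. The Python sketch below assumes each sample has already been reduced to a set of called mutations written in the article's notation (for example C313T); the function name and the toy records are hypothetical and do not reproduce the study's figures.

```python
from collections import defaultdict

# The three linked mutations reported in the study.
LINKED = frozenset({"C313T", "C5700A", "G28881A"})


def linked_pattern_proportion(samples):
    """Return, per group, the fraction of samples carrying every mutation in LINKED.

    Assumption: `samples` is an iterable of (group_label, set_of_mutations) pairs.
    """
    carriers = defaultdict(int)
    totals = defaultdict(int)
    for group, mutations in samples:
        totals[group] += 1
        if LINKED <= set(mutations):  # subset test: all three must be present
            carriers[group] += 1
    return {g: carriers[g] / totals[g] for g in totals}


# Hypothetical toy records (group, called mutations); NOT the study's data.
toy_records = [
    ("symptomatic", {"C241T", "C313T", "C5700A", "G28881A"}),
    ("symptomatic", {"C241T", "C313T", "C5700A", "G28881A"}),
    ("symptomatic", {"C241T"}),
    ("asymptomatic", {"C241T", "C313T", "C5700A", "G28881A"}),
    ("asymptomatic", {"C241T"}),
]
print(linked_pattern_proportion(toy_records))
```

Requiring all three mutations together, rather than counting each one separately, matches the idea of a linked pattern; the same function works for age bands or districts by changing the group label.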

The comparative study indicated that distinct sub-clones of the virus were prevalent in different parts of India during the same period. Clade 19A was predominant in Delhi (northern India), whereas clades 20A and 20B were dominant in Maharashtra (western India) in April-May 2020. In Telangana (southern India), clade 19A was dominant in April, but the population shifted completely to 20A and 20B in May 2020. Because of the lockdown, factors contributing to the transmission of SARS-CoV-2 were restricted. The researchers argue that the prevalence of a specific viral variant in a region could be attributed to host susceptibility to specific viral variants, a susceptibility that appears to relate to the mutations prevalent in the variants circulating in that region.