PAN-India SARS-CoV-2 genome sequencing reveals important insights into the outbreak

The ongoing pandemic caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has emerged as a global health problem and has adversely affected the world. The novel coronavirus is spreading rapidly, creating a threat to humankind. It is estimated that it can spread twice as fast as the 1918 Spanish flu virus. Whole-genome sequencing of pathogens, especially viruses, is a powerful tool for generating rapid information on outbreaks. The results from this technique help in understanding the introduction of the infection, the dynamics of transmission, contact tracing networks and the impact of informed outbreak control decisions. This technique has been effective in earlier outbreaks, such as those of the Ebola virus.

A collaborative study was conducted by researchers from NIBMG Kalyani, ILS Bhubaneswar, CDFD Hyderabad, NCBS Bengaluru, InStem Bengaluru, NCCS Pune and ICMR-Regional Centre for Medical Research, Bhubaneswar, with the initial goal of sequencing 1000 SARS-CoV-2 genomes. Nasal and oral swabs were collected from individuals who tested positive for COVID-19. The samples were collected from 10 states covering different zones within India. A total of 1,052 sequences were used for phylodynamic analysis, analysis of temporal and geographic mutation patterns, and haplotype network analysis. This study will contribute to understanding how the virus is spreading, ultimately helping to restrict transmission, prevent new cases of infection, and provide information for research on how to intervene in the spread of infection.

Preliminary results indicated that multiple lineages of SARS-CoV-2 are circulating in India, which might have been introduced by travel from Europe, the USA and East Asia. In particular, there is a predominance of the D614G mutation, which is emerging in almost all regions of the country. Scientists were able to estimate the probable country of origin of the different varieties of the virus introduced into India through travel. The virus has also mutated, and one of the mutations has attained the highest frequency across most of the states. Two lineages of the virus, named 20A and 20B, are predominant across the country. Haplotype 20A was most abundant in northern and eastern India, whereas haplotype 20B was most abundant in southern and western India. The ancestral haplotypes 19A and 19B were mostly found in northern and eastern India, with 19B being the most abundant in the latter region.

Analysis indicated that haplotype diversity across India and in each region continued to increase until May 2020, after which it declined drastically with the emergence of the A2a haplotypes, which had overtaken other lineages by June 2020. Such interpretations might enable an improved understanding of the virus and hence better-informed health decisions. From the haplotype network, researchers observed that sequences from Maharashtra and Karnataka formed three distinct haplotype nodes, whereas sequences from Odisha, West Bengal and Uttarakhand were scattered across different haplotype nodes. They also observed a haplotype node in which the majority of the genomes came from West Bengal and Odisha, with a small percentage of the samples from Uttarakhand.
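
To illustrate why haplotype diversity falls when one lineage takes over, here is a minimal sketch in Python. It assumes Nei's unbiased estimator, H = n/(n-1) * (1 - sum of squared haplotype frequencies); the article does not state which diversity measure the researchers used, and the function name and sample lists below are hypothetical, not data from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    Assumption: this estimator is used for illustration only; the study's
    actual diversity measure is not specified in the article.
    """
    n = len(haplotypes)
    if n < 2:
        # Diversity is undefined for fewer than two samples; return 0.0 here.
        return 0.0
    counts = Counter(haplotypes)
    # Sum of squared relative frequencies of each haplotype.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical samples (not data from the study).
mixed = ["19A", "19B", "20A", "20B"]      # four lineages, evenly represented
dominated = ["20A", "20A", "20A", "20B"]  # one lineage dominates

print(haplotype_diversity(mixed))
print(haplotype_diversity(dominated))
```

In this toy example the evenly mixed sample has a higher diversity than the sample dominated by a single lineage, which mirrors the reported drop in diversity once one haplotype group overtook the others.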

Analysis of the probable country of origin of these SARS-CoV-2 sequences revealed that they had probably been introduced into India by travel from multiple countries across the globe. Haplotypes 20A, 20B and 20C were introduced from multiple countries in Europe and the Americas. Interestingly, 20A alone is predicted to have been introduced by travel from Italy, Saudi Arabia, the United Kingdom and Switzerland. Similarly, 20B was introduced from the United Kingdom, Brazil, Italy and Greece. In contrast, 19A was introduced from China alone, while 19B was introduced by travel from China, Oman and Saudi Arabia.

The number of COVID-19 cases in India has increased drastically over time. Although most states have devised their own lockdown strategies to control the outbreak, these strategies would be more efficient if information on geographical transmission patterns were included in their planning. In the current study, scientists have tried to explore the transmission of the infection among different states of India. More genomic datasets are needed to obtain a clearer picture.