In-depth: What ails India’s coronavirus genome sequencing system

The country has sequenced only 0.2% of its samples to date, one of the lowest shares in the world

Published on: 02 Sep 2021, 3:17 am

Sequencing and analysis of the novel coronavirus, a crucial step in the infection containment strategy, declined sharply in India, even as cases of the resultant disease continued to rise.

The number of coronavirus samples sequenced and analysed in the country plummeted 76 per cent to 1,321 in July from 5,542 in June, showed government data (accessed August 31, 2021).
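The 76 per cent figure is a simple percentage change. A minimal Python sketch that reproduces it from the June and July counts quoted above (the function name `percent_change` is ours, for illustration only):

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new; a negative result means a decline."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100

# Samples sequenced and analysed, per the government data cited above.
june, july = 5_542, 1_321
change = percent_change(june, july)
print(f"{change:.1f}%")  # -76.2%
```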

Down To Earth did not take into account the trend for August because the process for samples collected in the last few days of the month might still not be complete.

Volume skewed for states

India formed the Indian SARS-CoV-2 Consortium on Genomics (INSACOG), a consortium of 10 genome sequencing labs, in December 2020 to scale up genome sequencing. Since then, the number of labs involved in coronavirus sequencing has risen to 28.

Despite the infrastructure boost, not only has the volume of sequencing declined, but the proportion of samples sequenced by individual states has also become skewed.

To date, Maharashtra (14,003 samples) and Kerala (5,485 samples) account for 23 per cent and 9 per cent, respectively, of all samples sequenced and analysed in the country, according to the Union Department of Biotechnology’s website on genome sequencing, accessed on August 31.

These are the only states with genome sequencing facilities of their own. The rest of the country has to send samples to INSACOG labs run by the Centre.

The shares of other big states are much lower: Bihar has analysed 336 samples, Uttar Pradesh 1,049, Madhya Pradesh 1,545, Gujarat 1,956 and Karnataka 2,541 since the beginning of the pandemic.

Delhi has sequenced and analysed 5,764 samples so far.

On the other hand, the northeastern states, where infections were soaring in July, have sequenced and analysed very few samples. This runs counter to the ideal practice of stepping up sequencing in areas where cases surge, to find out whether a new variant is driving the rise.

Other than Manipur, every northeastern state has sequenced and analysed fewer than 300 samples since December.

Samples from these states are sequenced and analysed at the National Institute of Biomedical Genomics in Kolkata, West Bengal. Calls and texts to its director, Saumitra Das, did not elicit a response.

Accessed August 31, 2021

What new guidelines hide

The main purpose of sequencing is surveillance: to get a true picture of prevailing variants, of emerging variants (some of which, like Delta, may lead to fresh surges) and of those causing reinfections and vaccine breakthrough infections, which bear on vaccine efficacy.

The central government, while announcing INSACOG last December, said it aimed to analyse 5 per cent of all positive samples from each state. This was a decent aim, according to many experts, considering the size of the country. This method is known as ‘randomised surveillance’.
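As a rough illustration of what ‘randomised surveillance’ means in practice, here is a minimal Python sketch that draws a simple random 5 per cent sample of positive samples from each state, so every positive has the same chance of being sequenced. The function name, data layout and sample IDs are hypothetical; the article does not describe how INSACOG actually selected samples.

```python
import random
from typing import Dict, List, Optional

def random_surveillance(
    positives_by_state: Dict[str, List[str]],
    fraction: float = 0.05,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Pick a simple random sample of positive samples from each state."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    selected = {}
    for state, ids in positives_by_state.items():
        k = round(len(ids) * fraction)        # 5 per cent of this state's positives
        selected[state] = rng.sample(ids, k)  # each positive equally likely
    return selected

# Hypothetical sample IDs, for illustration only.
positives = {
    "State A": [f"A-{i}" for i in range(2_000)],
    "State B": [f"B-{i}" for i in range(300)],
}
picked = random_surveillance(positives, seed=1)
print({s: len(v) for s, v in picked.items()})  # {'State A': 100, 'State B': 15}
```

The sample size scales with each state's caseload, which is the property the fixed quota described next gives up.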

Four months later, the government changed the policy: every state would send 300 samples a month to genome sequencing labs, collected from 10 sites (hospitals and COVID-19 testing labs) in the state. This is called ‘sentinel site surveillance’.
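Sentinel site surveillance fixes the quota in advance instead. The sketch below assumes the 300 monthly samples are split evenly across a state's 10 sites (30 each) and drawn at random within each site; the even split, the names and the data are assumptions for illustration, not the documented procedure.

```python
import random
from typing import Dict, List, Optional

def sentinel_site_sample(
    positives_by_site: Dict[str, List[str]],
    monthly_quota: int = 300,
    seed: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Split a fixed monthly quota evenly across a state's sentinel sites.

    Assumption: an even split. A site with fewer positives than its share
    contributes all it has; the shortfall is not redistributed here.
    """
    if not positives_by_site:
        return {}
    rng = random.Random(seed)
    per_site = monthly_quota // len(positives_by_site)  # 300 // 10 = 30
    return {
        site: rng.sample(ids, min(per_site, len(ids)))
        for site, ids in positives_by_site.items()
    }

# Ten hypothetical sites: nine with 500 positives each, one with only 12.
sites = {f"site-{n}": [f"s{n}-{i}" for i in range(500)] for n in range(9)}
sites["site-9"] = [f"s9-{i}" for i in range(12)]
picked = sentinel_site_sample(sites, seed=7)
print(sum(len(v) for v in picked.values()))  # 282
```

However many positives a state records, the output never exceeds the quota, and samples from outside the 10 sites are never considered.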

The government justified the move by saying that, with cases rising exponentially at the time, sequencing 5 per cent of positive samples would put a heavy load on the sequencing labs.
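The load argument is easy to check with hypothetical numbers. This sketch compares how many samples labs would receive in a month under the 5 per cent target and under a fixed 300-per-state quota; the 30-state count and both caseloads are made-up inputs.

```python
from typing import Tuple

def monthly_sequencing_load(
    monthly_positives: int, n_states: int, fraction: float = 0.05, quota: int = 300
) -> Tuple[int, int]:
    """Return (samples under a percentage target, samples under a per-state quota)."""
    return round(monthly_positives * fraction), quota * n_states

for positives in (1_000_000, 100_000):  # hypothetical surge month vs. quiet month
    pct, fixed = monthly_sequencing_load(positives, n_states=30)
    print(f"{positives:>9,} positives -> 5%: {pct:,}  fixed quota: {fixed:,}")
```

With these inputs the 5 per cent rule asks for 50,000 samples in the surge month against 9,000 under the quota, and 5,000 against 9,000 in the quiet month. A fixed quota caps the load at the peak, but it also stops sequencing from rising with a surge.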

But in July, the government made a different argument. Jitendra Singh, Union Minister of State for science and technology, told the Lok Sabha on July 23:

Quantum of SARS-CoV-2 genome sequencing depends on the positivity rate, which has currently decreased.

This implied that fewer cases could mean going easier on the sequencing exercise.

The revision was in line with World Health Organisation (WHO) standards for genome sequencing, the government had added.

What the government did not say was: WHO never discouraged random surveillance and even said it is the more sensitive method. Sentinel surveillance may be adopted by countries that have ‘minimal lab capacity’, WHO said in its interim guidance document.

Sentinel surveillance may not give the most representative picture, WHO added, contrary to what the government claimed.

Falling behind

The United Kingdom and the United States have already sequenced and shared data for 866,000 and 755,000 samples, respectively, according to GISAID, a global, open-access online repository of genomic data.

That amounts to 11.2 per cent and 2.3 per cent of their total cases, respectively.
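These shares are sequenced samples divided by total confirmed cases. A minimal sketch of that ratio, plus a helper that counts how many countries rank ahead of a given one. Only the three shares quoted in this article are filled in; the full GISAID country list is not reproduced, and the counts passed to `share_sequenced` are hypothetical.

```python
from typing import Dict

def share_sequenced(sequenced: int, total_cases: int) -> float:
    """Percentage of confirmed cases whose samples were sequenced."""
    if total_cases <= 0:
        raise ValueError("total_cases must be positive")
    return 100 * sequenced / total_cases

def countries_ahead(shares: Dict[str, float], country: str) -> int:
    """Count countries that sequenced a larger share of cases than `country`."""
    return sum(1 for c, s in shares.items() if c != country and s > shares[country])

# Shares (per cent) as quoted in the article; the real list is much longer.
shares = {"United Kingdom": 11.2, "United States": 2.3, "India": 0.2}
print(countries_ahead(shares, "India"))  # 2
print(share_sequenced(1_000, 50_000))    # 2.0 (hypothetical counts)
```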

India has sequenced 82,000 samples, a meagre 0.2 per cent of its cumulative cases. More than 100 countries have sequenced a greater share of samples than India.

“It is true that it is difficult for a country as big as India to sequence a greater number of samples due to a high caseload. But some sort of randomisation is necessary rather than this fixed number of 300,” said Rakesh Mishra, former director of the Centre for Cellular & Molecular Biology, Hyderabad, an INSACOG lab.

Shahid Jameel, an eminent virologist and former chairman of INSACOG’s scientific advisory body, said:

India’s strategy may not be enough to pick up new variants if they are circulating in low frequency. Only if their frequency has gone up after being in circulation for some time, then they will be picked up.
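Jameel's point can be made concrete with a simple binomial model, which is our assumption rather than INSACOG's or the ECDC's method: if a variant is present in a fraction p of positives and n samples are picked at random, the chance of catching it at least once is 1 - (1 - p)^n.

```python
def detection_probability(frequency: float, n_samples: int) -> float:
    """Chance that at least one of n randomly sampled positives carries a variant.

    Simple binomial model: each sample independently carries the variant with
    probability `frequency`. Ignores sequencing failures and local clustering.
    """
    if not 0 <= frequency <= 1 or n_samples < 0:
        raise ValueError("frequency must be in [0, 1] and n_samples >= 0")
    return 1 - (1 - frequency) ** n_samples

# A hypothetical variant carried by 0.1 per cent of a state's positives.
for n in (300, 5_000):
    print(n, round(detection_probability(0.001, n), 3))
```

Under this model, 300 samples catch such a variant only about 26 per cent of the time, while 5,000 samples catch it more than 99 per cent of the time. Sentinel samples are not a random draw, so the model is only a rough guide.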

His view is in line with the European Centre for Disease Prevention and Control’s stance on genomic surveillance.

At best, this strategy may reveal only what is already spreading, Jameel added.

Many labs, moreover, do not receive enough good-quality samples, either on time or with scientific parameters intact, a scientist working at an INSACOG lab said on condition of anonymity. “Good-quality samples would have landed in time if sequencing was a priority for the Centre and states,” he added.

The government had also said it might sequence 100 per cent of samples if the country faced vaccine breakthrough infections and reinfections.

Delta variant infections that bypass vaccine immunity (breakthrough infections) are frequent in the country, according to the latest INSACOG weekly bulletin (August 30, 2021). The bulletin, however, did not mention how many breakthrough-infection samples were analysed.

As many as 82,361 samples were sequenced through August 30, of which 51,651 were analysed, according to the bulletin. The normal course of events, from sample collection to sequencing and analysis, explains the difference between the two:

A swab collected from the person is tested by RT-PCR (reverse transcription polymerase chain reaction). If the sample is positive and has a Ct (cycle threshold) value below 25, which indicates a relatively high amount of viral genetic material, it can be sent to a sequencing lab.
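The eligibility rule in this step can be written as a filter. A minimal sketch, assuming each RT-PCR result carries a positive/negative call and a Ct value (`None` when nothing amplified); the record layout and sample IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

CT_CUTOFF = 25.0  # threshold quoted in the article

@dataclass
class RtPcrResult:
    sample_id: str
    positive: bool
    ct_value: Optional[float]  # cycle threshold; lower means more viral RNA

def eligible_for_sequencing(results: List[RtPcrResult], cutoff: float = CT_CUTOFF) -> List[str]:
    """IDs of positive samples with a Ct value strictly below the cutoff."""
    return [
        r.sample_id
        for r in results
        if r.positive and r.ct_value is not None and r.ct_value < cutoff
    ]

# Hypothetical results.
results = [
    RtPcrResult("S1", True, 18.4),
    RtPcrResult("S2", True, 31.0),   # positive, but too little viral RNA
    RtPcrResult("S3", False, None),  # negative: no amplification
    RtPcrResult("S4", True, 25.0),   # exactly 25 is not "less than 25"
]
print(eligible_for_sequencing(results))  # ['S1']
```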

The viral RNA in the sample is converted into complementary DNA (cDNA), which is cut into short fragments. Several copies of each fragment are then made.
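To illustrate this step on a toy string (real protocols rely on enzymes, primers and amplification chemistry that no string operation captures), the sketch below builds the complementary DNA strand, cuts it into overlapping fragments and copies each fragment. The function names, the 12-base RNA and the fixed fragment size and step are assumptions for illustration.

```python
from typing import List

# Base pairing used when RNA is reverse-transcribed into DNA (U in RNA pairs with A).
RNA_TO_DNA_COMPLEMENT = str.maketrans({"A": "T", "U": "A", "G": "C", "C": "G"})

def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA (cDNA) strand, written 5' to 3'."""
    return rna.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]

def fragment(dna: str, size: int, step: int) -> List[str]:
    """Cut DNA into fixed-size fragments; step < size makes neighbours overlap.

    Simplification: real fragmentation is random, and the tail is covered only
    when (len(dna) - size) is a multiple of `step`.
    """
    return [dna[i:i + size] for i in range(0, len(dna) - size + 1, step)]

def amplify(fragments: List[str], copies: int) -> List[str]:
    """Make several identical copies of each fragment."""
    return [f for f in fragments for _ in range(copies)]

cdna = reverse_transcribe("AUGGCUACGUUA")  # hypothetical 12-base RNA
pieces = fragment(cdna, size=6, step=3)
print(cdna, pieces, len(amplify(pieces, copies=4)))
```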

These fragments are loaded into a sequencer, which ‘reads’ the order of nucleotides in each fragment. The nucleotides are denoted by the letters A, T, C and G.
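Because each fragment is copied several times, every position in the genome is read more than once. A minimal sketch of that idea, assuming each read's position on the genome is already known (in practice it comes from alignment) and that reads containing an ambiguous ‘N’ call are discarded; the names and reads are hypothetical.

```python
from typing import List, Tuple

VALID_BASES = set("ACGT")

def is_clean_read(read: str) -> bool:
    """True if the read contains only the four DNA letters (no 'N' or other calls)."""
    return bool(read) and set(read.upper()) <= VALID_BASES

def coverage(genome_length: int, reads: List[Tuple[int, str]]) -> List[int]:
    """Count how many reads cover each genome position.

    Each read is (start_position, sequence) with a 0-based start.
    """
    depth = [0] * genome_length
    for start, seq in reads:
        for pos in range(start, min(start + len(seq), genome_length)):
            depth[pos] += 1
    return depth

reads = [(0, "TAACGT"), (0, "TAACGT"), (3, "CGTAGC"), (6, "AGCCAT"), (6, "AGNCAT")]
clean = [r for r in reads if is_clean_read(r[1])]
print(coverage(12, clean))  # [2, 2, 2, 3, 3, 3, 2, 2, 2, 1, 1, 1]
```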

These ‘reads’ of the many short fragments are then fed into software that pieces them together into one complete genome, or whole-genome sequence of nucleotides. This completes sequencing.
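The ‘piecing together’ step can be illustrated with a toy greedy overlap assembler that repeatedly merges the two reads with the longest suffix-prefix overlap. This is a teaching sketch, not the software INSACOG labs use; SARS-CoV-2 pipelines commonly align reads to a reference genome instead.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: List[str], min_len: int = 3) -> str:
    """Merge reads pairwise by largest overlap until one sequence remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    if not contigs:
        return ""
    while len(contigs) > 1:
        best = (0, 0, 1)  # (overlap length, index of left piece, index of right piece)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return "".join(contigs)  # no overlaps left: concatenate what remains
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs[0]

reads = ["CGTAGC", "TAACGT", "AGCCAT", "TAACGT"]
print(greedy_assemble(reads))  # TAACGTAGCCAT
```

Greedy merging is easy to follow but breaks down on sequencing errors and repeated stretches, which is one reason real pipelines lean on a reference genome.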

The sequenced genome is then assigned a lineage using other web-based programmes. The lineage shows which variant it is: whether it matches a lineage of variants already circulating in the country or abroad, or whether a new variant has emerged. If it is new, further studies decide whether it is merely a ‘variant of interest’ (VoI) or a much more significant ‘variant of concern’ (VoC).
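A toy version of lineage assignment compares a genome's spike mutations with a small table of marker mutations per variant. The marker sets below are a deliberately short subset of mutations widely reported for each VoC, and the all-markers matching rule is our simplification; dedicated tools such as Pangolin use far richer, curated definitions.

```python
from typing import Dict, Optional, Set

# Illustrative subset of spike mutations associated with each VoC.
LINEAGE_MARKERS: Dict[str, Set[str]] = {
    "Alpha": {"N501Y", "P681H"},
    "Beta": {"K417N", "E484K", "N501Y"},
    "Gamma": {"K417T", "E484K", "N501Y"},
    "Delta": {"L452R", "T478K", "P681R"},
}

def assign_lineage(mutations: Set[str], min_fraction: float = 1.0) -> Optional[str]:
    """Return the best-matching lineage, or None if nothing matches well enough.

    A lineage qualifies when at least `min_fraction` of its markers are present;
    among qualifiers, the one with the most matched markers wins (first on a tie).
    """
    best, best_hits = None, 0
    for name, markers in LINEAGE_MARKERS.items():
        hits = len(markers & mutations)
        if hits / len(markers) >= min_fraction and hits > best_hits:
            best, best_hits = name, hits
    return best

print(assign_lineage({"D614G", "L452R", "T478K", "P681R"}))  # Delta
print(assign_lineage({"D614G", "E484Q"}))  # None: no known marker set, flag for review
```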

Currently there are four VoCs: Alpha (emerged in the UK), Beta (emerged in South Africa), Gamma (emerged in Brazil) and Delta (emerged in India).

Source: Centers for Disease Control and Prevention, US

WHO has insisted that data from any sequencing done anywhere in the world be submitted to open-access platforms like GISAID, so that the global scientific community can examine a sequence generated in any one part of the world.

So the thrust is on uploading the whole sequence. The metadata must also be presented accurately: date and location of sample collection, and the patient’s age, sex, ethnicity, travel history, date of symptom onset, kind of symptoms, clinical outcome and so on.
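A pre-upload check for missing metadata might look like the sketch below. The record and its field names are hypothetical and follow the list in the paragraph above, not GISAID's actual submission template.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import List, Optional

@dataclass
class SequenceSubmission:
    """Hypothetical upload record; field names are illustrative, not GISAID's."""
    sequence: str
    collection_date: Optional[date] = None
    location: Optional[str] = None
    age: Optional[int] = None
    sex: Optional[str] = None
    ethnicity: Optional[str] = None
    travel_history: Optional[str] = None
    symptom_onset: Optional[date] = None
    symptoms: Optional[str] = None
    clinical_outcome: Optional[str] = None

def missing_fields(record: SequenceSubmission) -> List[str]:
    """Names of fields left empty (None or blank string), in declaration order."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]

rec = SequenceSubmission(
    sequence="ACGTACGT",  # placeholder, not a real genome
    collection_date=date(2021, 7, 15),
    location="Kerala",
)
print(missing_fields(rec))
```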

Missing sequences

India has so far sequenced and shared only a little over 44,000 samples, according to GISAID. INSACOG, in its latest bulletin, claimed the country has sequenced more than 80,000. Where are the rest?
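The size of the gap follows from the two figures. A minimal sketch, treating ‘a little over 44,000’ as 44,000 (so the unshared count is slightly overstated) and using the 82,361 sequenced samples from the INSACOG bulletin:

```python
from typing import Tuple

def sharing_gap(sequenced: int, shared: int) -> Tuple[int, float]:
    """Return (unshared count, share of sequenced genomes that were shared, in %)."""
    if sequenced <= 0:
        raise ValueError("sequenced must be positive")
    return sequenced - shared, 100 * shared / sequenced

unshared, pct = sharing_gap(82_361, 44_000)
print(unshared, round(pct))  # 38361 53
```

That leaves roughly 38,000 sequenced genomes not visible on GISAID, with only about 53 per cent shared.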

DTE reached out to DBT Secretary Renu Swarup and National Centre for Disease Control Director Sujeet Kumar Singh but received no response.

India’s sequencing network is taking longer to upload sequences, even though the sequencing itself is being done, according to independent experts.

Mishra, who now heads the Tata Institute for Genetics and Society, Bengaluru, said:

If sequences are not shared in the public domain timely, it may defeat the purpose of sequencing because the situation is always fluid in terms of mutations. Maybe by the time the sequence is uploaded, it would have undergone another few rounds of changes.

The genetic information for the spike protein, which the virus uses to enter host cells, is the most important portion of the genome.

A scientist outside the government setup will not be able to see the nature or location of mutations in the genome sequence of the virus unless raw data is posted, said Gaurav Sharma, a scientist at the Institute of Bioinformatics and Applied Biotechnology, Bengaluru.

Sharma and his colleagues wrote a paper, published in Nature Biotechnology on August 10, 2021, on the lag in SARS-CoV-2 genome submissions to GISAID.

At present, the Indian platform carries only the final result, that is, the analysis of how many lineages have been found and where. There is no way for independent scientists to verify it.

Long turnaround time

The mismatch between the government’s figures and GISAID’s also points to a discrepancy in turnaround time.

The average lag time between collection of samples and submission of a genome with metadata is about 57 days for India, Sharma and his colleagues noted in their paper. The analysis of the genetic information and variant identification is supposed to happen after the submission of sequences to the GISAID portal.
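The lag Sharma and his colleagues describe is a mean over collection-to-submission intervals. A minimal sketch of that calculation; the dates are invented for illustration and are not drawn from their paper.

```python
from datetime import date
from typing import List, Tuple

def mean_submission_lag(records: List[Tuple[date, date]]) -> float:
    """Average days from sample collection to genome submission.

    records: (collection_date, submission_date) pairs.
    """
    if not records:
        raise ValueError("no records")
    lags = [(submitted - collected).days for collected, submitted in records]
    if min(lags) < 0:
        raise ValueError("a submission date precedes its collection date")
    return sum(lags) / len(lags)

# Invented dates, for illustration only.
records = [
    (date(2021, 5, 1), date(2021, 6, 20)),   # 50 days
    (date(2021, 5, 10), date(2021, 7, 15)),  # 66 days
    (date(2021, 6, 1), date(2021, 7, 20)),   # 49 days
]
print(mean_submission_lag(records))  # 55.0
```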

But the country’s ‘turnaround time’ is much shorter, according to Union Health Minister Mansukh Mandaviya. He told Parliament on July 27, 2021:

Presently, the turnaround time from sample collection to sequencing data generation and variant calling (identification) is two weeks. INSACOG has revised SOPs to reduce turnaround time to 7-10 days.

If the sequences are analysed in two weeks, why not submit the raw data to GISAID in time, asked Sharma.

India’s lag is one of the longest, an analysis of the average lag in some other countries showed. The UK, with 410,000 sequences at the time the paper was written, uploaded raw data within 16 days on average; Germany, with 1.1 million sequences, in 18 days; Denmark, with 98,000 sequences, in 23 days; the US, with 490,000 sequences, in 24 days; and Sweden, with 54,000 sequences, in 45 days.
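Sorting the averages quoted above makes the comparison explicit. The sketch also subtracts the government's stated two-week turnaround from India's 57-day average; the two figures measure different end points (variant calling versus GISAID submission), which is exactly the gap Sharma questions.

```python
# Average collection-to-submission lag (days), as quoted from Sharma and colleagues.
LAG_DAYS = {
    "United Kingdom": 16,
    "Germany": 18,
    "Denmark": 23,
    "United States": 24,
    "Sweden": 45,
    "India": 57,
}
GOVT_TURNAROUND_DAYS = 14  # "two weeks", per the health minister's statement

ranked = sorted(LAG_DAYS.items(), key=lambda kv: kv[1])
for country, days in ranked:
    print(f"{country:<15}{days:>3} days")
gap = LAG_DAYS["India"] - GOVT_TURNAROUND_DAYS
print("India's reported lag minus stated turnaround:", gap, "days")  # 43 days
```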

All of these countries have sequenced more samples than India and shared them with GISAID sooner.

Governance