Why it's so tricky to trace the origin of COVID-19
A 90-day investigation into the source of SARS-CoV-2 has shown consensus that the virus was not engineered. But many other elements remain a mystery.

After 20 months, 219 million cases, and more than four million deaths, we’ve learned a lot about the COVID-19 pandemic. But the most polarising question and central mystery remains: We still don’t know where the virus that started it all actually came from. Most experts were not surprised in late August when a 90-day investigation came up empty-handed on the origin of the SARS-CoV-2 virus. A brief, one-page unclassified summary released on August 27 revealed the only point on which the intelligence community agreed: that the virus was “not developed as a biological weapon.”
Understanding where, when, and how this pandemic started is important information for public health officials seeking to control its spread and even prevent future outbreaks. If the source of the virus is found to be bats or another animal, as many experts suspect, preventative measures might include curtailing contact between that animal and those living or working in close proximity. Measures could involve regular surveillance of animals and humans living where the virus is endemic to reduce the likelihood of future spillover—when a virus is transmitted to a human, directly or via a host animal, triggering an outbreak.
The results may also lead to broader policy decisions to curb deforestation and habitat fragmentation, and to block human settlements in known viral hot zones. Knowing where the pandemic virus arose could also lead to changes in human behaviour, such as reducing demand for bushmeat and wildlife-derived products that drive the illegal wildlife trade. And if the virus is instead found to have leaked from a lab, that finding would no doubt spur scientists and policy-makers to find safer ways to study these pathogens.
That’s why scientists support a thorough, evidence-based investigation into the origins of COVID-19. But similar inquiries during past epidemics have taken months to years to yield answers, and in several cases, the mystery remains unresolved.
“Science takes time,” says Arinjay Banerjee, a virologist at the University of Saskatchewan in Canada. “To go back and confidently identify the source is a difficult task.”
Earlier this year, an international World Health Organisation team visited the city of Wuhan, China, to assess the evidence China had provided about the origin of SARS-CoV-2. In a report that summarised their findings, the WHO suggested that it was “likely to very likely” that the virus first spread from infected bats to humans via an intermediate host animal.
This was the case with the 2002 SARS-CoV outbreak—the first pandemic of the 21st century; the virus most likely spilled over from cave-dwelling horseshoe bats in China to palm civets sold in live animal markets, where it reached humans. Similarly, the 2012 MERS-CoV epidemic is suspected to have originated in bats and was later transmitted to dromedary camels, which infected humans.
That WHO report also deemed a laboratory leak from the Wuhan Institute of Virology, known for its work with coronaviruses, as “extremely unlikely.” But the conclusion sparked backlash from scientists and governments around the world, who argued that it’s still too early to rule out a lab leak based on the evidence in hand. Other experts caution that political motivations could drive people to hasty conclusions.
“There is a progenitor virus out there somewhere, and we should look for it,” says David Morens, senior scientific adviser on epidemiology to Anthony Fauci, director of the National Institute of Allergy and Infectious Diseases in the U.S. “But at some point, it crosses over from doing due diligence to wasting time and being crazy. We may have seen that point already.”
Here’s what we know so far about the scientific investigation into the origin of the pandemic, and what still needs to be done to find clear answers.
What evidence do virus detectives seek?
Tracing the origin of a virus requires extensive fieldwork, thorough forensics, and a fair bit of luck. The laborious endeavour can take years until scientists have the evidence they need to point to a source.
For diseases originating from animals, that evidence is typically a genetic match between virus sequences obtained from an animal and those from some of the first confirmed patients. The match may not be 100 percent, because viruses gather mutations or new genes over time and as they jump hosts. But with enough investigation, scientists have found nearly perfect matches of around 99 percent or better for some viruses—including the ones responsible for two previous coronavirus outbreaks.
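As a rough illustration of what a "percent match" means, the Python sketch below counts how many positions agree between two sequences that have already been lined up against each other. The function name `percent_identity` and the two short sequences are invented for this example; real comparisons cover whole genomes and rely on dedicated alignment software.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions where both sequences carry the same base.

    Assumes the two sequences are already aligned and of equal length.
    Positions with a gap ('-') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gaps are not counted as comparable positions
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Hypothetical toy fragments for illustration, not real viral sequences.
animal_virus = "ATGGCTAGCTAGGCTAACGT"
human_virus = "ATGGCTAGCTAGGCTTACGT"

print(f"{percent_identity(animal_virus, human_virus):.1f} percent match")
```

In this toy case one of 20 positions differs. Skipping gapped positions is only one of several conventions for computing identity, so published figures can vary slightly depending on the method used.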
Cat-like tree-dwelling palm civets, considered a delicacy and sold in street markets, quickly became the focus during the 2002-04 outbreak of Severe Acute Respiratory Syndrome (SARS) that emerged in China’s Guangdong Province, which resulted in more than 8,000 cases and nearly 800 deaths in 29 countries. Some of the first SARS cases included several infected restaurant chefs handling a variety of animals. Blood tests of animal traders in the region showed higher prevalence of antibodies against the SARS-associated coronavirus compared to healthy controls, with the highest levels recorded among those who traded primarily in masked palm civets.
A 2003 paper also showed that the nasal swab of a masked palm civet obtained from a live animal market in Guangdong yielded a 99.8 percent match between the full genome sequence of the SARS-CoV-like virus isolated from the civet and virus from a human. This indicated that the SARS-CoV-like virus had recently infected civets at the market.
But it became evident that these furry mammals weren’t the original sources, as the virus was mostly absent among farmed masked palm civets before they reached the markets, and it was not widely circulating in wild populations. Suspecting bats to be the natural reservoirs, given that they harbour other zoonotic viruses, researchers collected blood, faecal, and throat swab samples from bats in regions across China and in Hong Kong.
More than 10 years later, they identified horseshoe bats in a remote cave in southwestern China’s Yunnan Province sporting virus strains that contain all the genetic pieces recorded in viral genomes from human patients. It’s possible the strain that precipitated the 2002 epidemic was a product of recombination of different genetic strains found in these bats.
Scientists later used lessons from tracing the origins of the SARS virus to investigate the source of the 2012 Middle East Respiratory Syndrome (MERS) coronavirus outbreak, which infected more than 2,000 people in 37 countries and killed nearly 900.
The virus was first isolated from a 60-year-old businessman who died of severe pneumonia and multi-organ failure in June 2012 in a hospital in Jeddah, Saudi Arabia. Early efforts to trace the source focused on bats. In Saudi Arabia, throat swab, urine, faecal, and blood samples from wild bats, including those occurring in the area where the first patient lived and worked, showed indications of a MERS-like coronavirus in one Egyptian tomb bat faecal sample. But without a full genome sequence, the role of bats could not be evaluated.
Meanwhile, anecdotal reports suggested some patients had been exposed to dromedary camels or goats. A 2013 study found antibodies against MERS in blood samples collected from retired racing camels in Oman, which were missing in blood from European sheep, goats, and cattle. Similar blood surveys conducted in several countries within the Arabian Peninsula, Egypt, and Spain’s Canary Islands also showed the presence of antibodies in camel blood, indicating the hoofed mammals were once infected by the virus.
But the strongest evidence of dromedary camels’ involvement came from Qatar in October 2013, where a camel herd owner and his co-worker were diagnosed with MERS. Nasal swab tests indicated five of 14 camels on their farm were MERS-positive. Further, whole viral genome sequences obtained from humans and camels were 99.5 to 99.9 percent identical.
Scientists believe camels are the intermediate hosts and suspect bats to be the original reservoirs of MERS-CoV. That’s because some bat species, like the vesper bats in South Africa, harbour viruses that are related to the one that causes MERS. But there’s still an evolutionary gap between those bat viruses and the human or camel versions.
“We still haven’t found those viruses that are very, very close,” says virologist Chantal Reusken at the Dutch Institute for Public Health and the Environment in the Netherlands.
What we know so far about COVID-19’s origin story
One key difference with the SARS and MERS outbreaks is that scientists were able to identify the intermediate animal sources within months of their onset. For COVID-19, that link remains unknown.
In December 2019, some of the early COVID-19 cases in Wuhan were reported among vendors linked to the Huanan market, which was selling wild and farmed animals including badgers, raccoon dogs, civets, hares, rats, snakes, and crocodiles.
Between January 1, when the market was closed, and March 2020, officials with the Chinese Centre for Disease Control and Prevention collected more than 900 swab samples of floors, walls, or surfaces of objects from the Huanan market, its drainage system, and the surrounding markets. They found that 73 samples were SARS-CoV-2 positive.
The Chinese CDC also collected more than 2,000 faecal and body swab samples from live or frozen animals in Huanan and other markets in Wuhan, from animals raised by some Huanan market suppliers, and from several wild animals found in nearby provinces in southern China.
According to the WHO report, all those samples tested negative for SARS-CoV-2, and in some cases, for antibodies against the virus. But this sampling missed many live animals typically sold when the markets were open. Similar tests of thousands of livestock and poultry samples collected from across China in 2018, 2019, and 2020 as part of routine animal surveillance also tested negative for SARS-CoV-2.
Last year, scientists detected SARS-CoV-2-like virus strains in Sunda pangolin tissue samples that were seized in anti-smuggling operations in southern China in 2017 and 2018. Sought for their meat and scales used in traditional Chinese medicine, these pangolins are among the world’s most trafficked mammals. But with only an 85.5 to 92.4 percent match between the human SARS-CoV-2 genome sequence and those obtained from pangolins, scientists can’t mark them as the relevant hosts. Also, a team surveying Wuhan’s wet markets between May 2017 and November 2019 found no pangolins being sold there.
And as was the case with MERS, comparing genome sequences from early COVID-19 patients with SARS-like coronavirus sequences directly from bats hasn’t yet yielded a close enough match, either.
So far, the closest relative is a coronavirus labelled RaTG13. It was discovered in Chinese horseshoe bats near a cave in Yunnan shortly after six miners fell sick and three of them died due to an unknown respiratory illness in 2012. RaTG13 shares 96.2 percent of its genome with human SARS-CoV-2. A coronavirus dubbed RmYN02 and derived from Malayan horseshoe bat droppings collected in Yunnan Province in 2019 is 93.3 percent similar.
Scientists have also identified SARS-CoV-2-related viruses in bats outside China. This January a team isolated a coronavirus sequence showing a 92.6 percent match from two Shamel's horseshoe bats sampled in Cambodia in 2010. And in February a coronavirus named RacCS203 taken from acuminate horseshoe bats in Thailand’s Chachoengsao Province showed 91.5 percent similarity in its genetic code.
Matches above 90 percent may sound high, but in genomic terms they still leave a wide evolutionary gap. After all, humans and bonobos are a 98.7 percent genetic match.
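To get a feel for the size of that gap, the sketch below converts the percent matches reported for bat viruses into an approximate number of differing genome positions. It assumes a SARS-CoV-2 genome of roughly 29,903 nucleotides, the length of the commonly used reference sequence; that figure and the function name `differing_positions` are not from the reporting above. Treating a percent match as a simple fraction of the whole genome is a simplification.

```python
# Assumption: length of the commonly used SARS-CoV-2 reference genome,
# about 29,903 nucleotides. This figure is not taken from the article.
GENOME_LENGTH = 29_903


def differing_positions(percent_match: float, genome_length: int = GENOME_LENGTH) -> int:
    """Approximate number of genome positions that differ at a given percent match."""
    if not 0.0 <= percent_match <= 100.0:
        raise ValueError("percent_match must be between 0 and 100")
    return round(genome_length * (100.0 - percent_match) / 100.0)


# Percent matches to human SARS-CoV-2 as reported for each bat virus.
reported_matches = {
    "RaTG13": 96.2,
    "RmYN02": 93.3,
    "Cambodia bat virus": 92.6,
    "RacCS203": 91.5,
}

for name, pct in reported_matches.items():
    print(f"{name}: {pct} percent match, roughly {differing_positions(pct):,} differing positions")
```

Under these assumptions, even the 96.2 percent match for RaTG13 corresponds to more than 1,100 differing positions in a genome of about 30,000.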
“The big problem is that bats are everywhere and there are so many species with a huge diversity of viruses, including coronaviruses,” says Bart Haagmans, a virologist at the Erasmus Medical Centre in the Netherlands. “It’s difficult to find the bats with the virus that started the outbreak.”
Why the lab-leak suspicion persists
Many scientists believe that SARS-CoV-2 originated in nature and is unlikely to be a product of laboratory engineering. In a March 2020 Nature Medicine study, for instance, Kristian Andersen, a virologist at the Scripps Research Institute in La Jolla, California, and his colleagues showed that some genetic features once considered unique to SARS-CoV-2, and thus potentially human-made, are found in nature. They found that features like the furin cleavage site, which facilitates the virus’s entry into human cells, and the receptor binding domain, which allows the virus to anchor itself to human cells, are also present in related viruses isolated from Malayan pangolins and bats.
But despite its likely natural origin, the theory that SARS-CoV-2 could have escaped from a laboratory continues to pique the interest of some scientists, several politicians, and many in the larger public sphere.
Part of the suspicion comes from the fact that the pandemic emerged very close to the Wuhan Institute of Virology, where researchers have been surveying bats for coronaviruses and maintaining a database of samples and virus sequences. “People look at the coincidence,” Andersen says.
The institute’s location doesn’t surprise him, though. Wuhan is an extremely connected and populous city with several wet markets, and in the past, bat coronaviruses have been identified from the larger region. “There are labs close to where outbreaks can happen, and where these outbreaks happen is where you want to study them,” Andersen says.
Still, experts and observers argue it’s possible members of the Wuhan Institute of Virology staff were infected due to safety lapses while working with the SARS-CoV-2 virus or during fieldwork, and then they inadvertently spread the disease.
In a letter published in Science on May 14, some scientists suggested that the possibility of the virus escaping from the lab was not given due consideration during the WHO investigation. In a March 30 press briefing, WHO program manager Peter Ben Embarek, who led the COVID-19 fact-finding mission to China, said: “Since [the lab leak theory] was not the key or main focus of the joint studies, it did not receive the same depth of attention and work as the other hypotheses.”
The Wuhan institute’s leading bat virologist Shi Zhengli said her laboratory records didn’t indicate any match between virus samples her team had collected from China’s bat caves and SARS-CoV-2 sequences. However, the WHO-China team couldn’t access those records.
Laboratory accidents aren’t unheard of. In Singapore, Taiwan, and China, four researchers in labs studying the SARS virus were accidentally infected in the aftermath of the initial outbreak. In 2014, dozens of workers at the U.S. Centres for Disease Control and Prevention in Atlanta were potentially exposed to live anthrax bacteria resulting from a breach in safety procedures.
“Historically looking back, we can have lab leaks,” says Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Centre and lead author of the Science letter. “To be confident about what happened, we need more investigation.”
Fresh controversy erupted when a May 23 Wall Street Journal story reported that an undisclosed U.S. intelligence report claimed three researchers at the Wuhan institute sought hospital care in November 2019 for “symptoms consistent with both COVID-19 and common seasonal illness.” The identities of those researchers and the exact illness they had remain unknown.
The WHO report found no records of COVID-19-related illness or evidence of infection among the institute’s staff prior to December 2019. However, the team didn’t have access to raw patient data from 174 early COVID-19 cases identified in Wuhan, half of which weren’t connected to the Huanan market. This information could aid the quest to trace the pandemic’s origin.
“What’s frustrating is that with some transparency, it can all be cleared up,” says virologist Darryl Falzarano at the University of Saskatchewan. “This is all weaponised politically, which is unfortunate.”
What happens next
Several molecular dating analyses have suggested that SARS-CoV-2 was potentially circulating as early as October 2019. The WHO report therefore recommends searching for SARS-CoV-2 antibodies in stored blood bank samples. This could help resolve the timeline for when the virus emerged, but the search for what started the pandemic may be a long and arduous one.
“You may have to spend the next 10 years sampling animals to find something that’s really close,” Falzarano says. “But you may not even find that perfect linkage.”
The WHO team recommends searching for SARS-CoV-2-related viral sequences and antibodies in horseshoe bats mainly in southern China and in East and Southeast Asia. Similar surveys for potential intermediate host species could include pangolins, minks, rabbits, raccoon dogs, and domesticated cats, all of which have been infected by SARS-CoV-2 in the recent past.
Other projects include tracing wildlife farms that supplied markets in Wuhan and testing susceptible animals and people interacting with them, and analysing the role of cold chains and frozen foods as a transmission source.
Recently, unpublished grant proposals and annual reports obtained by The Intercept gave insight into National Institutes of Health-funded coronavirus research in Wuhan in collaboration with New York-based non-profit EcoHealth Alliance.
In sophisticated, high-security facilities called Biosafety Level 3 labs, scientists tested the ability of new bat coronaviruses to infect humanised mouse cells. These tests often used hybrid viruses created using a previously known SARS-like strain as the backbone and adding what’s called a spike protein from a new virus that facilitates its entry into cells.
“It’s standard virology research and it’s addressing a really key question: What are the potential viruses that could emerge [as a potential threat to humans] and where are they found,” says Andersen, who reviewed the documents at National Geographic’s request. To him, the information doesn’t indicate that SARS-CoV-2 was engineered in the Wuhan laboratory, as the backbone strain used in their experiments is not the backbone of SARS-CoV-2.
Still, even before the grant documents came to light, some pundits were wondering if the laboratory would be investigated further for conducting any risky experiments or biosafety breaches.
“We don’t know exactly what happened,” Bloom says. “So, we can’t rule out all possibilities.”
