On the trail of COVID-19: new virus, familiar family

As the world learned that the USA had assassinated Qasem Soleimani during the first week of 2020, doctors in China were coming to grips with a new viral infection that was spreading rapidly through the population of Wuhan. Wuhan is the capital of Hubei province, with a metro population above 19 million. Coincidentally, I had visited Hangzhou, China (800 km / 500 miles distant) just two months before!

This photograph comes from Felix Wong in the South China Morning Post / Getty.

The first hospitalization of a patient diagnosed with the COVID-19 disease took place on December 12th, 2019. The initial spread apparently took place in the Huanan Seafood Market, a “wet market” in the city. As the severity of the new virus became apparent, the market was closed on January 1, 2020. These markets generally feature both live animals and meat, and yet Huanan was quite close to the center of the city. The proximity of animals particularly matters in this case, because COVID-19 is a “zoonotic” disease. In other words, this virus didn’t suddenly appear out of nowhere; instead, it is believed to have jumped the species boundary to enter human populations from an animal population. It’s quite possible that the virus jumped from a wildlife population (likely bats, in this case) to a domesticated animal population to humans. This property is not particularly unusual; we frequently see strains of influenza jump from other species, and we believe that HIV derived from Simian Immunodeficiency Virus.

A molecular view of COVID-19

I am stunned by how much information has been produced about COVID-19 in just two months. The World Health Organization settled on a name for the disease during the last week; it appears that the virus will be called SARS-CoV-2, replacing the temporary name 2019-nCoV. For information about the disease, I suggest a visit to the websites of the Centers for Disease Control and Prevention or the World Health Organization. I was also impressed by what I found at the famed journal The Lancet. For my part, I was glad to see the volume of molecular information already available at NCBI and particularly the Protein Data Bank. Given that this virus was seen for the first time only a month and a half ago, scientists have clearly been putting in overtime hours!

At left, T4 bacteriophage by Adenosine at Wikimedia Commons. At right, a SARS coronavirus invades, by David S. Goodsell of RCSB Protein Data Bank.

When people envision the word “virus,” they might think of an icosahedral bacteriophage like the left part of the image above; these complexes of protein and nucleic acid are indeed viruses. Frequently the viruses that infect humans have a rather more complex structure. Coronaviruses like the one that causes COVID-19 are large, enveloped viruses with a positive-sense single-stranded RNA genome and a strong penchant for recombination. They have been studied extensively since their discovery in the late 1960s. I’ll try to explain that description here.

Let’s start with “large.” Each coronavirus is fairly bulky, with a size between 80 and 160 nm. Quite a few of the media stories about the virus show people wearing disposable breath masks. Even the best of those masks are designed to filter particulates of 0.6 microns, or 600 nm; the goal, then, is to block droplets containing the virus rather than individual “naked” virions. When doctors work with nasty respiratory diseases like multidrug-resistant tuberculosis, they generally wear an N95 respirator. Those are intended to block 95% of particulates of 300 nm size. I was surprised to learn that N95 masks are not as effective for people who wear beards!

The term enveloped is used to describe a virus that is surrounded by a membrane. You can see electron micrograph images of the virus causing COVID-19 at Flickr (including the one at the top of this post). Coronaviruses still contain many proteins and copies of a genome, but they are surrounded by a lipid bilayer, just like human cells. When human cells are hijacked to manufacture many copies of the virus, the mature virions “bud” from the cell surface, surrounding themselves with a bubble of the cell’s own membrane. This membrane is also important in their invasion of new cells. The virus initiates a membrane fusion to merge its envelope with the target cell, releasing the virus contents into the new cell.

To say a virus employs a “positive-sense single-stranded RNA genome” will bewilder most people, so let’s visit each word. Human genomes are made of double-stranded DNA, the familiar “double helix.” Our 23 pairs of chromosomes amount to a total of roughly three billion nucleotides, with just over 20,000 protein-coding genes. Bacterial genomes are also double-stranded DNA, but they are far more compact. Our intestinal friend E. coli, for example, has a genome size of 4.7 million nucleotides (0.15% the size of the human genome), containing around 4,400 protein-coding genes (22% the size of the human proteome, discounting variable splicing etc.). The virus that causes COVID-19 has a genome around 29,900 nucleotides in length (0.64% the size of the E. coli genome). Counting its proteins is a little challenging since many viruses produce “polyproteins” that are translated as one long polypeptide and then cleaved into functional proteins by proteases. Instead of being stored as double-stranded DNA, though, each virus contains the genome in a single RNA molecule that is not base-paired (rather than a twisted ladder, the RNA strand can adopt a complicated shape on its own). We describe it as “positive-sense” because once the RNA genome has been released from the virus, it can immediately be translated by the cell’s ribosomes to produce protein!
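For readers who want to check the size comparisons in the paragraph above, here is a minimal Python sketch of the arithmetic. The genome and gene counts are the ones quoted in the text, with one assumption: the human genome is taken as 3.1 billion nucleotides (the text only says “roughly three billion”), and the human gene count is rounded to 20,000. The variable and function names are my own, chosen only for illustration.

```python
# Sizes quoted in the post. The human genome figure is an assumption:
# the post says "roughly three billion"; 3.1 billion is used here.
HUMAN_GENOME_NT = 3_100_000_000
ECOLI_GENOME_NT = 4_700_000
SARS_COV_2_GENOME_NT = 29_900

# Protein-coding gene counts ("just over 20,000" rounded to 20,000).
HUMAN_GENES = 20_000
ECOLI_GENES = 4_400


def percent_of(part, whole):
    """Return `part` expressed as a percentage of `whole`."""
    return 100.0 * part / whole


ecoli_vs_human_genome = percent_of(ECOLI_GENOME_NT, HUMAN_GENOME_NT)
ecoli_vs_human_genes = percent_of(ECOLI_GENES, HUMAN_GENES)
virus_vs_ecoli_genome = percent_of(SARS_COV_2_GENOME_NT, ECOLI_GENOME_NT)

print(f"E. coli genome as % of human genome:  {ecoli_vs_human_genome:.2f}%")
print(f"E. coli genes as % of human genes:    {ecoli_vs_human_genes:.0f}%")
print(f"SARS-CoV-2 genome as % of E. coli:    {virus_vs_ecoli_genome:.2f}%")
```

The first percentage is sensitive to the human genome size you plug in; using exactly three billion nudges it closer to 0.16%.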

Genome annotation for SARS-CoV-2 genome from NCBI

What does this molecular information tell us?

We have seen viruses like this before. As of Valentine’s Day, scientists had sequenced 78 different SARS-CoV-2 viruses, giving us a reliable look at its “wild type” sequence. In this case, the name indicates a strong similarity to SARS-CoV, the virus that caused the SARS outbreak of 2002. COVID-19 appears to spread a bit more easily because people in the earliest stage of the disease are more able to walk around than people in the earliest stages of 2002 SARS.

PDB structure 3DDK is an example of an RNA-Dependent RNA Polymerase from Coxsackievirus.

This virus sequence can change rapidly. One of the reasons cells store their genomes in DNA is that DNA is more resistant to mutation than is RNA. Positive-sense RNA viruses employ RNA-dependent RNA polymerase to manufacture new viral genomes, and this enzyme makes a copying mistake in about one of every 10,000 nucleotides added to the sequence. Since the sequence of the viral genome is roughly 30,000 nucleotides in length, that means we expect three “letters” to be wrong every time a new genome copy is made. In addition, coronaviruses are noted for their ability to “mix it up” through recombination, a process where different genomes swap segments. Recombination is a mechanism by which a virus can undergo “antigenic shift,” drawing together the worst of multiple strains in a single virus.
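The “three letters per copy” estimate is just the genome length multiplied by the per-nucleotide error rate. Here is a rough Python sketch of that calculation, which also estimates how often a copy would come out with no mistakes at all. It assumes every nucleotide is miscopied independently with the same probability, which is a simplification of mine rather than a claim about the real enzyme; the names are likewise only illustrative.

```python
import math

GENOME_LENGTH_NT = 30_000        # rounded genome length used in the post
ERROR_RATE_PER_NT = 1 / 10_000   # polymerase error rate quoted in the post


def expected_errors(length_nt, error_rate):
    """Average number of miscopied nucleotides per genome copy."""
    return length_nt * error_rate


def prob_error_free_copy(length_nt, error_rate):
    """Chance that a whole genome is copied with zero mistakes.

    Assumes each position is miscopied independently at the same rate.
    """
    return (1.0 - error_rate) ** length_nt


mean_errors = expected_errors(GENOME_LENGTH_NT, ERROR_RATE_PER_NT)
p_clean = prob_error_free_copy(GENOME_LENGTH_NT, ERROR_RATE_PER_NT)

# With this many positions, the exact value is very close to the
# Poisson approximation exp(-mean_errors).
p_clean_poisson = math.exp(-mean_errors)

print(f"Expected errors per copy:        {mean_errors:.1f}")
print(f"Chance of a perfect copy:        {p_clean:.3f}")
print(f"Poisson approximation of same:   {p_clean_poisson:.3f}")
```

Under these assumptions only about one copy in twenty would be letter-perfect, which is one way to see why a population of these viruses is never quite a single sequence.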

Rhinolophus sinicus, the Chinese horseshoe bat, is common in China and as far west as Nepal.

This virus matches sequences we have seen in bats. The publication announcing the genome sequence of this virus noted a strong similarity with Bat-SL-CoVZC45 and Bat-SL-CoVZXC21. We are fortunate that researchers began looking at coronaviruses in wildlife populations in the aftermath of the SARS outbreak. Those carefully curated sequences, along with the information drawn from SARS-CoV itself (including protein crystal structures), provided a base of information against which SARS-CoV-2 could be compared. As we pursue vaccines and symptom-reducing drugs, this information, particularly concerning the cell surface receptor targeted by the virus, will be crucial in stopping its spread and relieving those who have been infected by SARS-CoV-2.

Protein Data Bank 2AJF shows the SARS coronavirus spike receptor-binding domain complexed with its receptor, ACE2.

The New York Times published an excellent guide to the proteins of SARS-CoV-2 in early April, 2020.

I would like to dedicate this post to Bentley Fane, the professor who first fascinated me with virology.

7 thoughts on “On the trail of COVID-19: new virus, familiar family”

  1. Anna Belle Leiserson

    Wow, Dave. This is the most helpful thing I’ve read about Covid-19 — in part, I suppose, because I know you’re trustworthy. Many thanks!

    1. dtabb1973 Post author

      I really appreciate the compliment, Anna Belle! I will probably produce a number of science posts this year in support of the DIPLOMICs program. I do enjoy composing my travelogues, though!

  2. Pingback: Rinderpest virus, the scourge of Africa | Picking Up The Tabb

  3. Pingback: How do we test for the virus that causes COVID-19? | Picking Up The Tabb

  4. Vickie Jones

    Thanks David. That is the best information I have found so far about the actual virus. It seems hard to find good scientific information, other than information about basic infection control practices.
