Main

Coronaviruses infect a range of mammalian and avian species1. SARS-CoV-2, the agent of the COVID-19 pandemic2,3, belongs to the Sarbecovirus subgenus of betacoronaviruses, members of which mostly infect bats4,5. Hence, bat coronaviruses were identified as a likely evolutionary precursor of SARS-CoV-2 (refs. 2,3), and the bat virus RaTG13 (refs. 2,6) was identified as the closest known relative of SARS-CoV-2. It is not known how SARS-CoV-2 evolved to infect humans, but two mechanisms have been hypothesized: selection in an animal host before zoonotic transfer (possibly via an intermediate host), or natural selection in humans following direct zoonotic transmission from bats7,8.

The S protein of SARS-CoV-2 mediates attachment of the virus to cell-surface receptors and fusion between virus and cell membranes1. The receptor for SARS-CoV-2, like that for SARS-CoV9,10, is the human cell-surface-membrane protein angiotensin-converting enzyme 2 (ACE2)11,12,13. Membrane-fusion activity, as for other class I fusion glycoproteins14, requires S to be proteolytically cleaved into S1 and S2 subunits that remain associated following cleavage13,15,16,17. In addition to substitutions in the receptor-binding domain (RBD)2, a second difference between the S proteins from human and bat viruses is the presence of a four-amino-acid insertion, PRRA, which adds a furin-cleavage site between the S1 and S2 subunits11. Similar cleavage sites have been found in related coronaviruses, including HKU1 and Middle East respiratory syndrome coronavirus (MERS-CoV), which infect humans16,17,18, and the acquisition of similar cleavage sites is associated with increased pathogenicity in other viruses such as influenza virus19.

In order to examine the evolutionary origin of SARS-CoV-2 and to better understand the emergence of the COVID-19 pandemic, here we have characterized the S proteins of SARS-CoV-2 and RaTG13, determined their affinities for human ACE2 and investigated the effects of furin cleavage on the structure of S from SARS-CoV-2.

Results

Structure of protease-cleaved SARS-CoV-2 S glycoprotein

We first characterized the furin-cleaved S protein of SARS-CoV-2 virus by cryo-electron microscopy (cryo-EM) (Fig. 1a and Table 1). We produced a form of SARS-CoV-2 S protein with the furin-cleavage site intact. This protein, which we expressed in mammalian cells, was secreted in a partially cleaved form, presumably due to the naturally expressed proteases within these cells16 (Extended Data Fig. 1a). We further cleaved this protein using exogenous furin for structural and biochemical characterization (Extended Data Fig. 1a).

Fig. 1: Structure of protease-cleaved SARS-CoV-2 S glycoprotein.

Three structures are calculated from micrographs of furin-cleaved material: closed, intermediate and open forms, of approximately equal proportions. In the uncleaved material, most of the population represents the closed form, and a small proportion is in the open conformation. Density maps are shown for the three types of particles, overlaid with a ribbon representation of the built molecular models, viewed with the long axis of the trimer vertical (top); the three monomers are colored blue, yellow and brown. An orthogonal view (bottom) looking down the long axis (indicated by a red dot) is shown. The coloring is as in the top panel, with the NTDs in a lighter hue.

Table 1 Cryo-EM data collection, refinement and validation statistics

The particles analyzed from cryo-electron micrographs fell into three populations: a closed form (34%), an intermediate form (39%) and an open form (27%) with an upright RBD (Fig. 1a). The overall structure of the closed conformation of the S trimer has three-fold symmetry and is similar to structures from uncleaved material, described previously11,20. However, for both this cleaved sample and for the uncleavable form (discussed in the next section), the closed conformation is more compact than the previously published uncleaved closed structure11 (Fig. 1). In the closed conformation, the surface of the RBD, which would interact with the ACE2 receptor, is buried inside the trimer and is not accessible for receptor binding. In the intermediate form (Fig. 1), two of the three RBDs maintain a similar interaction to the closed form, but the third RBD is disordered. In addition, two of the amino-terminal domains (NTDs) that are in closest contact to the disordered RBD have shifted their center of mass by 2.5–2.9 Å from the closed form (in the same direction as the open form) (Extended Data Fig. 2e). In the open form (Fig. 1), two of the RBDs remain closely associated, similar to the closed and intermediate forms. But the third RBD rotates ~60° such that the ACE2-interacting surface becomes fully exposed at the top of the assembly, whereas the NTD of the adjacent chain moves toward the rotated domain, with the NTD of the same chain moving away to accommodate this rotation. The changes in domain orientations between the closed and open forms are shown for a selected monomer in Extended Data Fig. 1d.

In this protease-cleaved material, there is a much lower proportion of S protein in a closed conformation: 34% compared with 83% in the uncleaved human S trimer described in the next section, and compared with the 65% and 50% that were reported recently11,20. The observation here of a substantially populated (39%) intermediate form, in which one of the RBDs is disordered and two of the NTDs have shifted, suggests that this conformation, which is possibly transient, will lead to a receptor-binding-competent form. Thus, we suspect that, in addition to its requirement for membrane fusion, efficient protease cleavage might be selected to ensure there is a higher proportion of S protein on the virus surface that is capable of binding to receptor. Although the loop containing the cleavage site (residues 676–689) is disordered in both cleaved and uncleaved forms, the observation of the intermediate form and the much lower thermal stability of the cleaved protein (Extended Data Fig. 1c), discussed in the next section, suggest that cleavage reduces the overall stability of S. This reduction in stability may facilitate the movement of the NTDs and the RBDs, enabling, finally, the adoption of the open, receptor-binding-competent form.

Comparison with the S glycoprotein from the bat virus RaTG13

Next, we determined the cryo-EM structures of S from the closest known bat virus (RaTG13) and of uncleaved SARS-CoV-2 S (Extended Data Fig. 1b). The bat virus protein was expressed in mammalian cells, but was found to be unstable during preparation of EM grids and required chemical cross-linking to produce particles for data collection and analysis. The resulting micrographs yielded a high-resolution single-particle reconstruction at 3.1-Å resolution (Table 1). The uncleaved SARS-CoV-2 S was particularly stable and gave rise to the best-quality density maps at 2.6 Å (Table 1 and Extended Data Fig. 2), enabling us to model 15% more of the RBD (100% complete) and 25% more of the NTD (98% complete) than were modeled in earlier studies11,20, influencing the overall appearance of the trimer. The structure of the bat virus S protein is similar to that of the uncleaved SARS-CoV-2 closed form (Fig. 2a,d). It may be that the chemical cross-linking required to obtain the structure of bat virus S is responsible for all particles being in the closed conformation.

Fig. 2: Structural comparison of the S glycoproteins from the bat virus RaTG13 and from SARS-CoV-2.

a, The density map for the bat virus trimeric S is shown, with the long axis vertical in the top panel and in an orthogonal view in the bottom panel. All of the particles are in the closed conformation, likely because of the cross-linking of the material. The three monomers are colored blue, yellow and brown. b, Molecular model of the bat virus S protein, colored as in a, with substitutions between the bat virus and SARS-CoV-2 highlighted. Most of the changes are in the RBD and are colored red; there are four substitutions in S1 outside of the RBD, which are shown in green, and a single substitution in S2 is shown in blue. c, Overlay of the molecular structure of a portion of the RBD–RBD interface; the two bat virus S monomers are colored gold (top) and pink (bottom), and the two superposed SARS-CoV-2 S RBD chains are shown in green (top) and blue (bottom). Analysis suggests that the residues at the interface of SARS-CoV-2 S RBD chains support several additional stabilizing interactions and avoid the potential steric repulsion between His505 and His440, seen in the structure of the bat virus. d, The density map for the uncleaved SARS-CoV-2 S protein, in the closed conformation, shown in the same orientation as in a, with the subunits colored blue, green and yellow. This sample gave the best quality maps and enabled the most extensive build of the polypeptide chain.

Comparison of the bat virus S protein sequence with that of SARS-CoV-2 S reveals a high degree of conservation (97.8% identity in the ectodomain) but with a relatively high proportion of substitutions in the RBD (89.6% identity) (Fig. 2b). As suggested before11, the substitutions are clustered at two interfaces: the ACE2-receptor-binding surface (considered in the section “Binding of ACE2 to bat virus and SARS-CoV-2 S glycoproteins”) and the RBD–RBD interfaces of the trimeric S. Analysis of the latter interface in the SARS-CoV-2 trimer reveals an extensive network of potential intratrimer hydrogen bonds, including the interaction of Arg403, Gln493 and Tyr505 from one subunit with Ser373, Ser371 and Tyr369 from another (Fig. 2c). The corresponding residues in the bat structure, and other intersubunit contacts, suggest a lower surface complementarity. Of note, the bat virus S protein has an N-glycosylation site at Asn370, where a bulky fucosylated glycan wedges between adjacent domains (Extended Data Fig. 3). Indeed, calculations of surface contact area show that, in the bat virus S trimer, the monomer–monomer interactions account for 5,200 Å² (of which 485 Å² is between the RBDs), whereas the equivalent contact area in the closed structure of the SARS-CoV-2 S trimer is 6,100 Å² (with 550 Å² between the RBDs). Thermal-stability data show that the uncleaved SARS-CoV-2 S trimer has a markedly higher stability than the bat virus protein does, whereas the cleaved SARS-CoV-2 S has a similar stability to the (uncleaved) bat virus protein (Extended Data Fig. 1c). Perhaps the higher stability of SARS-CoV-2 S is required to offset some of the loss of stability that occurs upon cleavage. These structural and biochemical data together suggest that the furin-cleavage site might confer the human virus with an advantage, as the cleavage facilitates a higher proportion of the open, receptor-binding-competent conformation.

Binding of ACE2 to bat virus and SARS-CoV-2 S glycoproteins

As mentioned above, the second region with a high sequence difference between the bat virus and SARS-CoV-2 S RBDs is the receptor-binding site. To quantitate the impact of these differences on binding to the human ACE2 receptor, we measured binding with surface biolayer interferometry. The S protein, either from human or bat viruses, was immobilized onto the surface of a sensor, and purified ACE2 was flowed over the surface to measure binding. Amplitude analysis suggested that SARS-CoV-2 S binds approximately 1,000 times more tightly to ACE2 than the bat virus protein does, with Kd values of <100 nM and >40 μM, respectively (Fig. 3a).

Fig. 3: Binding of ACE2 receptor to bat virus and SARS-CoV-2 S proteins.

a, Plot of surface biolayer amplitude measurement as a function of ACE2 concentration with the data for S from SARS-CoV-2 (blue, Kd calculated as 91 ± 18 nM) and from the bat virus (red, Kd estimated to be >40 μM). Kd for the SARS-CoV-2 protein was also calculated from kinetic constants (koff = 0.0105 s⁻¹ and kon = 1.56 × 10⁵ M⁻¹ s⁻¹) and was 67.5 ± 9 nM. b,c, Ribbon representation of modeled molecular interactions between ACE2 (green) with RBD from S in SARS-CoV-2 (blue) (both PDB 6VW1)21 and bat virus (red, this study). b, Details of a hydrophobic pocket on ACE2 that accommodates a phenylalanine residue from the SARS-CoV-2 S RBD. c, Two salt bridges and a charged hydrogen bond linking SARS-CoV-2 S RBD to ACE2, while the interface with bat virus S RBD is not able to make these interactions and presents a potential steric clash between Tyr493 and ACE2 Lys31.

Previous studies have determined the structural interaction of the isolated RBD of SARS-CoV-2 S with human ACE2 (refs. 21,22,23). This information (PDB 6VW1)21 enabled us to model and compare the ACE2 domain bound to the RBD of our SARS-CoV-2 and bat virus S trimers; it should be noted that, due to conformational plasticity of side chains, analysis of isolated domains can only partially address potential binding interactions. In the case of SARS-CoV-2 S–ACE2, there is a buried surface area of 840 Å². As well as a series of specific salt bridges and hydrogen bonds, a notable feature is that Phe486 from SARS-CoV-2 S inserts into a hydrophobic pocket on the surface of ACE2 formed by residues including Phe28, Leu79, Met82 and Tyr83. In contrast, in the bat virus S protein, hydrophobic Phe486 is replaced by a less-bulky Leu486 (Fig. 3b), which may account in part for the smaller buried surface of the bat virus S–ACE2 complex of 760 Å². Structural comparison also suggests another substitution that likely contributes to the greatly enhanced affinity of SARS-CoV-2 S binding to ACE2: Gln493 of S makes a potential hydrogen bond with Glu35 of ACE2, which forms an intramolecular salt bridge with Lys31; in turn, ACE2 Lys31 forms a salt bridge with S Glu484. In contrast, the residue equivalent to SARS-CoV-2 Gln493 in the bat virus S is a tyrosine that is unlikely to bond to ACE2 Glu35, and SARS-CoV-2 Glu484 is replaced by a threonine that would not bond to ACE2 Lys31 (Fig. 3c). Moreover, SARS-CoV-2 Gln498 is replaced by a Tyr498 that cannot form a hydrogen bond to ACE2 Tyr41.

Discussion

Together, our structural and biochemical data indicate that a bat virus, similar to RaTG13, would not be able to bind effectively to human ACE2 receptor and would be unlikely to infect humans directly. Given the modular nature of the human and bat S glycoproteins, and the number and structural locations of the amino-acid-sequence differences between them, our observations support the involvement of recombination8 between distinct coronavirus genomes in the generation of SARS-CoV-2.

The structure of the SARS-CoV-2 S protein presented here is at high resolution and is nearly complete, and has many more external loops included than previously reported structures do, providing important insights for vaccine design. Furthermore, our study suggests that the presence of the furin-cleavage site in the S protein of SARS-CoV-2 facilitates the conformational change required for RBD exposure and binding to surface receptors.

Methods

Design of protein constructs

The constructs corresponding to the ectodomain (residues 1–1208) of SARS-CoV-2 S protein (NCBI reference sequence YP_009724390.1) and the ectodomain (residues 1–1204) of bat RaTG13 S protein (QHR63300.2) were both codon-optimized for human expression, synthesized and cloned into the pcDNA3.1(+) vector by GenScript with an N-terminal µ-phosphatase secretion leader sequence and a carboxy-terminal hexa-histidine tag preceded by a foldon trimerization tag and a TEV-cleavage site, all separated by short glycine-rich linkers. Both constructs were made as ‘2P’ mutants (K986P and V987P for YP_009724390.1, and K982P and V983P for QHR63300.2) for increased yield and to prevent the proteins from assuming the post-fusion conformation24.

The ectodomain (residues 19–615) of human ACE2 (NM_021804.2) was codon-optimized for human expression, synthesized and cloned into the pcDNA3.1(+) vector by GenScript with an N-terminal Ig-kappa chain secretion leader sequence and a C-terminal Twin-Strep tag preceded by a DYK tag.

Protein expression and purification

Proteins were expressed in Expi293F cells (Gibco) cultured in suspension in a humidified 8% CO2 atmosphere at 37 °C with shaking at 125 r.p.m. Cell cultures were grown in FreeStyle 293 Expression Medium to a density of 3 million cells per ml at the time of transfection and were transfected with 1 mg of DNA per liter of culture using ExpiFectamine 293 (Gibco) according to the manufacturer’s instructions. The supernatants were collected twice, after 3–4 and 6–7 days, and were clarified, filtered and incubated with the appropriate affinity resin.

S proteins were bound to 5–7 ml of TALON cobalt beads (Takara) per liter of culture, washed briefly and eluted with imidazole. ACE2 was bound to 4–6 ml of Strep-Tactin XT resin (iba) per liter of cell-culture supernatant, which had been pretreated with the BioLock solution (iba). The beads were briefly washed, and the protein was eluted with Strep-Tactin XT Elution Buffer BXT (iba). All proteins were then concentrated and either flash-frozen or gel-filtered on a Superdex 200 Increase 10/300 GL column (GE Life Sciences) into a buffer containing 20 mM Tris pH 8.0 and 150 mM NaCl.

Furin treatment

Recombinant furin (New England Biolabs) was used to cleave the SARS-CoV-2 S protein. Two units of the enzyme were used per 25 µg of the S protein, and the reaction was performed at 25 °C in the presence of 1 mM CaCl2 and was stopped by addition of 4 mM EDTA. SDS–PAGE was used to track the progression of the reaction.

Thermal-stability measurements

Protein melting temperatures were measured using differential scanning fluorimetry. Twenty-microliter reactions consisted of 5 µg of protein with SYPRO Orange (Sigma) present at a 5× concentration, diluted from the 5,000× concentrate. Fluorescence was measured between 25 °C and 95 °C every 0.5 °C in 140 cycles, using an Agilent Stratagene Mx3005P. Each experiment was repeated at least three times.
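As an illustration of how a melting temperature can be read from such a scan, the sketch below takes Tm as the temperature at which the fluorescence rises most steeply (the maximum of the first derivative). This is a common convention for differential scanning fluorimetry and is an assumption here; the text does not state which analysis was applied. The function name and the synthetic unfolding curve are hypothetical and are not measured data.

```python
import math

def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT is largest (central differences)."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired readings")
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t

# Synthetic two-state unfolding curve (illustrative only):
# 140 steps of 0.5 °C from 25 °C to 95 °C, sigmoid midpoint placed at 60 °C.
temps = [25.0 + 0.5 * i for i in range(141)]
signal = [1.0 / (1.0 + math.exp(-(t - 60.0) / 2.0)) for t in temps]

tm = melting_temperature(temps, signal)
print(f"Estimated Tm: {tm:.1f} °C")
```

Real traces are noisier than this synthetic curve, so some smoothing before differentiation, or a fit to a sigmoidal model, would usually be needed.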

Biolayer interferometry

Human ACE2 binding to coronavirus S proteins was measured on an Octet Red 96 instrument (ForteBio) in a buffer containing 20 mM Tris pH 8.0 and 150 mM NaCl, at 25 °C, with shaking at 1,000 r.p.m. Ni-NTA (NTA, ForteBio) sensors were used with the bat and furin-uncleavable human S proteins. The sensors were pre-equilibrated in the buffer, and S proteins were immobilized on them at 15–30 µg ml⁻¹ for 30–40 min. ACE2 binding was measured using a 3- to 5-min association step followed by a 10- to 15-min dissociation step. Each experiment was repeated at least three times. Association phases were analyzed as a single exponential function, and plots of the observed rate (kobs) versus ACE2 concentration gave the association and dissociation rate constants (kon and koff) as the slope and intercept, respectively. Kd was determined as koff/kon and, where possible, independently by analysis of the variation of maximum response with ACE2 concentration.
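The kinetic analysis described above can be sketched as follows. For a simple 1:1 binding model, kobs = kon × [ACE2] + koff, so a straight-line fit of kobs against ACE2 concentration gives kon as the slope and koff as the intercept, and Kd follows as koff/kon. The concentrations and kobs values below are synthetic, generated without noise from the rate constants quoted in the legend of Fig. 3 purely for illustration; the function name is hypothetical and this is not the authors' analysis code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic example: ACE2 concentrations in M and noise-free kobs in s^-1,
# generated from kon = 1.56e5 M^-1 s^-1 and koff = 0.0105 s^-1 (Fig. 3a legend).
concentrations = [25e-9, 50e-9, 100e-9, 200e-9, 400e-9]
k_obs = [1.56e5 * c + 0.0105 for c in concentrations]

k_on, k_off = fit_line(concentrations, k_obs)  # slope = kon, intercept = koff
kd = k_off / k_on                              # equilibrium dissociation constant in M
print(f"kon = {k_on:.3g} M^-1 s^-1, koff = {k_off:.3g} s^-1, Kd = {kd * 1e9:.1f} nM")
```

With these inputs the fit recovers a Kd of about 67 nM, in line with the value of 67.5 ± 9 nM reported from kinetic constants.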

Cryo-EM sample preparation and data collection

The furin-uncleavable human S was frozen at 0.4 mg ml⁻¹ in a buffer (20 mM Tris pH 8.0 and 150 mM NaCl) supplemented with 0.1% OG on an R2/2 200 mesh Quantifoil grid. Extensive buffer optimization and cross-linking had to be performed to obtain RaTG13 S protein suitable for cryo-EM. The protein in MES buffer (50 mM MES pH 6.0, 100 mM NaCl) was treated with BS3 (Thermo Scientific) at a final concentration of 0.5 mM on ice for an hour, and then a GraFix protocol25 was performed to achieve double cross-linking: the BS3-cross-linked sample was loaded on a 10–30% glycerol and 0–0.15% glutaraldehyde gradient containing 50 mM MES pH 6.0, 100 mM NaCl and was spun at 35,000 r.p.m. in an SW41 rotor (Beckman) for 20 h at 4 °C. The reaction was then quenched with a final concentration of 90 mM Tris pH 7.5, and fractions containing cross-linked protein, identified by SDS–PAGE, were pooled, concentrated and gel-filtered into 50 mM MES pH 6.0, 100 mM NaCl. The resulting bat S protein and the furin-treated human S were frozen at a concentration of 0.1 mg ml⁻¹ on R2/2 200 mesh Quantifoil grids coated with a thin layer of continuous carbon. All grids were freshly glow discharged for 30 s at 25 mA prior to freezing. All samples were prepared by applying 4 µl of a sample to a grid equilibrated to 4 °C in 100% humidity, followed by a 4- to 5-s blot using a Vitrobot MkIII and plunge freezing into liquid ethane.

Data were collected using EPU software (Thermo Scientific) on Thermo Scientific Titan Krios microscopes operating at 300 kV. For the furin-uncleavable SARS-CoV-2 S dataset, the micrographs were collected using a Falcon 3 detector (Thermo Scientific) operating in electron-counting mode. Exposures were 60 s with a total dose of 33.6 e/Å², fractionated into 30 frames, with a calibrated pixel size of 1.09 Å. For the RaTG13 S and furin-treated SARS-CoV-2 S datasets, micrographs were collected using a Gatan K2 detector mounted on a Gatan GIF Quantum energy filter operating in zero-loss mode with a slit width of 20 eV. Exposures were 8 s with a total dose of 54.4 e/Å², fractionated into 32 frames, with a calibrated pixel size of 1.08 Å. All datasets were collected using defocus values between 1.5 and 3 µm.
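For reference, the per-frame dose and the average dose rate implied by these exposure settings follow from simple division of the total dose by the number of frames and by the exposure time. The short sketch below performs that arithmetic using only the values stated above; the function name and dictionary labels are hypothetical.

```python
def dose_summary(total_dose, n_frames, exposure_s):
    """Return (dose per frame in e/Å^2, average dose rate in e/Å^2 per second)."""
    return total_dose / n_frames, total_dose / exposure_s

# (total dose in e/Å^2, number of frames, exposure time in s), taken from the text.
datasets = {
    "Falcon 3 (furin-uncleavable S)": (33.6, 30, 60.0),
    "K2 (RaTG13 S, furin-treated S)": (54.4, 32, 8.0),
}

for name, (dose, frames, seconds) in datasets.items():
    per_frame, rate = dose_summary(dose, frames, seconds)
    print(f"{name}: {per_frame:.2f} e/Å^2 per frame, {rate:.2f} e/Å^2 per s")
```

This gives 1.12 e/Å² per frame for the Falcon 3 dataset and 1.70 e/Å² per frame for the K2 datasets.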

Cryo-EM data processing

Movie frames were aligned using MotionCor2 (ref. 26) implemented in RELION27, and the contrast transfer function was fitted using CTFFIND4 (ref. 28). All subsequent data processing was carried out using both RELION and cryoSPARC29. Particles for the furin-uncleavable SARS-CoV-2 S dataset were picked using crYOLO30 with a model trained on manually picked micrographs. For the datasets on the carbon support, particles were picked using RELION auto-picking. All datasets were subjected to two rounds of RELION two-dimensional classification, retaining classes with clear secondary structure. These particles were classified using RELION three-dimensional classification with initial models generated using ab initio reconstructions in cryoSPARC. The details of these classifications for each of the three datasets are given in Extended Data Fig. 4.

Final refinements were carried out using cryoSPARC homogeneous refinement for all models except the intermediate conformation, which was refined using RELION. Local resolution was estimated using blocres31 implemented in cryoSPARC. Maps were filtered by local resolution and globally sharpened32 in cryoSPARC. Additional information is available in Table 1 and Extended Data Fig. 5.

Model building

The model for the uncleavable SARS-CoV-2 S protein in the closed conformation was started using the published structure (PDB 6VXX). The model was fitted to the density, and extra regions were manually built using Coot33. This model of the uncleavable SARS-CoV-2 structure in the closed conformation was then used as the basis for building the RaTG13 structure, which was mutated at the relevant residues in Coot. Both models were real-space refined and validated using PHENIX34.

The intermediate and open structures were generated by fitting the uncleavable closed human protein to the density. The open structure required the RBD to be manually rotated into the upright position in Coot. Both the open and intermediate models were refined using Namdinator35, followed by geometry normalization using PHENIX. Additional information is available in Table 1.

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.