{"id":4077,"date":"2020-03-06T18:05:45","date_gmt":"2020-03-06T09:05:45","guid":{"rendered":"http:\/\/www.bicyclogy.com\/?p=4077"},"modified":"2021-04-24T22:58:52","modified_gmt":"2021-04-24T13:58:52","slug":"2020%e5%b9%b43%e6%9c%886%e6%97%a5","status":"publish","type":"post","link":"https:\/\/www.bicyclogy.com\/?p=4077","title":{"rendered":"2020\u5e743\u67086\u65e5"},"content":{"rendered":"<p><!--fbbulkpostkeys=\"1583485545,\u672a\u5206\u985e,2020\u5e743\u67086\u65e5\"--><!--fbbulkpostcomment=\"NoTitle \"-->RESEARCH ARTICLE<br \/>\nMICROBIOLOGY<br \/>\nOn the origin and continuing evolution of SARS-CoV-2<br \/>\nXiaolu Tang1,7<br \/>\n, Changcheng Wu1,7<br \/>\n, Xiang Li2,3,4,7<br \/>\n, Yuhe Song2,5,7<br \/>\n, Xinmin Yao1<br \/>\n, Xinkai Wu1<br \/>\n,<br \/>\nYuange Duan1<br \/>\n, Hong Zhang1<br \/>\n, Yirong Wang1<br \/>\n, Zhaohui Qian6<br \/>\n, Jie Cui2,3,*, and Jian Lu1,*<br \/>\n1. State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics,<br \/>\nSchool of Life Sciences, Peking University, Beijing, 100871, China<br \/>\n2. CAS Key Laboratory of Molecular Virology &amp; Immunology, Institut Pasteur of Shanghai,<br \/>\nChinese Academy of Sciences, China<br \/>\n3. Center for Biosafety Mega-Science, Chinese Academy of Sciences, China<br \/>\n4. University of Chinese Academy of Sciences, China<br \/>\n5. School of Life Sciences, Shanghai University, China<br \/>\n6. NHC Key Laboratory of Systems Biology of Pathogens, Institute of Pathogen Biology,<br \/>\nChinese Academy of Medical Sciences and Peking Union Medical College, Beijing<br \/>\n7. 
These authors contributed equally to this work.<br \/>\n*Corresponding authors:<br \/>\nJian Lu, Email: LUJ@pku.edu.cn<br \/>\nJie Cui, Email: jcui@ips.ac.cn<br \/>\nDownloaded from <a href=\"https:\/\/academic.oup.com\/nsr\/advance-article-abstract\/doi\/10.1093\/nsr\/nwaa036\/5775463\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/academic.oup.com\/nsr\/advance-article-abstract\/doi\/10.1093\/nsr\/nwaa036\/5775463<\/a> by guest on 06 March 2020<br \/>\nABSTRACT<br \/>\nThe SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since<br \/>\nimpacted a large portion of China and raised major global concern. Herein, we investigated<br \/>\nthe extent of molecular divergence between SARS-CoV-2 and other related coronaviruses.<br \/>\nAlthough we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a<br \/>\nbat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was<br \/>\n17%, suggesting the divergence between the two viruses is much larger than previously<br \/>\nestimated. Our results suggest that the development of new variations in functional sites in the<br \/>\nreceptor-binding domain (RBD) of the spike seen in SARS-CoV-2 and pangolin<br \/>\nSARSr-CoVs is likely caused by mutations and natural selection in addition to recombination.<br \/>\nPopulation genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses<br \/>\nevolved into two major types (designated L and S) that are well defined by two different<br \/>\nSNPs that show nearly complete linkage across the viral strains sequenced to date. Although<br \/>\nthe L type (~70%) is more prevalent than the S type (~30%), the S type was found to be the<br \/>\nancestral version. Whereas the L type was more prevalent in the early stages of the outbreak<br \/>\nin Wuhan, the frequency of the L type decreased after early January 2020. 
Human<br \/>\nintervention may have placed more severe selective pressure on the L type, which might be<br \/>\nmore aggressive and spread more quickly. On the other hand, the S type, which is<br \/>\nevolutionarily older and less aggressive, might have increased in relative frequency due to<br \/>\nrelatively weaker selective pressure. These findings strongly support an urgent need for<br \/>\nfurther immediate, comprehensive studies that combine genomic data, epidemiological data,<br \/>\nand chart records of the clinical symptoms of patients with coronavirus disease 2019<br \/>\n(COVID-19).<br \/>\nKeywords: SARS-CoV-2, virus, molecular evolution, population genetics<br \/>\nReceived: 25-Feb-2020; Revised: 28-Feb-2020; Accepted: 29-Feb-2020.<br \/>\nINTRODUCTION<br \/>\nThe coronavirus disease 2019 (COVID-19) epidemic started in late December 2019 in Wuhan,<br \/>\nthe capital of Central China&#8217;s Hubei Province. Since then, it has rapidly spread across China<br \/>\nand to other countries, raising major global concerns. The etiological agent is a novel<br \/>\ncoronavirus, SARS-CoV-2, named for the similarity of its symptoms to those induced by the<br \/>\nsevere acute respiratory syndrome. As of February 28, 2020, 78,959 cases of SARS-CoV-2<br \/>\ninfection had been confirmed in China, with 2,791 deaths. 
Worryingly, there have also been<br \/>\nmore than 3,664 confirmed cases outside of China in 46 countries and areas<br \/>\n(<a href=\"https:\/\/www.who.int\/emergencies\/diseases\/novel-coronavirus-2019\/situation-reports\/\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.who.int\/emergencies\/diseases\/novel-coronavirus-2019\/situation-reports\/<\/a>),<br \/>\nraising significant doubts about the likelihood of successful containment. Further, the<br \/>\ngenomic sequences of SARS-CoV-2 viruses isolated from a number of patients share<br \/>\nsequence identity higher than 99.9%, suggesting a very recent host shift into humans [1-3].<br \/>\nCoronaviruses are naturally hosted and evolutionarily shaped by bats [4, 5]. Indeed, it has<br \/>\nbeen postulated that most of the coronaviruses in humans are derived from the bat reservoir [6,<br \/>\n7]. Unsurprisingly, several teams have recently confirmed the genetic similarity between<br \/>\nSARS-CoV-2 and a bat betacoronavirus of the sub-genus Sarbecovirus [8-13]. The<br \/>\nwhole-genome sequence of the novel virus is 96.2% identical to that of a bat<br \/>\nSARS-related coronavirus (SARSr-CoV; RaTG13) collected in Yunnan province, China [2,<br \/>\n14], but is not very similar to the genomes of SARS-CoV (about 79%) or MERS-CoV (about<br \/>\n50%) [1, 15]. It has also been confirmed that SARS-CoV-2 uses the same receptor, the<br \/>\nangiotensin-converting enzyme II (ACE2), as SARS-CoV [11]. 
Although the specific<br \/>\nroute of transmission from natural reservoirs to humans remains unclear [5, 13], several<br \/>\nstudies have shown that pangolins may have provided a partial spike gene to SARS-CoV-2;<br \/>\nthe critical functional sites in the spike protein of SARS-CoV-2 are nearly identical to those<br \/>\nidentified in a virus isolated from a pangolin [16-18].<br \/>\nDespite these recent discoveries, several fundamental issues related to the evolutionary<br \/>\npatterns and driving forces behind this outbreak of SARS-CoV-2 remain unexplored [19].<br \/>\nHerein, we investigated the extent of molecular divergence between SARS-CoV-2 and other<br \/>\nrelated coronaviruses and carried out population genetic analyses of 103 sequenced genomes<br \/>\nof SARS-CoV-2. This work provides new insights into the factors driving the evolution of<br \/>\nSARS-CoV-2 and its pattern of spread through the human population.<br \/>\nRESULTS<br \/>\nMolecular phylogeny and divergence between SARS-CoV-2 and related coronaviruses.<br \/>\nFor each annotated ORF in the reference genome of SARS-CoV-2 (NC_045512), we<br \/>\nextracted the orthologous sequences in human SARS-CoV, four bat<br \/>\nSARS-related coronaviruses (SARSr-CoV: RaTG13, ZXC21, ZC45, and BM48-31), one<br \/>\nPangolin SARSr-CoV from Guangdong (GD) [17], and six Pangolin SARSr-CoV genomes<br \/>\nfrom Guangxi (GX) [18] (Table S1). We aligned the coding sequences (CDSs) based on the<br \/>\nprotein alignments (see Materials and Methods). Most ORFs annotated from SARS-CoV-2<br \/>\nwere found to be conserved in other viruses, except for ORF8 and ORF10 (Table 1). 
The<br \/>\nprotein sequence of SARS-CoV-2 ORF8 shared very low similarity with sequences in<br \/>\nSARS-CoV and BM48-31, and ORF10 had a premature stop codon in both SARS-CoV and<br \/>\nBM48-31 (Fig. S1). A one-base deletion caused a frame-shift mutation in ORF10 of ZXC21<br \/>\n(Fig. S1).<br \/>\nTo investigate the phylogenetic relationships between these viruses at the genomic scale, we<br \/>\nconcatenated coding regions (CDSs) of the nine conserved ORFs (orf1ab, E, M, N, S, ORF3a,<br \/>\nORF6, ORF7a, and ORF7b) and reconstructed the phylogenetic tree using the synonymous<br \/>\nsites (Fig. 1A). We also used CODEML in the PAML package [20] to infer the ancestral sequence of<br \/>\neach node and calculated the dN (nonsynonymous substitutions per nonsynonymous site), dS<br \/>\n(synonymous substitutions per synonymous site), and dN\/dS (\u03c9) values for each branch (Fig.<br \/>\n1A). In parallel, we also calculated the pairwise dN, dS, and \u03c9 values between SARS-CoV-2<br \/>\nand each of the other viruses (Table 1).<br \/>\nThe genome-wide phylogenetic tree indicated that SARS-CoV-2 was closest to RaTG13,<br \/>\nfollowed by GD Pangolin SARSr-CoV, then by GX Pangolin SARSr-CoVs, then by ZC45<br \/>\nand ZXC21, then by human SARS-CoV, and finally by BM48-31 (Fig. 1A). Notably, we<br \/>\nfound that the nucleotide divergence at synonymous sites between SARS-CoV-2 and other<br \/>\nviruses was much higher than previously anticipated. For example, although the<br \/>\ngenomic nucleotide sequences differ by only ~4% between SARS-CoV-2 and RaTG13, the genomic<br \/>\naverage dS was 0.17, which means the divergence at the neutral sites is 17% between these<br \/>\ntwo viruses (Table 1). 
This is because the nonsynonymous sites are usually under stronger<br \/>\nnegative selection than synonymous sites, and calculating sequence differences without<br \/>\nseparating these two classes of sites may underestimate the extent of molecular divergence<br \/>\nseveralfold.<br \/>\nNotably, the dS value varied considerably across genes in SARS-CoV-2 and the other viruses<br \/>\nanalyzed. In particular, the spike gene (S) consistently exhibited larger dS values than other<br \/>\ngenes (Table 1). This pattern became clear when we calculated the dS value for each branch<br \/>\nin Fig. 1A for the spike gene versus the concatenated sequences of the remaining genes (Fig.<br \/>\nS2). In each branch, the dS of spike was 2.22 \u00b1 1.35 (mean \u00b1 SD) times as large as that of the<br \/>\nother genes. This extremely elevated dS value of spike could be caused either by a high<br \/>\nmutation rate or by natural selection that favors synonymous substitutions. Synonymous<br \/>\nsubstitutions may serve as another layer of genetic regulation, guiding the efficiency of<br \/>\nmRNA translation by changing codon usage [21]. If positive selection is the driving force for<br \/>\nthe higher synonymous substitution rate seen in spike, we expect the frequency of optimal<br \/>\ncodons (FOP) of spike to be different from that of other genes. However, our codon usage<br \/>\nbias analysis (Table S2) suggests the FOP of spike was only slightly higher than the<br \/>\ngenomic average (0.717 versus 0.698, see Materials and Methods). 
Thus, we believe that the<br \/>\nelevated synonymous substitution rate measured in spike is more likely caused by higher<br \/>\nmutational rates; however, the underlying molecular mechanism remains unclear.<br \/>\nBoth SARS-CoV and SARS-CoV-2 bind to ACE2 through the RBD of spike protein in order<br \/>\nto initiate membrane fusion and enter human cells [1, 2, 22-26]. Five out of the six critical<br \/>\namino acid (AA) residues in RBD were different between SARS-CoV-2 and SARS-CoV (Fig.<br \/>\n1B), and a 3D structural analysis indicated that the spike of SARS-CoV-2 has a higher<br \/>\nbinding affinity to ACE2 than that of SARS-CoV [23]. Intriguingly, these same six critical AAs are<br \/>\nidentical between GD Pangolin-CoV and SARS-CoV-2 [16]. In contrast, although the<br \/>\ngenomes of SARS-CoV-2 and RaTG13 are more similar overall, only one out of the six<br \/>\nfunctional sites is identical between the two viruses (Fig. 1B). It has been proposed that the<br \/>\nSARS-CoV-2 RBD region of the spike protein might have resulted from recent recombination<br \/>\nevents in pangolins [16-18]. Although several ancient recombination events have been<br \/>\ndescribed in spike [27, 28], it also seems likely that the identical functional sites in<br \/>\nSARS-CoV-2 and GD Pangolin-CoV may actually be the result of coincidental convergent<br \/>\nevolution [18].<br \/>\nIf the functional AA residues in the SARS-CoV-2 RBD region were acquired from GD<br \/>\nPangolin-CoV in a very recent recombination event, we would expect the nucleotide<br \/>\nsequences of this region to be nearly identical between the two viruses. However, for the CDS<br \/>\nsequences that span five critical AA sites in the SARS-CoV-2 spike (ranging from codon 484<br \/>\nto 507, covering five adjacent functional sites: F486, Q493, S494, N501, and Y505; Fig. S3),<br \/>\nwe estimated dS = 0.411, dN = 0.019, and \u03c9 = 0.046 between SARS-CoV-2 and GD<br \/>\nPangolin-CoV. 
Assuming a synonymous substitution rate (u) of 1.67-4.67 \u00d7 10<sup>-3<\/sup> \/site\/year,<br \/>\nas estimated in SARS-CoV [29], the recombination\/introgression, if it occurred at all, would<br \/>\nbe estimated to have happened approximately 19.8-55.4 years ago. Here, the formula<br \/>\nt = dS \/ (2 \u00d7 2.22 \u00d7 u) was used to calculate divergence time; the factor of 2.22 accounts for the increased mutational rate of<br \/>\nspike relative to the other genes. Thus, it seems very unlikely that SARS-CoV-2<br \/>\noriginated from the GD Pangolin-CoV due to a very recent recombination event.<br \/>\nInstead, it seems more likely that a high mutation rate in spike, coupled with strong<br \/>\nnatural selection, has shaped the identical functional AA residues between these two viruses,<br \/>\nas proposed previously [18]. Although these sites are maintained in SARS-CoV-2 and GD<br \/>\nPangolin-CoV, mutations may have changed the residues in the RaTG13 lineage after it<br \/>\ndiverged from SARS-CoV-2 (the blue arrow in Fig. 1A). In summary, it seems that the shared<br \/>\nidentity of critical AA sites between SARS-CoV-2 and GD Pangolin-CoV might be due to<br \/>\nrandom mutations coupled with natural selection, and not necessarily recombination.<br \/>\nSelective constraints and positive selection during the evolution of SARS-CoV-2 and<br \/>\nrelated coronaviruses<br \/>\nThe genome-wide \u03c9 value between SARS-CoV-2 and other viruses ranged from 0.044 to<br \/>\n0.124 (Table 1), indicative of strong negative selection on the nonsynonymous sites. In other<br \/>\nwords, 87.6% to 95.6% of the nonsynonymous mutations were removed by negative selection<br \/>\nduring viral evolution. 
To determine the extent of positive selection, we concatenated the<br \/>\nCDS sequences of 9 conserved ORFs in all the viruses in Fig. 1A and fitted the M7 (beta:<br \/>\nneutral and negative selection) and M8 (beta + \u03c9&gt;1: neutral, negative selection, and positive<br \/>\nselection) models using CODEML (Materials and Methods). The M8 model (lnL =<br \/>\n-104,813.732, np = 18) was a significantly better fit than the M7 (lnL = -105,063.284, np = 16)<br \/>\nmodel (P &lt; 10<sup>-10<\/sup>), suggesting that some AA substitutions were favored by positive Darwinian<br \/>\nselection (but not necessarily in the SARS-CoV-2 lineage). Under the M8 model, 98.48% (p0)<br \/>\nof the nonsynonymous substitutions were estimated to be under neutral evolution or purifying<br \/>\nselection (0\u2a7d\u03c9\u2a7d1), and 1.52% (p1) of the nonsynonymous substitutions were under positive<br \/>\nselection (\u03c9 = 1.50). A Bayes Empirical Bayes (BEB) analysis suggested that 10 AA sites<br \/>\nshowed strong signals of positive selection, and, interestingly, three of those were located in<br \/>\nthe RBD of spike, including at one critical site (Fig. 1C and Fig. S4). Thus, although these<br \/>\ncoronaviruses were generally under very strong negative selection, positive selection was also<br \/>\nresponsible for the evolution of protein sequences. The putatively positively-selected sites<br \/>\nmight serve as candidates for further functional studies.<br \/>\nMutations in 103 SARS-CoV-2 genomes<br \/>\nWe downloaded 103 publicly available SARS-CoV-2 genomes, aligned the sequences, and<br \/>\nidentified the genetic variants. For ease of visualization, we marked each virus strain based on<br \/>\nthe location and date the virus was isolated with the format of &#8220;Location_Date&#8221; throughout<br \/>\nthis study (see Table S1 for details; the IDs contain no information about the patient&#8217;s race<br \/>\nor ethnicity). 
Although SARS-CoV-2 is an RNA virus, for simplicity, we presented our<br \/>\nresults based on DNA sequencing results throughout this study (i.e., the nucleotide T<br \/>\n(thymine) means U (uracil) in SARS-CoV-2). For each variant, the ancestral state was<br \/>\ninferred based on the genome and CDS alignments of SARS-CoV-2 (NC_045512), RaTG13,<br \/>\nand GD Pangolin-CoV (Materials and Methods). In total, we identified mutations in 149 sites<br \/>\nacross the 103 sequenced strains. Ancestral states for 43 synonymous, 83 non-synonymous,<br \/>\nand two stop-gain mutations were unambiguously inferred. The frequency spectra of<br \/>\nsynonymous and nonsynonymous mutations are shown in Fig. 2.<br \/>\nMost derived mutations were singletons (67.4% (29\/43) of synonymous mutations and 84.3%<br \/>\n(70\/83) of nonsynonymous mutations), indicating either a recent origin [30] or population<br \/>\ngrowth [31]. In general, the derived alleles of synonymous mutations were significantly<br \/>\nskewed towards higher frequencies than those of nonsynonymous ones (P &lt; 0.01, Wilcoxon<br \/>\nrank-sum test; Fig. 
2), suggesting the nonsynonymous mutations tended to be selected against.<br \/>\nHowever, 16.3% (7 out of 43) of the synonymous mutations, and one nonsynonymous (ORF8<br \/>\n(L84S, 28,144)) mutation had a derived frequency of \u2265 70% across the SARS-CoV-2 strains.<br \/>\nThe nonsynonymous mutations that had derived alleles in at least two SARS-CoV-2 strains<br \/>\naffected six proteins: orf1ab (A117T, I1607V, L3606F, I6075T), S (H49Y, V367F), ORF3a<br \/>\n(G251V), ORF7a (P34S), ORF8 (V62L, S84L), and N (S194L, S202N, P344S).<br \/>\nTwo major types of SARS-CoV-2 are defined by two SNPs that show complete linkage<br \/>\nTo detect possible recombination among SARS-CoV-2 viruses, we used Haploview [32] to<br \/>\nanalyze and visualize the patterns of linkage disequilibrium (LD) between variants with minor<br \/>\nalleles in at least two SARS-CoV-2 strains (Fig. 3A). Since most mutations were at very low<br \/>\nfrequencies, it is not surprising that many pairs had a very low r<sup>2<\/sup> or LOD value (Fig. 3B-C).<br \/>\nConsistent with another recent report [31], we did not find evidence of recombination<br \/>\nbetween the SARS-CoV-2 strains.<br \/>\nHowever, we found that SNPs at location 8,782 (orf1ab: T8517C, synonymous) and 28,144<br \/>\n(ORF8: C251T, S84L) showed significant linkage, with an r<sup>2<\/sup> value of 0.954 (Fig. 3B, red)<br \/>\nand a LOD value of 50.13 (Fig. 3C, red). Among the 103 SARS-CoV-2 virus strains, 101 of<br \/>\nthem exhibited complete linkage between the two SNPs: 72 strains exhibited a \u201cCT\u201d<br \/>\nhaplotype (defined as \u201cL\u201d type because T28,144 is in the codon of Leucine) and 29 strains<br \/>\nexhibited a \u201cTC\u201d haplotype (defined as \u201cS\u201d type because C28,144 is in the codon of Serine)<br \/>\nat these two sites. 
Thus, we categorized the SARS-CoV-2 viruses into two major types, with<br \/>\nL being the major type (~70%) and S being the minor type (~30%).<br \/>\nThe evolutionary history of L and S types of SARS-CoV-2<br \/>\nAlthough we defined the L and S types based on two tightly linked SNPs, strikingly, the<br \/>\nseparation between the L (blue) and S (red) types was maintained when we reconstructed the<br \/>\nhaplotype networks using all the SNPs in the SARS-CoV-2 genomes (Fig. 4A; the number of<br \/>\nmutations between two neighboring haplotypes was inferred parsimoniously). This analysis<br \/>\nfurther supports the idea that the two linked SNPs at sites 8,782 and 28,144 adequately define<br \/>\nthe L and S types of SARS-CoV-2.<br \/>\nTo determine whether L or S type is ancestral, we examined the genomic alignments of<br \/>\nSARS-CoV-2 and other highly related viruses. Strikingly, nucleotides of the S type at sites<br \/>\n8,782 and 28,144 were identical to the orthologous sites in the most closely related viruses<br \/>\n(Fig. 4B). Remarkably, both sites were highly conserved in other viruses as well. Hence,<br \/>\nalthough the L type (~70%) was more prevalent than the S type (~30%) in the SARS-CoV-2<br \/>\nviruses we examined, the S type is actually the ancestral version of SARS-CoV-2.<br \/>\nTo further examine the relationship among the strains in the L and S types, we reconstructed a<br \/>\nphylogenetic tree of all the 103 SARS-CoV-2 viruses based on their whole-genome sequences.<br \/>\nOur phylogenetic tree also clearly shows the separation of the two types (Fig. 5). 
Viruses of<br \/>\nthe L type (blue) first clustered together, and likewise, viruses of the S type (red) were also<br \/>\nmore closely related to each other. Therefore, our whole-genome comparisons further confirm<br \/>\nthe separation of the L and S types.<br \/>\nThus far, we found that, although the L type is derived from the S type, L (~70%) is more<br \/>\nprevalent than S (~30%) among the sequenced SARS-CoV-2 genomes we examined. This<br \/>\npattern suggests that L has a higher transmission rate than the S type. Furthermore, our<br \/>\nmutational load analysis indicated that the L type had accumulated a significantly higher<br \/>\nnumber of derived mutations than S type (P &lt; 0.0001, Wilcoxon rank-sum test; Fig. S5). We<br \/>\npropose that, although the L type newly evolved from the ancient S type, it transmits faster or<br \/>\nreplicates faster in human populations, causing it to accumulate more mutations than the S<br \/>\ntype. Thus, our results suggest the L might be more aggressive than the S type due to the<br \/>\npotentially higher transmission and\/or replication rates.<br \/>\nTo test whether the two types of SARS-CoV-2 had differences in temporal and spatial<br \/>\ndistributions, we stratified the viruses based on the locations and dates they were isolated<br \/>\n(Table S1). Among the 27 viruses isolated from Wuhan, 26 (96.3%) were L type, and only 1<br \/>\n(3.7%) was S type. However, among the other 73 viruses isolated outside Wuhan, 45 (61.6%)<br \/>\nwere L type, and 28 (38.4%) were S type. This comparison suggests that the L type is<br \/>\nsignificantly more prevalent in Wuhan than in other places (P = 0.0004, Fisher\u2019s exact test,<br \/>\nFig. 6 and Table S3). All of the 26 samples isolated before January 7, 2020, were from<br \/>\nWuhan, and among the 74 samples collected from January 7, 2020, only one was from<br \/>\nWuhan, 33 were from other places in China, and 40 were from patients outside China. 
Thus,<br \/>\nit is not surprising that the L type was significantly more prevalent before January 7, 2020<br \/>\n(96.2%, 25 L and 1 S) than after January 7, 2020 (62.2%, 46 L and 28 S) (P = 0.0008,<br \/>\nFisher\u2019s exact test, Fig. 6 and Table S3).<br \/>\nIf the L type is more aggressive than the S type, why did the relative frequency of the L type<br \/>\ndecrease compared to the S type in other places after the initial outbreak in Wuhan? One<br \/>\npossible explanation is that, since January 2020, the Chinese central and local governments<br \/>\nhave taken rapid and comprehensive prevention and control measures. These human<br \/>\nintervention efforts might have caused severe selective pressure against the L type, which<br \/>\nmight be more aggressive and spread more quickly. The S type, on the other hand, might have<br \/>\nexperienced weaker selective pressure by human intervention, leading to an increase in its<br \/>\nrelative abundance among the SARS-CoV-2 viruses. Thus, we hypothesized that the two<br \/>\ntypes of SARS-CoV-2 viruses might have experienced different selective pressures due to<br \/>\ndifferent epidemiological features. Of note, the above analyses were based on very patchy<br \/>\nSARS-CoV-2 genomes that were collected from different locations and time points. More<br \/>\ncomprehensive genomic data is required for further testing of our hypothesis.<br \/>\nHeteroplasmy of SARS-CoV-2 viruses in patients<br \/>\nIt is currently unclear how the L type specifically evolved from the S type during the<br \/>\ndevelopment of SARS-CoV-2. 
However, we found that the sequence of viruses isolated from<br \/>\none patient who lived in the United States on January 21 (USA_2020\/01\/21.a, GISAID ID:<br \/>\nEPI_ISL_404253) had the genotype Y (C or T) at both positions 8,782 and 28,144, differing<br \/>\nfrom the general trend of having either C or T. Although novel mutations could lead to this<br \/>\nresult, the most parsimonious explanation is that this patient may have been infected by both<br \/>\nthe L and S types (Fig. 7A). The sample of USA_2020\/01\/21.a was collected from a<br \/>\n63-year-old female patient living in Chicago (from GISAID). Based on the report from the<br \/>\nUnited States Centers for Disease Control and Prevention<br \/>\n(<a href=\"https:\/\/www.cdc.gov\/media\/releases\/2020\/p0124-second-travel-coronavirus.html\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.cdc.gov\/media\/releases\/2020\/p0124-second-travel-coronavirus.html<\/a>), we<br \/>\ninferred this patient returned to the United States from Wuhan on January 13, 2020. However,<br \/>\nwhether the co-existence of L and S types in this patient was due to multiple infections<br \/>\nduring her visit to Wuhan is currently unclear. Notably, the viruses identified from a patient<br \/>\nin Australia on January 28, 2020 (Australia_2020\/01\/28.a, GISAID ID: EPI_ISL_407894)<br \/>\nhad multiple degenerate nucleotides. This sample was collected from a 44-year-old male<br \/>\npatient in Gold Coast, Australia (from GISAID). Based on the report from the Courier Mail<br \/>\n(January 30, 2020), we inferred this patient had a history of traveling from Wuhan to the<br \/>\nGold Coast before the diagnosis of infection. As shown in Fig. 7B, we inferred this patient<br \/>\nmight have been infected by at least two different strains of SARS-CoV-2 (Fig. 
7B).<br \/>\nTo further investigate the heteroplasmy of SARS-CoV-2 viruses in patients, we searched 12<br \/>\ndeep-sequencing libraries of SARS-CoV-2 genomes that were deposited in the Sequence<br \/>\nRead Archive (SRA) (Table S4, Materials and Methods). We found 17 genomic sites that<br \/>\nshowed evidence of heteroplasmy of SARS-CoV-2 virus in five patients, but we did not find<br \/>\nany other instances of the co-existence of L and S types in any patient (Table 2). These<br \/>\nfindings evince the developing complexity of the evolution of SARS-CoV-2 infections.<br \/>\nFurther studies investigating how the different alleles of SARS-CoV-2 viruses compete with<br \/>\neach other will be of significant value.<br \/>\nDISCUSSION<br \/>\nIn this study, we investigated the patterns of molecular divergence between SARS-CoV-2 and<br \/>\nother related coronaviruses. Although the genomic analyses suggested that SARS-CoV-2 was<br \/>\nclosest to RaTG13, their difference at neutral sites was much higher than previously realized.<br \/>\nOur results provide novel insights into tracing the intermediate natural host of SARS-CoV-2.<br \/>\nWith population genetic analyses of 103 genomes of SARS-CoV-2, we found that<br \/>\nSARS-CoV-2 viruses evolved into two major types (L and S types), and the two types were<br \/>\nwell defined by just two SNPs that show nearly complete linkage across SARS-CoV-2 strains.<br \/>\nAlthough the L type (~70%) was more prevalent than the S type (~30%) in the SARS-CoV-2<br \/>\nviruses we examined, our evolutionary analyses suggested the S type was most likely the<br \/>\nmore ancient version of SARS-CoV-2. 
Our results also support the idea that the L type is<br \/>\nmore aggressive than the S type.<br \/>\nSince nonsynonymous sites are usually under stronger negative selection than synonymous<br \/>\nsites, calculating sequence differences without separating these two classes of sites could lead<br \/>\nto a potentially significant underestimate of the degree of molecular divergence. For example,<br \/>\nalthough the overall nucleotides only differed by ~4% between SARS-CoV-2 and RaTG13,<br \/>\nthe genomic average dS value, which usually serves as a proxy for neutral divergence, was 0.17 between these two<br \/>\nviruses (Table 1). Of note, the genome-wide dS value is 0.012 between humans and<br \/>\nchimpanzees [33], and 0.08 between humans and rhesus macaques [34]. Thus, the neutral<br \/>\nmolecular divergence between SARS-CoV-2 and RaTG13 is 14 times larger than that<br \/>\nbetween humans and chimpanzees, and twice as large as that between humans and macaques.<br \/>\nThe genomic average dS value between SARS-CoV-2 and GD Pangolin-CoV is 0.475, which<br \/>\nis comparable to that between humans and mice (0.5) [35], and the dS value between<br \/>\nSARS-CoV-2 and GX Pangolin-CoV is even larger (0.722). The scale of these measures<br \/>\nsuggests that we should perhaps consider the difference at neutrally evolving sites rather than<br \/>\nthe difference in all nucleotide sequences when tracing the origin and natural intermediate<br \/>\nhost of SARS-CoV-2.<br \/>\nOur analyses of molecular evolution and population genetics suggested that some amino acid<br \/>\nchanges might be favored by natural selection during the evolution of SARS-CoV-2 and other<br \/>\nrelated viruses. However, negative selection appears to be the predominant force acting on<br \/>\nthese viruses. 
Interestingly, the virus isolated from one patient in Shenzhen on January 13,<br \/>\n2020 (SZ_2020\/01\/13.a, GISAID ID: EPI_ISL_406592) had C at both positions 8,782 and<br \/>\n28,144 in the genome, belonging to neither L nor S type (Fig. 4A and 5). Notably, this strain<br \/>\nhad one stop-gain mutation in orf1ab and had accumulated 20 silent and 5 nonsynonymous<br \/>\nmutations after diverging from the ancestor haplotype (Fig. 4A). Thus, it is possible that<br \/>\nfunctional constraints on the genomic sequence were weakened after the disruption of orf1ab<br \/>\nin this strain. In addition, the virus isolated from a patient living in South Korea<br \/>\n(Skorea_2020\/01.a, GISAID: EPI_ISL_411929) had acquired six nonsynonymous mutations<br \/>\nrelative to the most recent common ancestor of SARS-CoV-2: orf1ab (M902I and<br \/>\nT6891M), S (S221W), ORF3a (W128L and G251V), and E (L37H). If these changes are not<br \/>\ndue to sequencing errors, it would be interesting to test whether and how these mutations<br \/>\naffect the transmission and pathogenesis of SARS-CoV-2.<br \/>\nIn this work, we propose that SARS-CoV-2 can be divided into two major types (L and S<br \/>\ntypes): the S type is ancestral, and the L type evolved from the S type. Intriguingly, the S and L<br \/>\ntypes can be clearly defined by just two tightly linked SNPs at positions 8,782 (orf1ab:<br \/>\nT8517C, synonymous) and 28,144 (ORF8: C251T, S84L). However, it is currently unclear<br \/>\nwhether the L type evolved from the S type in humans or in the intermediate hosts. It is also<br \/>\nunclear whether the L type is more virulent than the S type. 
orf1ab, which encodes<br \/>\nreplicase\/transcriptase, is required for viral genome replication and might also be important<br \/>\nfor viral pathogenesis [36]. Although the T8517C mutation in orf1ab does not change the<br \/>\nprotein sequence (it changes the codon AGT (Ser) to AGC (Ser)), we hypothesized this<br \/>\nmutation might affect orf1ab translation since AGT is preferred while AGC is unpreferred<br \/>\n(Table S2). ORF8 promotes the expression of ATF6, the ER unfolded protein response factor,<br \/>\nin human cells [37]. Thus, it will be interesting to investigate the function of the S84L AA<br \/>\nchange in ORF8, as well as the combinatory effect of these two mutations in SARS-CoV-2<br \/>\npathogenesis.<br \/>\nIn summary, our analyses of 103 sequenced SARS-CoV-2 genomes suggest that the L type is<br \/>\nmore aggressive than the S type and that human interference may have shifted the relative<br \/>\nabundance of L and S type soon after the SARS-CoV-2 outbreak. As previously noted [19],<br \/>\nthe data examined in this study are still very limited, and follow-up analyses of a larger set of<br \/>\ndata are needed to have a better understanding of the evolution and epidemiology of<br \/>\nSARS-CoV-2. 
There is a strong need for further immediate, comprehensive studies that<br \/>\ncombine genomic data, epidemiological data, and chart records of the clinical symptoms of<br \/>\npatients with SARS-CoV-2.<br \/>\nMATERIALS AND METHODS<br \/>\nMolecular evolution of SARS-CoV-2 and other related viruses<br \/>\nThe set of 103 complete genome sequences was downloaded from GISAID (Global Initiative<br \/>\non Sharing All Influenza Data; <a href=\"https:\/\/www.gisaid.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.gisaid.org\/<\/a>) with acknowledgment, GenBank<br \/>\n(<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/genbank\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.ncbi.nlm.nih.gov\/genbank<\/a>), and NMDC (<a href=\"http:\/\/nmdc.cn\/#\/nCoV\" target=\"_blank\" rel=\"noopener noreferrer\">http:\/\/nmdc.cn\/#\/nCoV<\/a>). Sequences<br \/>\nand annotations of the reference genome of SARS-CoV-2 (NC_045512) and other related<br \/>\nviruses were downloaded from GenBank or GISAID (Table S1). The genomic sequences of<br \/>\nSARS-CoV-2 were aligned using MUSCLE v3.8.31 [38].<br \/>\nThe annotated CDSs of other viruses were downloaded from GenBank. To avoid missing<br \/>\nannotations in other viruses, we also annotated their ORFs with Exonerate<br \/>\n(--model protein2genome:bestfit --score 5 -g y) [39], based on the CDSs annotated in<br \/>\nSARS-CoV-2. The<br \/>\nprotein sequences of SARS-CoV-2 and other related viruses were aligned with MUSCLE<br \/>\nv3.8.31 [38], and the codon alignments were made based on the protein alignment with<br \/>\nRevTrans [40]. 
The codon alignments of the conserved ORFs were further concatenated for<br \/>\ndownstream evolutionary analysis. The phylogenetic tree was constructed by the<br \/>\nneighbor-joining method in MEGA-X [41] using the Kimura 2-parameter<br \/>\nmodel, and only the third positions of codons were considered. YN00 from PAML v4.9a [20]<br \/>\nwas used to calculate the pairwise divergence between SARS-CoV-2 and other viruses for<br \/>\neach individual gene or for the concatenated sequences. The free-ratio model in CODEML in<br \/>\nthe PAML [20] package was used to calculate the dN, dS, and \u03c9 values for each branch.<br \/>\nPositively selected amino acids<br \/>\nPositive selection was detected using EasyCodeML [42], a recently published wrapper of<br \/>\nCODEML [20]. The M7 and M8 models were compared. In the M7 model, \u03c9 follows a beta<br \/>\ndistribution such that 0\u2a7d\u03c9\u2a7d1, and in the M8 model, a proportion p0 of sites have \u03c9 drawn<br \/>\nfrom the beta distribution, and the remaining sites with proportion p1 are positively selected<br \/>\nand have \u03c91&gt;1. The LRTs between M7 and M8 models were conducted by comparing twice<br \/>\nthe difference in log-likelihood values (2 ln \u0394l) against a \u03c7\u00b2-distribution (df=2). The positively<br \/>\nselected sites were identified as those with a Bayes Empirical Bayes (BEB) score larger than 0.95.<br \/>\nHaplotype network<br \/>\nDnaSP v6.12.03 [43] was used to generate multi-sequence aligned haplotype data, and<br \/>\nPopART v1.7 [44] was used to draw haplotype networks based on the haplotypes generated<br \/>\nby DnaSP. 
RAxML v8.2.12 [45] was used to build the maximum likelihood phylogenetic tree<br \/>\nof 103 aligned SARS-CoV-2 genomes with the parameters \u201c-p 1234 -m GTRCAT\u201d.<br \/>\nSNP calling process<br \/>\nWe downloaded 12 SARS-CoV-2 metagenomic sequencing libraries (Table S2), and mapped<br \/>\nthe NGS reads to the reference genome of SARS-CoV-2 (NC_045512) using BWA<br \/>\n(0.7.17-r1188) [46] with the default parameters. SNP calling was done using bcftools mpileup<br \/>\n(bcftools 1.9) [47].<br \/>\nCodon usage bias analysis<br \/>\nWe calculated the RSCU (Relative Synonymous Codon Usage) value of each codon in the<br \/>\nSARS-CoV-2 reference genome (NC_045512). The RSCU value for each codon was the<br \/>\nobserved frequency of this codon divided by its expected frequency under equal usage among<br \/>\nthe synonymous codons of the same amino acid [48]. The codons with RSCU &gt; 1 were defined as preferred codons, and those<br \/>\nwith RSCU &lt; 1 were defined as unpreferred codons. The FOP (frequency of optimal codons)<br \/>\nvalue of each gene was calculated as the number of preferred codons divided by the total<br \/>\nnumber of preferred and unpreferred codons.<br \/>\nConflict of interest<br \/>\nThe authors declare that they have no conflicts of interest.<br \/>\nAcknowledgments<br \/>\nThe authors thank the researchers who generated and shared the sequencing data from<br \/>\nGISAID (<a href=\"https:\/\/www.gisaid.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/www.gisaid.org\/<\/a>) on which this research is based. We thank Dr. 
Chung-I Wu,<br \/>\nHong Wu, Hongya Gu, Liping Wei, Xuemei Lu, Weiwei Zhai, Guodong Wang, Xiaodong Su,<br \/>\nKeping Hu, and Leiliang Zhang for suggestive comments to this study. This work was<br \/>\nsupported by grants from the National Natural Science Foundation of China (No. 91731301)<br \/>\nto J.L. JC is supported by CAS Pioneer Hundred Talents Program.<br \/>\nReferences<br \/>\n1. Lu R, Zhao X, Li J, Niu P, Yang B, Wu H, et al. Genomic characterisation and epidemiology of 2019<br \/>\nnovel coronavirus: implications for virus origins and receptor binding. Lancet. 2020. Epub 2020\/02\/03.<br \/>\ndoi: 10.1016\/S0140-6736(20)30251-8. PubMed PMID: 32007145.<br \/>\n2. Zhou P, Yang XL, Wang XG, Hu B, Zhang L, Zhang W, et al. A pneumonia outbreak associated with<br \/>\na new coronavirus of probable bat origin. Nature. 2020. doi: 10.1038\/s41586-020-2012-7. PubMed<br \/>\nPMID: 32015507.<br \/>\n3. Ren L-L, Wang Y-M, Wu Z-Q, Xiang Z-C, Guo L, Xu T, et al. Identification of a novel coronavirus<br \/>\ncausing severe pneumonia in human: a descriptive study. Chinese Medical Journal. 2020.<br \/>\n4. Cui J, Li F, Shi Z-L. Origin and evolution of pathogenic coronaviruses. Nature Reviews<br \/>\nMicrobiology. 2019;17(3):181-92. doi: 10.1038\/s41579-018-0118-9.<br \/>\n5. Li X, Song Y, Wong G, Cui J. Bat origin of a new human coronavirus: there and back again. Science<br \/>\nChina Life Sciences. 2020. doi: 10.1007\/s11427-020-1645-7.<br \/>\n6. Li W, Shi Z, Yu M, Ren W, Smith C, Epstein JH, et al. Bats are natural reservoirs of SARS-like<br \/>\ncoronaviruses. Science. 2005;310(5748):676-9. Epub 2005\/10\/01. 
doi: 10.1126\/science.1118391.<br \/>\nPubMed PMID: 16195424.<br \/>\n7. Dominguez SR, O&#8217;Shea TJ, Oko LM, Holmes KV. Detection of group 1 coronaviruses in bats in<br \/>\nNorth America. Emerg Infect Dis. 2007;13(9):1295-300. Epub 2008\/02\/07. doi:<br \/>\n10.3201\/eid1309.070491. PubMed PMID: 18252098; PubMed Central PMCID: PMCPMC2857301.<br \/>\n8. Wu A, Peng Y, Huang B, Ding X, Wang X, Niu P, et al. Genome Composition and Divergence of the<br \/>\nNovel Coronavirus (2019-nCoV) Originating in China. Cell Host Microbe. 2020. Epub 2020\/02\/09. doi:<br \/>\n10.1016\/j.chom.2020.02.001. PubMed PMID: 32035028.<br \/>\n9. Xu X, Chen P, Wang J, Feng J, Zhou H, Li X, et al. Evolution of the novel coronavirus from the<br \/>\nongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission. Sci China<br \/>\nLife Sci. 2020. Epub 2020\/02\/06. doi: 10.1007\/s11427-020-1637-5. PubMed PMID: 32009228.<br \/>\n10. Benvenuto D, Giovanetti M, Ciccozzi A, Spoto S, Angeletti S, Ciccozzi M. The 2019-new<br \/>\ncoronavirus epidemic: Evidence for virus evolution. J Med Virol. 2020. Epub 2020\/01\/30. doi:<br \/>\n10.1002\/jmv.25688. PubMed PMID: 31994738.<br \/>\n11. Zhou P, Yang X-L, Wang X-G, Hu B, Zhang L, Zhang W, et al. Discovery of a novel coronavirus<br \/>\nassociated with the recent pneumonia outbreak in humans and its potential bat origin. bioRxiv. 2020.<br \/>\n12. Chan JF, Kok KH, Zhu Z, Chu H, To KK, Yuan S, et al. Genomic characterization of the 2019 novel<br \/>\nhuman-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan.<br \/>\nEmerg Microbes Infect. 2020;9(1):221-36. Epub 2020\/01\/29. 
doi: 10.1080\/22221751.2020.1719902.<br \/>\nPubMed PMID: 31987001.<br \/>\n13. Wei X, Li X, Cui J. Evolutionary Perspectives on Novel Coronaviruses Identified in Pneumonia<br \/>\nCases in China. National Science Review. 2020.<br \/>\n14. Paraskevis D, Kostaki EG, Magiorkinis G, Panayiotakopoulos G, Sourvinos G, Tsiodras S.<br \/>\nFull-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of<br \/>\nemergence as a result of a recent recombination event. Infect Genet Evol. 2020;79:104212. Epub<br \/>\n2020\/02\/01. doi: 10.1016\/j.meegid.2020.104212. PubMed PMID: 32004758.<br \/>\n15. Gralinski LE, Menachery VD. Return of the Coronavirus: 2019-nCoV. Viruses. 2020;12(2). Epub<br \/>\n2020\/01\/30. doi: 10.3390\/v12020135. PubMed PMID: 31991541.<br \/>\n16. Wong MC, Cregeen SJJ, Ajami NJ, Petrosino JF. Evidence of recombination in coronaviruses<br \/>\nimplicating pangolin origins of nCoV-2019. bioRxiv. 2020.<br \/>\n17. Xiao K, Zhai J, Feng Y, Zhou N, Zhang X, Zou J-J, et al. Isolation and Characterization of<br \/>\n2019-nCoV-like Coronavirus from Malayan Pangolins. bioRxiv. 2020:2020.02.17.951335. doi:<br \/>\n10.1101\/2020.02.17.951335.<br \/>\n18. Lam TT-Y, Shum MH-H, Zhu H-C, Tong Y-G, Ni X-B, Liao Y-S, et al. Identification of 2019-nCoV<br \/>\nrelated coronaviruses in Malayan pangolins in southern China. bioRxiv. 2020:2020.02.13.945485. doi:<br \/>\n10.1101\/2020.02.13.945485.<br \/>\n19. Wu C-I, Poo M-m. Moral imperative for the immediate release of 2019-nCoV sequence data.<br \/>\nNational Science Review. 2020. doi: 10.1093\/nsr\/nwaa030.<br \/>\n20. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007;24(8):1586-91.<br \/>\nEpub 2007\/05\/08. doi: 10.1093\/molbev\/msm088. PubMed PMID: 17483113.<br \/>\n21. Hanson G, Coller J. Codon optimality, bias and usage in translation and mRNA decay. Nature<br \/>\nreviews Molecular cell biology. 2018;19(1):20-30. Epub 2017\/10\/11. 
doi: 10.1038\/nrm.2017.91.<br \/>\nPubMed PMID: 29018283.<br \/>\n22. Wan Y, Shang J, Graham R, Baric RS, Li F. Receptor recognition by novel coronavirus from Wuhan:<br \/>\nAn analysis based on decade-long structural studies of SARS. J Virol. 2020. Epub 2020\/01\/31. doi:<br \/>\n10.1128\/JVI.00127-20. PubMed PMID: 31996437.<br \/>\n23. Wrapp D, Wang N, Corbett KS, Goldsmith JA, Hsieh C-L, Abiona O, et al. Cryo-EM Structure of the<br \/>\n2019-nCoV Spike in the Prefusion Conformation. bioRxiv. 2020:2020.02.11.944462. doi:<br \/>\n10.1101\/2020.02.11.944462.<br \/>\n24. Ou X, Liu Y, Lei X, Li P, Mi D, Ren L, et al. Characterization of spike glycoprotein of 2019-nCoV on<br \/>\nvirus entry and its immune cross-reactivity with spike glycoprotein of SARS-CoV.<br \/>\n2020:10.21203\/rs.2.4016\/v1. doi: 10.21203\/rs.2.24016\/v1.<br \/>\n25. Qu X-X, Hao P, Song X-J, Jiang S-M, Liu Y-X, Wang P-G, et al. Identification of Two Critical Amino<br \/>\nAcid Residues of the Severe Acute Respiratory Syndrome Coronavirus Spike Protein for Its Variation in<br \/>\nZoonotic Tropism Transition via a Double Substitution Strategy. Journal of Biological Chemistry.<br \/>\n2005;280(33):29588-95.<br \/>\n26. Ren W, Qu X, Li W, Han Z, Yu M, Zhou P, et al. Difference in Receptor Usage between Severe<br \/>\nAcute Respiratory Syndrome (SARS) Coronavirus and SARS-Like Coronavirus of Bat Origin. Journal of<br \/>\nVirology. 2008;82(4):1899. doi: 10.1128\/JVI.01085-07.<br \/>\n27. Wu F, Zhao S, Yu B, Chen YM, Wang W, Song ZG, et al. A new coronavirus associated with human<br \/>\nrespiratory disease in China. Nature. 2020. Epub 2020\/02\/06. 
doi: 10.1038\/s41586-020-2008-3.<br \/>\nPubMed PMID: 32015508.<br \/>\n28. Ji W, Wang W, Zhao X, Zai J, Li X. Homologous recombination within the spike glycoprotein of the<br \/>\nnewly identified coronavirus may boost cross\u2010species transmission from snake to human. Journal of<br \/>\nmedical virology. 2020.<br \/>\n29. Zhao Z, Li H, Wu X, Zhong Y, Zhang K, Zhang Y-P, et al. Moderate mutation rate in the SARS<br \/>\ncoronavirus genome and its implications. BMC Evolutionary Biology. 2004;4(1):21. doi:<br \/>\n10.1186\/1471-2148-4-21.<br \/>\n30. Zhang C, Wang M. Origin time and epidemic dynamics of the 2019 novel coronavirus. bioRxiv.<br \/>\n2020.<br \/>\n31. Yu W-B, Tang G-D, Zhang L, Corlett RT. Decoding evolution and transmissions of novel<br \/>\npneumonia coronavirus using the whole genomic data. ChinaXiv. 2020:202002.00033. doi:<br \/>\n10.12074\/202002.00033.<br \/>\n32. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype<br \/>\nmaps. Bioinformatics. 2005;21(2):263-5. Epub 2004\/08\/07. doi: 10.1093\/bioinformatics\/bth457.<br \/>\nPubMed PMID: 15297300.<br \/>\n33. Waterson RH, Lander ES, Wilson RK, The Chimpanzee S, Analysis C. Initial sequence of the<br \/>\nchimpanzee genome and comparison with the human genome. Nature. 2005;437(7055):69-87. doi:<br \/>\n10.1038\/nature04072.<br \/>\n34. Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, et al. Evolutionary and<br \/>\nBiomedical Insights from the Rhesus Macaque Genome. Science. 2007;316(5822):222. doi:<br \/>\n10.1126\/science.1139247.<br \/>\n35. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and<br \/>\ncomparative analysis of the mouse genome. Nature. 2002;420(6915):520-62. Epub 2002\/12\/06. doi:<br \/>\n10.1038\/nature01262. PubMed PMID: 12466850.<br \/>\n36. Graham RL, Sparks JS, Eckerle LD, Sims AC, Denison MR. SARS coronavirus replicase proteins in<br \/>\npathogenesis. 
Virus Res. 2008;133(1):88-100. Epub 2007\/04\/03. doi: 10.1016\/j.virusres.2007.02.017.<br \/>\nPubMed PMID: 17397959; PubMed Central PMCID: PMCPMC2637536.<br \/>\n37. Hu B, Zeng L-P, Yang X-L, Ge X-Y, Zhang W, Li B, et al. Discovery of a rich gene pool of bat<br \/>\nSARS-related coronaviruses provides new insights into the origin of SARS coronavirus. PLOS<br \/>\nPathogens. 2017;13(11):e1006698. doi: 10.1371\/journal.ppat.1006698.<br \/>\n38. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput.<br \/>\nNucleic Acids Res. 2004;32(5):1792-7. Epub 2004\/03\/23. doi: 10.1093\/nar\/gkh340. PubMed PMID:<br \/>\n15034147; PubMed Central PMCID: PMCPMC390337.<br \/>\n39. Slater GS, Birney E. Automated generation of heuristics for biological sequence comparison.<br \/>\nBMC Bioinformatics. 2005;6:31. doi: 10.1186\/1471-2105-6-31. PubMed PMID: 15713233; PubMed<br \/>\nCentral PMCID: PMCPMC553969.<br \/>\n40. Wernersson R, Pedersen AG. RevTrans: Multiple alignment of coding DNA from aligned amino<br \/>\nacid sequences. Nucleic Acids Res. 2003;31(13):3537-9. Epub 2003\/06\/26. PubMed PMID: 12824361;<br \/>\nPubMed Central PMCID: PMCPMC169015.<br \/>\n41. Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: Molecular Evolutionary Genetics Analysis<br \/>\nacross Computing Platforms. Mol Biol Evol. 2018;35(6):1547-9. Epub 2018\/05\/04. doi:<br \/>\n10.1093\/molbev\/msy096. PubMed PMID: 29722887; PubMed Central PMCID: PMCPMC5967553.<br \/>\n42. Gao F, Chen C, Arab DA, Du Z, He Y, Ho SYW. EasyCodeML: A visual tool for analysis of selection<br \/>\nusing CodeML. Ecol Evol. 2019;9(7):3891-8. Epub 2019\/04\/25. doi: 10.1002\/ece3.5015. PubMed PMID:<br \/>\n31015974; PubMed Central PMCID: PMCPMC6467853.<br \/>\n43. Rozas J, Ferrer-Mata A, Sanchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, et al.<br \/>\nDnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets. Mol Biol Evol.<br \/>\n2017;34(12):3299-302. 
doi: 10.1093\/molbev\/msx248. PubMed PMID: 29029172.<br \/>\n44. Leigh JW, Bryant D. popart: full-feature software for haplotype network construction. Methods<br \/>\nin Ecology and Evolution. 2015;6(9):1110-6. doi: 10.1111\/2041-210x.12410.<br \/>\n45. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large<br \/>\nphylogenies. Bioinformatics. 2014;30(9):1312-3. Epub 2014\/01\/24. doi:<br \/>\n10.1093\/bioinformatics\/btu033. PubMed PMID: 24451623; PubMed Central PMCID:<br \/>\nPMCPMC3998144.<br \/>\n46. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform.<br \/>\nBioinformatics. 2009;25(14):1754-60. Epub 2009\/05\/20. doi: 10.1093\/bioinformatics\/btp324.<br \/>\nPubMed PMID: 19451168; PubMed Central PMCID: PMCPMC2705234.<br \/>\n47. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment\/Map<br \/>\nformat and SAMtools. Bioinformatics. 2009;25(16):2078-9. Epub 2009\/06\/10. doi:<br \/>\n10.1093\/bioinformatics\/btp352. PubMed PMID: 19505943; PubMed Central PMCID:<br \/>\nPMCPMC2723002.<br \/>\n48. Sharp PM, Li WH. Codon usage in regulatory genes in Escherichia coli does not reflect selection<br \/>\nfor &#8216;rare&#8217; codons. Nucleic Acids Res. 1986;14(19):7737-49. Epub 1986\/10\/10. doi:<br \/>\n10.1093\/nar\/14.19.7737. 
PubMed PMID: 3534792; PubMed Central PMCID: PMCPMC311793.<br \/>\nFigure 1. Molecular divergence and selective pressures during the evolution of<br \/>\nSARS-CoV-2 and related viruses.<br \/>\nA. The phylogenetic tree of SARS-CoV-2 and the related coronaviruses. The branch length<br \/>\n(dS) is presented, and the dN\/dS (\u03c9) value is given in parentheses. The phylogenetic tree<br \/>\nwas reconstructed with the synonymous sites in the concatenated CDSs of nine conserved<br \/>\nORFs (orf1ab, E, M, N, S, ORF3a, ORF6, ORF7a and ORF7b).<br \/>\nB. Conservation of 6 critical amino acid residues in the spike (S) protein. The critical active<br \/>\nsites are Y442, L472, N479, D480, T487, and Y491 in SARS-CoV, and they correspond to<br \/>\nL455, F486, Q493, S494, N501, and Y505 in SARS-CoV-2 (marked with inverted triangles),<br \/>\nrespectively.<br \/>\nC. 
Three candidate positively selected sites (marked with inverted triangles) in the<br \/>\nreceptor-binding domain (RBD) of spike protein (S:439N, S:483V and S:493Q) and the<br \/>\nsurrounding 10 amino acids.<br \/>\nFigure 2. The frequency spectra of derived mutations in 103 SARS-CoV-2 viruses. 
Note<br \/>\nthe derived alleles of synonymous mutations are skewed towards higher frequencies than<br \/>\nthose of nonsynonymous mutations.<br \/>\nFigure 3. Linkage disequilibrium between SNPs in the SARS-CoV-2 viruses.<br \/>\nA. LD plot for each pair of SNPs among the 29 sites that have minor alleles in at least two<br \/>\nstrains. The number near slashes at the top of the image shows the coordinate of sites in the<br \/>\ngenome. The color of each square follows the standard (D&#8217;\/LOD) scheme, and the number in each square is the r\u00b2<br \/>\nvalue.<br \/>\nB. The r\u00b2 of each pair of SNPs (y-axis) against the genomic distance between that pair<br \/>\n(x-axis).<br \/>\nC. The LOD of each pair of SNPs (y-axis) against the genomic distance between that pair<br \/>\n(x-axis).<br \/>\nNote that in both B and C, the red point represents the LD between SNPs at 8,782 and 28,144.<br \/>\nFigure 4. Haplotype analysis of SARS-CoV-2 viruses.<br \/>\nA. The haplotype networks of SARS-CoV-2 viruses. Blue represents the L type, and red is the<br \/>\nS type. The orange arrow indicates that the L type evolved from the S type. Note that in this<br \/>\nstudy, we marked each sample with a unique ID that starts with the geographical location,<br \/>\nfollowed by the date the virus was isolated (see Table S1 for details). 
The IDs do not contain<br \/>\ninformation on the patient&#8217;s race or ethnicity. ZJ, Zhejiang; YN, Yunnan; WH, Wuhan; USA,<br \/>\nUnited States of America; TW, Taiwan; SZ, Shenzhen; SD, Shandong; SC, Sichuan; JX,<br \/>\nJiangxi; JS, Jiangsu; HZ, Hangzhou; GZ, Guangzhou; GD, Guangdong; FS, Foshan; CQ,<br \/>\nChongqing.<br \/>\nB. Evolution of the L and S types of SARS-CoV-2 viruses. Genome sequence alignments<br \/>\nwith the seven most closely related viruses indicated that the S type was most likely the<br \/>\nancient version of SARS-CoV-2. \u201c.\u201d, The nucleotide sequence is identical; \u201c-\u201d, gap.<br \/>\nFigure 5. The unrooted phylogenetic tree of the 103 SARS-CoV-2 genomes. The ID of<br \/>\neach sample is the same as in Fig. 4A. Note WH_2019\/12\/31.a represents the reference<br \/>\ngenome (NC_045512). 
Note SZ_2020\/01\/13.a had C at both positions 8,782 and 28,144 in<br \/>\nthe genome, belonging to neither L nor S type.<br \/>\n[Figure 5 image: tree tip labels for the 103 genomes; legend: S Type, L Type; scale bar: 5.0E-5]<br \/>\nFigure 6. The two types of SARS-CoV-2 showed differences in temporal and spatial<br \/>\ndistributions.<br \/>\nFigure 7. The heteroplasmy of SARS-CoV-2 viruses in human patients.<br \/>\nA. The viruses isolated from a patient that lived in the United States (USA_2020\/01\/21.a,<br \/>\nGISAID ID: EPI_ISL_404253) had the genotype Y (C or T) at both 8,782 and 28,144. The<br \/>\nmost likely explanation is that this patient was infected by both the L and S types. Note the<br \/>\nreference is L type.<br \/>\nB. 
The viruses Australia_2020\/01\/28.a (GISAID ID: EPI_ISL_407894), identified from a patient in Australia, had multiple degenerate nucleotides, and the best explanation is that this patient was infected by at least two different strains of SARS-CoV-2.<br \/>\n[Figure 6 plot: number of strains of the S type and L type, in Wuhan versus outside Wuhan and before versus from Jan. 7, 2020. Bar labels: 96.3%, 96.2%, 61.6%, 62.2%, 38.4%, 37.8%, 3.7%, 3.8%. P = 0.0004; P = 0.0008.]<br \/>\nTable 1. The molecular divergence between SARS-CoV-2 and related viruses<br \/>\nGene | Aligned length (nt) | RaTG13 | GD Pangolin-CoV | GX Pangolin-CoV | SARSr-CoV ZC45 | SARS-CoV | SARSr-CoV BM48-31<br \/>\nGenomic average | 28734 | 0.008\/0.17 (0.044) | 0.026\/0.475 (0.054) | 0.055\/0.722 (0.076) | 0.044\/0.549 (0.081) | 0.113\/0.926 (0.122) | 0.143\/1.15 (0.124)<br \/>\nORF10 | 114 | 0.011\/0 (NA) | 0.011\/0 (NA) | 0.072\/0.044 (1.637) | 0.011\/0 (NA) | &#8211; | &#8211;<br \/>\nORF3a | 825 | 0.009\/0.157 (0.06) | 0.019\/0.291 (0.065) | 0.066\/0.518 (0.128) | 0.052\/0.508 (0.102) | 0.188\/0.918 (0.205) | 0.271\/0.923 (0.294)<br \/>\nORF6 | 183 | 0\/0.098 (0) | 0.014\/0.217 (0.062) | 0.038\/0.491 (0.077) | 0.027\/0.173 (0.158) | 0.191\/0.913 (0.209) | 0.393\/1.512 (0.26)<br \/>\nORF7a | 363 | 0.011\/0.177 (0.061) | 0.018\/0.275 (0.066) | 0.073\/0.477 (0.153) | 0.066\/0.351 (0.188) | 0.088\/0.697 (0.126) | 0.337\/1.14 (0.296)<br \/>\nORF7b | 129 | 0.01\/0 (NA) | 0.02\/0.455 (0.043) | 0.17\/0.436 (0.39) | 0.029\/0.181 (0.162) | 0.155\/0.401 (0.387) | 0.264\/NA (NA)<br \/>\nORF8 | 363 | 0.021\/0.07 (0.303) | 0.032\/0.303 (0.105) | 0.099\/1.015 (0.098) | 0.03\/0.603 (0.05) | &#8211; | &#8211;<br \/>\nE | 225 | 0\/0.018 (0) | 0\/0.037 (0) | 0.006\/0.096 (0.063) | 0\/0.056 (0) | 0.027\/0.166 (0.164) | 0.043\/0.352 (0.121)<br \/>\nM | 666 | 0.004\/0.186 (0.021) | 0.014\/0.298 (0.046) | 0.025\/0.372 (0.067) | 0.016\/0.283 (0.055) | 0.07\/0.576 (0.121) | 0.109\/1.292 (0.085)<br \/>\nN | 1257 | 0.005\/0.131 (0.039) | 0.011\/0.144 (0.076) | 0.04\/0.304 (0.132) | 0.036\/0.333 (0.108) | 0.059\/0.381 (0.155) | 0.102\/1.197 (0.085)<br \/>\norf1a | 13215 | 0.009\/0.167 (0.054) | 0.026\/0.488 (0.053) | 0.073\/0.811 (0.09) | 0.026\/0.405 (0.063) | 0.148\/1.141 (0.129) | 0.174\/1.199 (0.145)<br \/>\norf1ab | 21288 | 0.007\/0.152 (0.044) | 0.019\/0.495 (0.039) | 0.055\/0.776 (0.071) | 0.031\/0.527 (0.058) | 0.105\/0.962 (0.109) | 0.125\/1.108 (0.113)<br \/>\nS (spike) | 3819 | 0.014\/0.321 (0.043) | 0.075\/0.69 (0.108) | 0.06\/0.86 (0.07) | 0.138\/1.063 (0.13) | 0.172\/1.265 (0.136) | 0.217\/1.518 (0.143)<br \/>\nFor each gene, the dN and dS values between SARS-CoV-2 and the other virus are given separated by a slash, and the ratio \u03c9 = dN\/dS is given in parentheses.<br \/>\nTable 2. The heteroplasmy of SARS-CoV-2 viruses in human patients<br \/>\nAccession number | Genomic position | Ref allele | Alt allele | Ref reads | Alt reads | Location_date | GISAID ID<br \/>\nSRR10903401 | 1821 | G | A | 52 | 5 | WH_2020\/01\/02.a | EPI_ISL_406716<br \/>\nSRR10903401 | 19164 | C | T | 40 | 12 | WH_2020\/01\/02.a | EPI_ISL_406716<br \/>\nSRR10903401 | 24323 | A | C | 102 | 67 | WH_2020\/01\/02.a | EPI_ISL_406716<br \/>\nSRR10903401 | 26314 | G | A | 15 | 2 | WH_2020\/01\/02.a | EPI_ISL_406716<br \/>\nSRR10903401 | 26590 | T | C | 10 | 2 | WH_2020\/01\/02.a | EPI_ISL_406716<br \/>\nSRR10903402 | 11563 | C | T | 164 | 26 | WH_2020\/01\/02.b | EPI_ISL_406717<br \/>\nSRR11092057 | 9064 | TTAT | TT | 13 | 2 | WH_2019\/12\/30.e | EPI_ISL_402124<br \/>\nSRR11092057 | 17825 | C | T | 19 | 5 | WH_2019\/12\/30.e | EPI_ISL_402124<br \/>\nSRR11092059 | 4795 | C | T | 10 | 4 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092059 | 6360 | A | G | 39 | 5 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092059 | 7042 | G | A | 5 | 3 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092059 | 12153 | C | T | 15 | 13 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092059 | 15921 | G | T | 19 | 2 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092059 | 16474 | A | G | 11 | 2 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092059 | 20344 | C | T | 19 | 2 | WH_2019\/12\/30.h | EPI_ISL_402130<br \/>\nSRR11092062 | 565 | T | C | 64 | 23 | WH_2019\/12\/30.e | EPI_ISL_402124<br \/>\nSRR11092062 | 17825 | C | T | 141 | 34 | WH_2019\/12\/30.e | EPI_ISL_402124<br \/>\nSRR11092063 | 29441 | C | A | 6 | 2 | WH_2019\/12\/30.d | EPI_ISL_402127<\/p>\n","protected":false},"excerpt":{"rendered":"<p>RESEARCH ARTICLE MICROBIOLOGY On the origin and continuing evolution of SARS-CoV-2 Xiaolu Tang1,7 , Changcheng &hellip; <a href=\"https:\/\/www.bicyclogy.com\/?p=4077\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">2020\u5e743\u67086\u65e5<\/span> <span 
class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false},"version":2}},"categories":[68,17],"tags":[],"class_list":["post-4077","post","type-post","status-publish","format-standard","hentry","category-group_2","category-covid"],"jetpack_publicize_connections":[],"acf":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=\/wp\/v2\/posts\/4077","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4077"}],"version-history":[{"count":2,"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=\/wp\/v2\/posts\/4077\/revisions"}],"predecessor-version":[{"id":4981,"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=\/wp\/v2\/posts\/4077\/revisions\/4981"}],"wp:attachment":[{"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4077"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4077"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.bicyclogy.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4077"}],"curies":[{"name":"wp","href":"htt
ps:\/\/api.w.org\/{rel}","templated":true}]}}