Chromosome-level genomes of seeded and seedless date plum based on third-generation DNA sequencing and Hi-C analysis
doi: 10.48130/FR-2021-0009
Abstract: Diospyros lotus L. (date plum) is an important tree species that produces fruit with a high nutritional value. An accurate chromosome-level assembly facilitates research on chromosomal evolution and functional gene mapping. In this study, we assembled the first chromosome-level genomes of seeded and seedless D. lotus using Illumina short reads, PacBio long reads, and Hi-C technology. Each assembled genome comprised 15 chromosomes; the seedless and seeded assemblies were 617.66 and 647.31 Mb in size, with scaffold N50 values of 40.72 and 42.67 Mb, respectively. A BUSCO analysis revealed that the seedless and seeded D. lotus genomes were 91.53% and 91.60% complete, respectively. Additionally, 20,689 (95.4%) and 22,844 (98.5%) protein-coding genes in the seedless and seeded D. lotus genomes, respectively, were functionally annotated. Comparisons of the chromosomes between the two genomes revealed inversions and translocations on chromosome 8 and inversions on chromosome 11. We identified 490 and 424 gene families that expanded in the seedless and seeded D. lotus, respectively. The enriched pathways among these gene families included the estrogen signaling pathway, the MAPK signaling pathway, and biosynthetic pathways for flavonoids, monoterpenoids, and glucosinolates. Moreover, we constructed the first Diospyros genome database (http://www.persimmongenome.cn). On the basis of our data, we developed the first high-quality annotated D. lotus reference genomes, which will be useful for genomic studies on persimmon and for clarifying the molecular mechanisms underlying important traits. Comparisons between the seeded and seedless D. lotus genomes may also help elucidate the molecular basis of seedlessness.
Key words:
- Diospyros lotus
- genome assembly
- seedlessness
Figure 6. Phylogenetic tree of seedless and seeded Diospyros lotus and 17 other species constructed using the maximum-likelihood method. Estimated species divergence times (million years ago) and 95% confidence intervals are labeled at each branch site. Blue numbers on branches indicate the estimated divergence times. Red dots indicate the divergence times estimated based on fossil evidence.
Table 1. Summary of the sequencing data used for assembling the Diospyros lotus genomes.

| Library type | Seedless (W01): Library size (bp) | Seedless (W01): Clean data (Gb) | Seedless (W01): Coverage (×) | Seeded (Yz01): Library size (bp) | Seeded (Yz01): Clean data (Gb) | Seeded (Yz01): Coverage (×) |
| --- | --- | --- | --- | --- | --- | --- |
| Illumina | 350 | 80.99 | 119.53 | 350 | 79.21 | 114.98 |
| PacBio | 20,000 | 92.1 | 103.29 | 20,000 | 133.51 | 166.98 |
| Hi-C | 350 | 86.96 | − | 350 | 107.87 | − |

Table 2. Summary of the assembled seedless and seeded Diospyros lotus genomes.

| Parameter | Seedless (W01): Contig length (bp) | Seedless (W01): Contig number | Seeded (Yz01): Contig length (bp) | Seeded (Yz01): Contig number |
| --- | --- | --- | --- | --- |
| N90 | 561,232 | 228 | 537,928 | 279 |
| N80 | 1,144,354 | 151 | 1,078,450 | 194 |
| N70 | 1,625,012 | 106 | 1,450,541 | 143 |
| N60 | 2,258,638 | 73 | 2,059,392 | 106 |
| N50 | 3,006,748 | 49 | 2,463,960 | 77 |
| Total length | 617,662,490 | − | 647,313,630 | − |
| Number (≥ 100 bp) | − | 706 | − | 743 |
| Number (≥ 2 kb) | − | 691 | − | 734 |
| Max length | 16,262,241 | − | 14,842,567 | − |

Table 3. General statistics for the functional annotations of the genes in the seedless and seeded Diospyros lotus genomes.

| Type | Seedless (W01): Number | Seedless (W01): Percent (%) | Seeded (Yz01): Number | Seeded (Yz01): Percent (%) |
| --- | --- | --- | --- | --- |
| Total | 21,684 | − | 23,193 | − |
| Annotated | 20,689 | 95.41 | 22,844 | 98.5 |
| InterPro | 17,473 | 80.58 | 20,037 | 86.39 |
| GO | 12,161 | 56.08 | 14,066 | 60.65 |
| KEGG ALL | 20,547 | 94.76 | 22,750 | 98.09 |
| KEGG KO | 8,435 | 38.90 | 9,812 | 42.31 |
| Swissprot | 15,057 | 69.44 | 16,896 | 72.85 |
| TrEMBL | 20,587 | 94.94 | 22,790 | 98.26 |
| TF | 1,560 | 7.19 | 1,572 | 6.78 |
| Pfam | 17,064 | 78.69 | 19,709 | 84.98 |
| NR | 20,607 | 95.03 | 22,794 | 98.28 |
| KOG | 17,990 | 82.96 | 20,208 | 87.13 |
| Unannotated | 995 | 4.59 | 349 | 1.50 |
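The N50 to N90 rows of Table 2 can be read through the usual definition of an Nx statistic: sort the contigs from longest to shortest and accumulate their lengths until x% of the total assembly length is reached; the length of the contig at that point is Nx, and the number of contigs accumulated so far presumably corresponds to the "Contig number" column. The sketch below illustrates that definition only. The function name `nx_stat` and the toy contig lengths are hypothetical and are not taken from the authors' pipeline or data.

```python
from typing import Iterable, Tuple


def nx_stat(lengths: Iterable[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx length, number of contigs needed to reach x% of the total).

    Hypothetical helper illustrating the standard Nx definition; it is not
    the tool used to produce Table 2.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    if not ordered:
        raise ValueError("no contig lengths supplied")
    if not 0 < x <= 100:
        raise ValueError("x must be in the interval (0, 100]")

    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Stop at the first contig where the cumulative length reaches x% of the total.
        if running * 100 >= total * x:
            return length, count
    raise AssertionError("unreachable: cumulative length always reaches the total")


# Toy contig lengths (made up for illustration), total = 30
toy = [10, 8, 6, 4, 2]
n50_length, n50_count = nx_stat(toy, 50)
n90_length, n90_count = nx_stat(toy, 90)
print(n50_length, n50_count)
print(n90_length, n90_count)
```

Applied to a real assembly, the input would be the list of contig lengths read from the FASTA file; a higher x (for example N90) always gives a shorter or equal length and a larger or equal contig count than N50, which matches the trend in Table 2.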