Materials and methods

Data source

Genome data

Genome sequencing data for 8 cannabis varieties were collected from the published literature and open resources, and were downloaded from the NCBI Assembly database (https://www.ncbi.nlm.nih.gov/assembly) (Table 1).

 

Variety | Symbol | Accession | Sex | Research group | Genome size | Study
Jamaican Lion DASH | Cs_JLD | GCA_003660325 | Female | Medicinal Genomics | 1.07 Gb | McKernan et al., 2018
Finola | Cs_FN | GCA_003417725 | Male | University of Toronto | 1.01 Gb | Laverty et al., 2018
Purple Kush | Cs_PK | GCA_000230575 | Female | University of Toronto | 0.89 Gb | Laverty et al., 2018
CBDRx-18 | Cs_CBD | GCA_900626175 | Female | Sunrise Genetics | 0.88 Gb | Grassa et al., 2018
Pineapple Banana Bubba Kush | Cs_PBK | GCA_002090435 | Male | Steep Hill Genetics / CU Boulder, CGRI | 512 Mb |
LA Confidential | Cs_LAC | GCA_001510005 | Female | Courtagen Life Sciences | 595 Mb |
Chemdog91 | Cs_CD91 | GCA_001509995 | Female | Courtagen Life Sciences | 286 Mb |
Cannatonic | Cs_CAN | GCA_001865755 | Female | Phylos Bioscience | 586 Mb |

Table 1. Overview of the cannabis genomes from 8 varieties.

 

References

McKernan, K., Helbert, Y., Kane, L.T., Ebling, H., Zhang, L., Liu, B., et al. (2018) Cryptocurrencies and Zero Mode Wave guides: An unclouded path to a more contiguous Cannabis sativa L. genome assembly.

Laverty, K.U., Stout, J.M., Sullivan, M.J., Shah, H., Gill, N., Holbrook, L., et al. (2018) A physical and genetic map of Cannabis sativa identifies extensive rearrangement at the THC/CBD acid synthase locus. Genome Res., gr.242594.118.

Grassa, C.J., Wenger, J.P., Dabney, C., Poplawski, S.G., Motley, S.T., Michael, T.P., et al. (2018) A complete Cannabis chromosome assembly and adaptive admixture for elevated cannabidiol (CBD) content. bioRxiv, 458083.

 

Transcriptome data

A total of 195.7 Gb of RNA-seq data from 16 cannabis varieties was obtained from the published literature, while the RNA-seq data for one variety (Therapy) were generated in this study (the transcriptomic data analysis methods are described in the “Analysis pipeline” section). Published data were retrieved from the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra). Details of the collected transcriptome data are provided in Table 2.

 

Variety | SRA Accession | Project Accession | Tissue | Study
Finola | SRP155904 | PRJNA483805 | Trichome (stalked, bulbous, sessile) | Livingston et al., 2020
Purple Kush | SRP008673 | PRJNA73819 | Pool (roots, shoots, stems, pre-flowers, early-stage flowers and mid-stage flowers) | Laverty et al., 2018
cv. Santhica 27 | SRP133605 | PRJNA435671 | Bast fibres (bottom, middle, top) and hypocotyl (20, 15, 9, 6 days) | Guerriero et al., 2017 and Behr et al., 2016
Yunma 1 | SRP041340 | PRJNA245084 | Pool (leaf, root, stem and shoot) | Gao et al., 2018
var. A | ERP023948 | PRJEB21674 | Stem peel, core, stem | Leebens-Mack et al., 2019
var. C | ERP023948 | PRJEB21674 | Xylem | Leebens-Mack et al., 2019
Sour Diesel | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Canna Tsu | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Black Lime | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Valley Fire | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Cherry Chem | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Terple | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Black Berry Kush | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
White Cookies | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Mama Thai | SRP168446 | PRJNA498707 | Glandular trichome | Zager et al., 2019
Cannabio-2 | SRP234963 | PRJNA560453 | Female flower (stage 1-4), male flower, trichome (stage 1-4) | Braich et al., 2019
Therapy | | | Leaf, root and stem | This study

Table 2. Overview of the transcriptome data collected by CannabisGDB.

 

References

Livingston, S.J., Quilichini, T.D., Booth, J.K., Wong, D.C.J., Rensing, K.H., Laflamme‐Yonkman, J., et al. (2020) Cannabis glandular trichomes alter morphology and metabolite content during flower maturation. The Plant Journal, 101, 37–56.

Laverty, K.U., Stout, J.M., Sullivan, M.J., Shah, H., Gill, N., Holbrook, L., et al. (2018) A physical and genetic map of Cannabis sativa identifies extensive rearrangement at the THC/CBD acid synthase locus. Genome Res., gr.242594.118.

Guerriero, G., Behr, M., Legay, S., Mangeot-Peter, L., Zorzan, S., Ghoniem, M., and Hausman, J.-F. (2017) Transcriptomic profiling of hemp bast fibres at different developmental stages. Scientific Reports, 7, 4961.

Behr, M., Legay, S., Žižková, E., Motyka, V., Dobrev, P.I., Hausman, J.-F., et al. (2016) Studying Secondary Growth and Bast Fiber Development: The Hemp Hypocotyl Peeks behind the Wall. Front. Plant Sci., 7.

Gao, C., Cheng, C., Zhao, L., Yu, Y., Tang, Q., Xin, P., et al. (2018) Genome-Wide Expression Profiles of Hemp (Cannabis sativa L.) in Response to Drought Stress. International Journal of Genomics, 2018, e3057272.

Leebens-Mack, J.H., Barker, M.S., Carpenter, E.J., Deyholos, M.K., Gitzendanner, M.A., Graham, S.W., et al. (2019) One thousand plant transcriptomes and the phylogenomics of green plants. Nature, 574, 679–685.

Zager, J.J., Lange, I., Srividya, N., Smith, A., and Lange, B.M. (2019) Gene Networks Underlying Cannabinoid and Terpenoid Accumulation in Cannabis. Plant Physiology, 180, 1877–1897.

Braich, S., Baillie, R.C., Jewell, L.S., Spangenberg, G.C., and Cogan, N.O.I. (2019) Generation of a Comprehensive Transcriptome Atlas and Transcriptome Dynamics in Medicinal Cannabis. Scientific Reports, 9, 16583.


Metabolite data

Metabolite data were collected from the published literature and generated in this study (the chemical analysis methods are described in the “Analysis pipeline” section). The varieties and metabolites determined in each study are listed in Table 3.

 

Tissue | Varieties | Cannabinoids measured | Study
Cured flowers | Lemon Sherbet, Orange Skunk, Star Killer, Gorilla Glue, Gorilla Glue #4, Maui Haze, Sunset Sherbet, Nightmare Cookie, The Sauce, Jabberwocky, La Choco, Lemon Balm, Jack Herer, Black Boss, 8-Ball Kush, Rollex OG, Adonis, Cookies and Cream, Moonshine Ghost Trane Haze, Rocket Fuel, Venom OG, Hash Plant, Lucky Charms, Chemdawg #4, Chemdawg, Platinum Delight, Lemon Skunk, Purple Eclipse, Rosetta Stone, Star Bud, Kush Puppy, Tangerine Dream, Cosmic Lotus, Satori, Bob Marley, Wonder Woman, Alien Sour Apple, Golden Cobra, Grape Stomper, Kandy Kush, Lemon Goji, Skywalker OG, FLO, Strawberry Fields, Spectrum, Goji OG, Moby Dick, Blue Dream, Pineapple Skunk, Skunk Haze, 9 Lb Hammer, Pipe Dream, 97 Sage, Lemon OG, Liberty Haze, Lohan, Golden Sage, Dairy Queen, Dark Shadow Haze, Hemlock, White Widow, Grizzly Kush (Total 62) | THC, CBD, CBC, CBN, THCVA, CBG | Richins et al., 2018
Flower buds | Blackberry_Kush, Black_Lime, Canna_Tsu, Mama_Thai, Valley_Fire, Cherry_Chem, Terple, Sour_Diesel, White_Cookies (Total 9) | THC, THCA, CBD, CBDA, CBG | Zager et al., 2019
Leaf | Ak47, Bama_Yao, Blueberry, CBD_therapy, Divine, Jack_flash, Lemon, Linzhi, Matanuska, QiB, San_Francisco, Therapy, Auto_CBG, CBD_Auto_Charlotte's_Angel, Pink_Kush_CBD_30:1_Auto, Sweet_CBD_Auto, Dinamed_CBD_Auto, Durban_Poison (Total 18) | THCA, CBDA, CBCA | This study
Leaf | CAN36/97, Felina, Sudi, CAN22/88, VIR575, Novosadska, CAN26/93, CAN28/01, Ferimon, VIR577, CAN35/97, CAN20/02, CAN24/89, Petera, Carmagnola, CAN21/02, CAN19/87, Carmen, Fedrina, Beniko, CAN29/94, Jus8, CAN40/99, LKSD, K436, Kompolti, Bialobrzeskie, Tygra, CAN16/94, Delores, Zlotnoska, Purple Kush (Total 32) | CBDV, CBN, CBD, THC, CBC, CBG, CBDA, THCA, CBCA, CBGA, CBDVA, THCVA, CBGVA | This study
Leaf | Futura_75, USO_11, Kompolti, Futura_77, Rastislavicka, Krasnodarskaya, Fedrina_74, MS_77, CHG_SSL_12, Thai_Skunk, Skunk_1, Super_Skunk, Pan_3 (Total 13) | THC, CBD, CBG | Welling et al., 2016
Leaf | Crystal Cookies, Platinum Gorilla, Green Crack 2017, Twisted Velvet, Double Royal Kush, Blue Dream, Platinum Buffalo, Platinum Scout, Blue Cherry Pie, White Widow, Cold Creek Kush, Walker Kush, Lavendar, Skywalker, Grandaddy Purple, Sour Willie, Purple Fat Pie, Holy Power, Romulin, Oracle, Alien Blues, RX, FLO, Thunderstruck, Lavendar Jones 2015, Green Crack 2015, Bohdi Tree, Lavendar Jones 2017, Juanita, Love Lace (Total 30) | THCV, THC, CBD, CBC, CBG | Richins et al., 2018
Leaf | Beniko, Bialobrzeskie, Dneprovskaya_Odnodomnaya_6, Eletta_Campana, Fedora_19, Fedrina_74, Felina_34, Ferimon_12, Fibramulta_151, Fibrimon_24, Fibrimon_56, Futura_77, Kompolti, Kompolti_Hybrid_TC, Kompolti_Sargaszaru, Lovrin_110, Rastslaviska, Secuieni_1, Superfibra, Uniko_B, USO_11, USO_13, YUSO_14, YUSO_16 (Total 24) | CBD, THC | Meijer, 1995
Inflorescence | Holy Power, Green Crack, Platinum Scout, Green Crack 2015, Crystal Cookies, Oracle, Lavendar Jones 2015, Lavendar Jones, Walker Kush, Platinum Gorilla, White Widow, FLO, Grandaddy Purple, Platinum Buffalo, RX, Twisted Velvet, Double Royal Kush, Lavendar, Skywalker, Blue Dream, Purple Fat Pie, Romulin, Blue Cherry Pie, Cold Creek Kush, Alien Blues, Thunderstruck, Sour Willie, Love Lace, Bohdi Tree, Juanita (Total 30) | THCV, THC, CBD, CBC, CBG | Richins et al., 2018
Inflorescence | Arbel, DQ, Paris, SCBD, Roma (Total 5) | THC, CBD, CBG | Namdar et al., 2019

Table 3. Overview of the metabolite data collected by CannabisGDB.

 

References

Richins, R.D., Rodriguez-Uribe, L., Lowe, K., Ferral, R., and O’Connell, M.A. (2018) Accumulation of bioactive metabolites in cultivated medical Cannabis. PLOS ONE, 13, e0201119.

Zager, J.J., Lange, I., Srividya, N., Smith, A., and Lange, B.M. (2019) Gene Networks Underlying Cannabinoid and Terpenoid Accumulation in Cannabis. Plant Physiology, 180, 1877–1897.

Welling, M.T., Liu, L., Shapter, T., Raymond, C.A., and King, G.J. (2016) Characterisation of cannabinoid composition in a diverse Cannabis sativa L. germplasm collection. Euphytica, 208, 463–475.

Meijer, E. (1995) Fibre hemp cultivars: A survey of origin, ancestry, availability and brief agronomic characteristics. Journal of the International Hemp Association, 2(2): 66-73.

Namdar, D., Voet, H., Ajjampura, V., Nadarajan, S., Mayzlish-Gati, E., Mazuz, M., et al. (2019) Terpenoids and Phytocannabinoids Co-Produced in Cannabis Sativa Strains Show Specific Interaction for Cell Cytotoxic Activity. Molecules, 24, 3031.

 

Protein data

All protein data were obtained from the published literature. Details of the collected protein data are provided in Table 4.

 

Tissue | Study | Year
Seed | Proteomic characterization of hempseed (Cannabis sativa L.) | 2016
Seed | Production, digestibility and allergenicity of hemp (Cannabis sativa L.) protein isolates | 2019
Trichome | Metabolomics, proteomics, and transcriptomics of Cannabis sativa L. trichomes | 2014
Trichome | Comparative Proteomics of Cannabis sativa Plant Tissues | 2004
Flower | Comparative Proteomics of Cannabis sativa Plant Tissues | 2004
Flower buds | Optimisation of Protein Extraction from Medicinal Cannabis Mature Buds for Bottom-Up Proteomics | 2019

Table 4. Overview of the protein data collected by CannabisGDB.

 

References

Aiello, G., Fasoli, E., Boschin, G., Lammi, C., Zanoni, C., Citterio, A., and Arnoldi, A. (2016) Proteomic characterization of hempseed (Cannabis sativa L.). Journal of Proteomics, 147, 187–196.

Mamone, G., Picariello, G., Ramondo, A., Nicolai, M.A., and Ferranti, P. (2019) Production, digestibility and allergenicity of hemp (Cannabis sativa L.) protein isolates. Food Research International, 115, 562–571.

Happyana, N. (2014) Metabolomics, proteomics, and transcriptomics of Cannabis sativa L. trichomes. Doctor of Philosophy - PhD Thesis

Raharjo, T.J., Widjaja, I., Roytrakul, S., and Verpoorte, R. (2004) Comparative Proteomics of Cannabis sativa Plant Tissues. J Biomol Tech, 15: 97–106.

Vincent, D., Rochfort, S., and Spangenberg, G. (2019) Optimisation of Protein Extraction from Medicinal Cannabis Mature Buds for Bottom-Up Proteomics. Molecules, 24: 659.

 

Analysis pipeline

Assembly analysis

RepeatMasker (Smit et al., 2013) was used to identify repeat sequences in the genomic sequences of each variety. The repeat-masked genomic sequences were then used for gene prediction, and the completeness of the predicted gene set was assessed with BUSCO (Seppey et al., 2019).
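As an illustration of what a masked genome looks like downstream, the following minimal sketch estimates the masked fraction of a FASTA file. It assumes that masked positions appear either as N/X (hard masking) or as lowercase letters (soft masking); the function name and the example contigs are hypothetical and not part of the CannabisGDB pipeline. Note that assembly gaps written as N are counted as masked too.

```python
from typing import Iterable, Tuple


def masked_fraction(fasta_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (masked_bases, total_bases, fraction) for a masked FASTA."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        # Count hard-masked (N/X) and soft-masked (lowercase) positions.
        masked += sum(1 for base in line if base in "NnXx" or base.islower())
    fraction = masked / total if total else 0.0
    return masked, total, fraction


# Hypothetical toy input: two short contigs, one partly masked.
example = [">contig1", "ACGTNNNNacgt", ">contig2", "ACGTACGT"]
masked, total, fraction = masked_fraction(example)
print(f"{masked}/{total} bases masked ({fraction:.1%})")
```

For a real assembly, the same function can be fed an open file handle instead of a list of strings.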

Gene prediction and annotation

Protein-coding genes were predicted with three approaches: homology-based prediction using related species (Morus alba, Humulus lupulus, Solanum lycopersicum, Arabidopsis thaliana, and Cannabis sativa L.), de novo gene-model prediction, and RNA-seq-supported prediction. GeneWise (Birney et al., 2004) was used for the homology-based prediction. The de novo predictors AUGUSTUS (Stanke and Morgenstern, 2005), GlimmerHMM (Majoros et al., 2004), and SNAP (Leskovec and Sosič, 2016) were trained and screened to obtain the optimal gene model. The RNA-seq data sets were aligned to the reference genome using HISAT2 (Kim et al., 2015), and open reading frames (ORFs) were predicted from the optimal transcripts using TransDecoder (https://github.com/TransDecoder/TransDecoder). EVidenceModeler (EVM) (Haas et al., 2008) was used to integrate the results of these approaches into a non-redundant and complete gene set. PASA (https://github.com/PASApipeline/PASApipeline) was then used to refine the gene set, including transcript splicing information, untranslated regions (UTRs), and alternatively spliced transcripts. For functional annotation, the predicted gene set was aligned to the NR and UniProt databases with BLASTP; GO and KEGG annotation were also performed. InterProScan (Quevillon et al., 2005) was used to identify protein domains.
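To make the ORF prediction step concrete, the sketch below finds the longest ATG-to-stop open reading frame in a transcript across all six reading frames. It is a deliberately simplified illustration, not TransDecoder's method, which applies further scoring and filtering not reproduced here; the function names and the example transcript are hypothetical.

```python
from typing import Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_in_strand(seq: str) -> str:
    """Longest ATG...stop ORF over the three frames of one strand."""
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # include the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best


def longest_orf(seq: str) -> Tuple[str, str]:
    """Return (strand, ORF sequence); ties favour the forward strand."""
    seq = seq.upper()
    forward = longest_orf_in_strand(seq)
    reverse = longest_orf_in_strand(reverse_complement(seq))
    return ("+", forward) if len(forward) >= len(reverse) else ("-", reverse)


# Hypothetical toy transcript with one complete ORF on the forward strand.
strand, orf = longest_orf("ccATGAAATTTGGGTAAcc")
print(strand, orf, len(orf) // 3 - 1, "codons before the stop")
```

ORFs without an in-frame stop codon are ignored in this sketch, so partial transcripts would need separate handling.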

Transcriptomic data analysis

FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and BBTools (https://jgi.doe.gov/data-and-tools/bbtools) were used for sequencing quality control. HISAT2 (Kim et al., 2015) was used for splice-aware read alignment. Gene expression levels were measured as transcripts per million (TPM) using StringTie (Pertea et al., 2015) and featureCounts (Liao et al., 2014).
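For readers unfamiliar with the TPM unit, the following minimal sketch shows the standard conversion from read counts and gene lengths to TPM. StringTie reports TPM directly, so this only illustrates the arithmetic, for example starting from a count table such as the one featureCounts produces; the gene names, counts, and lengths are hypothetical.

```python
from typing import Dict


def counts_to_tpm(counts: Dict[str, float],
                  lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Convert per-gene read counts to transcripts per million (TPM)."""
    # Step 1: normalise each count by gene length in kilobases.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000.0) for gene in counts}
    # Step 2: scale so that the values of one sample sum to one million.
    scale = sum(rpk.values())
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale * 1e6 for gene, value in rpk.items()}


# Hypothetical toy sample with three genes.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}
tpm = counts_to_tpm(counts, lengths_bp)
for gene, value in tpm.items():
    print(f"{gene}\t{value:.1f}")
```

Because length normalisation happens before scaling, geneA and geneB end up with the same TPM even though geneB has three times as many reads.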

Metabolite data mining and chemical analysis

The published literature on cannabinoid contents in 4 different tissues of 210 cannabis varieties was searched and summarized. ECharts (Li et al., 2018) was used to show the percentage of each cannabinoid in the total cannabinoid content.
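The percentage calculation behind such a chart can be sketched as follows. The cannabinoid values are hypothetical, and the name/value list is simply the shape commonly passed to a pie-style chart series; it is an assumption about how the data might be handed to the front end, not a description of the CannabisGDB code.

```python
import json
from typing import Dict, List


def cannabinoid_percentages(contents_mg_per_g: Dict[str, float]) -> Dict[str, float]:
    """Express each cannabinoid as a percentage of the total content."""
    total = sum(contents_mg_per_g.values())
    if total <= 0:
        raise ValueError("total cannabinoid content must be positive")
    return {name: 100.0 * value / total
            for name, value in contents_mg_per_g.items()}


def to_pie_data(percentages: Dict[str, float]) -> List[dict]:
    """Build a list of name/value items suitable for a pie-style chart."""
    return [{"name": name, "value": round(pct, 2)}
            for name, pct in percentages.items()]


# Hypothetical leaf profile (mg/g sample).
leaf_profile = {"THCA": 6.0, "CBDA": 3.0, "CBCA": 1.0}
percentages = cannabinoid_percentages(leaf_profile)
print(json.dumps(to_pie_data(percentages)))
```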

Sample collection and preparation

Fresh cannabis leaves were collected and stored at -80 ℃ until sample preparation and analysis. For each sample, 100±5 mg of leaves was weighed in triplicate. The weighed material was crushed in 1000±50 µL acetonitrile (containing 100 µg/mL warfarin as IS) in a cell crushing apparatus, with steel balls added to increase the crushing efficiency, and then sonicated for 20 minutes at 20 ℃. The extract was centrifuged for 10 minutes (12,000 rpm) and the supernatant was transferred to a new 2 mL centrifuge tube. A 100 µL aliquot of the extract was diluted with 900 µL acetonitrile and filtered through a 0.22 µm syringe tip filter. The prepared solutions were spiked with ∆9-THC-d3, CBD-d3, and THC-d3 (0.5 μg/mL) as internal standards (IS) prior to UPLC-MS analysis. Further dilutions were applied as necessary.

UPLC-MS setup for cannabinoids assay

The ultra-performance liquid chromatography–mass spectrometry (UPLC-MS) system used in this analysis was a modular Waters ACQUITY™ H-Class UPLC system with electrospray ionization (ESI), coupled to a Vion ion mobility spectrometry (IMS) quadrupole time-of-flight (Q-TOF) mass spectrometer (Vion IMS Q-Tof). The chromatographic separation of cannabinoids was performed on an ACQUITY UPLC™ BEH C18 column (1.7 µm, 2.1 mm x 100 mm). The mobile phase was composed of acetonitrile with 0.1% formic acid (B) and aqueous 0.1% formic acid (C). Gradient elution was as follows: 50%-90% B from 0 to 7 minutes, 90% B from 7 to 10.5 minutes, and 50% B from 10.5 to 12 minutes. The flow rate was 0.35 mL/min until 10.5 minutes and was then raised to 0.4 mL/min; the column temperature was set at 40 ℃. The electrospray ionization mass spectrometry (ESI-MS) system was operated in negative ionization mode. The other experimental parameters for full-scan mode were set as follows: m/z range, 50-2000; capillary voltage, 2.8 kV; cone gas flow, 50 L/h; desolvation gas flow, 600 L/h; source temperature, 120 ℃; desolvation temperature, 380 ℃; low collision energy, 6 eV; high collision energy ramp, 20-30 eV. All data were acquired with the UNIFI software.
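The gradient programme can be written down as a small piecewise function, which is handy for checking the solvent composition at a given retention time. This sketch assumes a linear ramp from 50% to 90% B over the first 7 minutes and a step back to 50% B after 10.5 minutes; the exact behaviour at the 10.5-minute boundary is not specified in the text, and the function names are hypothetical.

```python
def percent_b(t_min: float) -> float:
    """Percentage of mobile phase B (acetonitrile, 0.1% formic acid) at time t."""
    if not 0.0 <= t_min <= 12.0:
        raise ValueError("time must lie within the 0-12 minute run")
    if t_min <= 7.0:
        # Assumed linear ramp from 50% to 90% B over 0-7 minutes.
        return 50.0 + (90.0 - 50.0) * t_min / 7.0
    if t_min <= 10.5:
        return 90.0  # isocratic hold
    return 50.0      # re-equilibration at the starting composition


def flow_rate_ml_min(t_min: float) -> float:
    """Flow rate in mL/min: 0.35 before 10.5 minutes, 0.40 afterwards."""
    if not 0.0 <= t_min <= 12.0:
        raise ValueError("time must lie within the 0-12 minute run")
    return 0.35 if t_min < 10.5 else 0.40


for t in (0.0, 3.5, 7.0, 9.0, 11.0):
    print(f"t={t:>4} min  B={percent_b(t):.0f}%  flow={flow_rate_ml_min(t):.2f} mL/min")
```

The percentage of phase C at any time is simply 100 minus the value returned by percent_b.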

Cannabinoid quantification

For each analysis, the concentrations of cannabinoids were calculated as follows:

 

Here, the concentration of each targeted cannabinoid is expressed in mg/g sample, and the peak area ratio of the targeted cannabinoid to the internal standard was integrated by the UNIFI software. The mean value and the standard deviation (SD) were calculated over the repeated analyses of each targeted cannabinoid.
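A minimal sketch of one way to carry out such a calculation is given below. It is an assumption-laden illustration rather than the formula used in this study: it supposes that a calibration step has already converted the peak area ratio into a concentration in the injected solution (µg/mL), takes the extraction volume (1000 µL) and the 10-fold dilution (100 µL into 900 µL) from the sample preparation described above, and uses the sample standard deviation (n - 1 denominator). All function names and measured values are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def content_mg_per_g(measured_ug_per_ml: float,
                     sample_mass_g: float,
                     extract_volume_ml: float = 1.0,
                     dilution_factor: float = 10.0) -> float:
    """Back-calculate cannabinoid content (mg/g) from the injected solution."""
    # Total micrograms of cannabinoid in the undiluted extract.
    total_ug = measured_ug_per_ml * dilution_factor * extract_volume_ml
    # Micrograms per gram of leaf, converted to milligrams per gram.
    return total_ug / sample_mass_g / 1000.0


def summarize(replicates: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample SD) of replicate measurements."""
    return mean(replicates), stdev(replicates)


# Hypothetical triplicate: measured concentrations (µg/mL) and leaf masses (g).
measured = [5.0, 5.2, 4.8]
masses = [0.100, 0.100, 0.100]
contents = [content_mg_per_g(c, m) for c, m in zip(measured, masses)]
avg, sd = summarize(contents)
print(f"{avg:.3f} ± {sd:.3f} mg/g")
```

If a population SD (n denominator) is intended instead, statistics.pstdev can be substituted for statistics.stdev.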

Protein data mining

The published literature on cannabis protein profiles in 4 different tissues was searched and summarized. Every protein listed with a UniProt accession in the published literature was searched with BLASTP against CannabisGDB to find the best-matching protein.
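Selecting the best match from BLASTP results can be sketched as below. The example assumes BLAST+ tabular output (`-outfmt 6`) with its default twelve columns, in which the e-value and bit score are the last two fields, and it ranks hits by bit score; the ranking criterion, the query accessions, and the gene identifiers are all hypothetical and are not taken from CannabisGDB.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, float, float]  # (subject id, e-value, bit score)


def best_hits(tabular_lines: Iterable[str]) -> Dict[str, Hit]:
    """Keep the highest-scoring subject for every query in BLAST tabular output."""
    best: Dict[str, Hit] = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        current = best.get(query)
        if current is None or bitscore > current[2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits: two queries, the first with two candidate matches.
example_hits = [
    "QUERY_1\tgene_0001\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410",
    "QUERY_1\tgene_0457\t71.0\t180\t50\t2\t5\t184\t10\t189\t2e-60\t205",
    "QUERY_2\tgene_0999\t88.2\t150\t17\t1\t1\t150\t3\t152\t5e-80\t270",
]
for query, (subject, evalue, bitscore) in best_hits(example_hits).items():
    print(query, subject, evalue, bitscore)
```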

Database implementation

CannabisGDB was established on the Linux (Ubuntu 18.04.1) operating system with an Nginx HTTP server. PHP and JavaScript were used to build the user interface and design the web pages. JBrowse version 1.16.6 was used for the visualization, interpretation and navigation of genomes in a coherent visual framework (Buels et al., 2016). SequenceServer version 2.0.0 beta3 was used to perform homology searches between different data sets of Cannabis (Priyam et al., 2019). Primer3 (Untergasser et al., 2012) was integrated to design target-specific primers. SynVisio (Bandi and Gutwin, 2020) was implemented to show the collinearity between the four genomes (Cs_PK, Cs_FN, Cs_JLD, Cs_CBD) for further comparative genomics analysis. The R packages shiny (https://cran.r-project.org/package=shiny), morpheus (https://cran.r-project.org/package=morpheus) and clusterProfiler (Yu et al., 2012) were used to build the heatmap and enrichment analysis tools.

 

References

Bandi, V. and Gutwin, C. (2020) Interactive Exploration of Genomic Conservation. In: Proceedings of Graphics Interface 2020 (GI 2020), pp. 74–83. Canadian Human-Computer Communications Society / Société canadienne du dialogue humain-machine.

Birney, E., Clamp, M., and Durbin, R. (2004) GeneWise and Genomewise. Genome Res, 14, 988–995.

Buels, R., Yao, E., Diesh, C.M., Hayes, R.D., Munoz-Torres, M., Helt, G., et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biology, 17, 66.

Haas, B., Salzberg, S., Zhu, W., Pertea, M., Allen, J., Orvis, J., et al. (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome biology, 9, R7.

Kim, D., Langmead, B., and Salzberg, S.L. (2015) HISAT: a fast spliced aligner with low memory requirements. Nature Methods, 12, 357–360.

Leskovec, J. and Sosič, R. (2016) SNAP: A General-Purpose Network Analysis and Graph-Mining Library.

Li, D., Mei, H., Shen, Y., Su, S., Zhang, W., Wang, J., et al. (2018) ECharts: A declarative framework for rapid construction of web-based visualization. Visual Informatics, 2, 136–146.

Liao, Y., Smyth, G.K., and Shi, W. (2014) featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics, 30, 923–930.

Majoros, W.H., Pertea, M., and Salzberg, S.L. (2004) TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics, 20, 2878–2879.

Pertea, M., Pertea, G.M., Antonescu, C.M., Chang, T.-C., Mendell, J.T., and Salzberg, S.L. (2015) StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology, 33, 290–295.

Priyam, A., Woodcroft, B.J., Rai, V., Moghul, I., Munagala, A., Ter, F., et al. (2019) Sequenceserver: A Modern Graphical User Interface for Custom BLAST Databases. Mol Biol Evol, 36, 2922–2924.

Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., and Lopez, R. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., 33, W116-120.

Seppey, M., Manni, M., and Zdobnov, E.M. (2019) BUSCO: Assessing Genome Assembly and Annotation Completeness. In: Gene Prediction: Methods and Protocols, Methods in Molecular Biology (Kollmar, M., ed), pp. 227–245. New York, NY: Springer.

Smit, A., Hubley, R., and Green, P. (2013) RepeatMasker Open-4.0. http://www.repeatmasker.org.

Stanke, M. and Morgenstern, B. (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res., 33, W465-467.

Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., and Rozen, S.G. (2012) Primer3—new capabilities and interfaces. Nucleic Acids Res, 40, e115–e115.

Yu, G., Wang, L.-G., Han, Y., and He, Q.-Y. (2012) clusterProfiler: an R Package for Comparing Biological Themes Among Gene Clusters. OMICS: A Journal of Integrative Biology, 16, 284–287.

 

 
