Transcriptome analysis of a taxol-producing endophytic fungus Cladosporium cladosporioides MD2

The shortage of molecular information for taxol-producing fungi has greatly impeded the understanding of fungal taxol biosynthesis mechanism. In this study, the transcriptome of one taxol-producing endophytic fungus Cladosporium cladosporioides MD2 was sequenced for the first time. About 1.77 Gbp clean reads were generated and further assembled into 16,603 unigenes with an average length of 1110 bp. All of the unigenes were annotated against seven public databases to present the transcriptome characteristics of C. cladosporioides MD2. A total of 12,479 unigenes could be annotated with at least one database, and 1593 unigenes could be annotated in all queried databases. In total, 8425 and 3350 unigenes were categorized into 57 GO functional groups and 262 KEGG pathways, respectively, exhibiting the dominant GO terms and metabolic pathways in the C. cladosporioides MD2 transcriptome. One potential and partial taxol biosynthetic pathway was speculated including 9 unigenes related to terpenoid backbone biosynthesis and 40 unigenes involved in the biosynthetic steps from geranylgeranyl diphosphate to 10-deacetylbaccatin III. These results provided valuable information for the molecular mechanism research of taxol biosynthesis in C. cladosporioides MD2. Electronic supplementary material The online version of this article (10.1186/s13568-018-0567-6) contains supplementary material, which is available to authorized users.


Introduction
Taxol (also named as Paclitaxel) originally isolated from Taxus brevifolia (Wani et al. 1971) was well-known as an anti-cancer drug used in clinic. Fermentation processes using taxol-producing microorganisms were considered as an efficient and sustainable way to produce taxol (Gond et al. 2014;Choi et al. 2016). Since one taxolproducing endophytic fungus Taxomyce andreanae isolated from the bark of T. brevifolia was firstly reported (Stierle et al. 1993), more than 40 fungal genera about 200 endophytic fungi had been reported to produce taxol (Flores-Bustamante et al. 2010;Kusari et al. 2014). However, low and un-stable taxol productivity in fungi restricted their industrial application (Gond et al. 2014;Kusari et al. 2014). Targeted genetic engineering of taxolproducing fungi was an effective way to exploit their taxol productivity, however there were difficult challenges due to the almost complete lack of molecular information on the exact pathway, key rate-limiting steps, and availability of a complete set of genes involved in the taxol biosynthesis of fungi (Flores-Bustamante et al. 2010;Gond et al. 2014). Thus, a prerequisite was to discover taxol-related genes and uncover the molecular mechanism of taxol biosynthesis in fungi (Flores-Bustamante et al. 2010;Kusari et al. 2014).
High-throughput RNA-seq technology has been extensively used for transcriptome analysis of many organisms without reference genomes to discover novel and interesting genes (Grabherr et al. 2011;Zhang et al. 2013 Hao et al. (2011) used RNA-seq technology to sequence the leaf transcriptome of Taxus mairei, and obtained 36,493 unigenes, including hundreds of taxane biosynthetic genes and some genes involved in tissue specific functions. Li et al. (2012) used RNA-seq technology to investigate the transcriptional profile of Taxus chinensis cells in response to methyl jasmonate (MeJA) elicitation, obtained 46,581 unigenes, and detected 13,469 differentially expressed genes, including all of the known jasmonate (JA) biosynthesis/ JA signaling pathway genes and taxol-related genes. Sun et al. (2013) used RNA-seq technology to investigate the transcriptomes of MeJA-treated and un-treated Taxus x media cells, obtained 40,348 unigenes, detected all the 29 known genes involved in the biosynthesis of terpenoid backbone and taxol, and found multiple candidates for the unknown steps in taxol biosynthesis. Additionally, Qiao et al. (2014) used RNA-seq technology to sequence the leaf transcriptome of Cephalotaxus hainanensis, obtained 39,416 unigenes, and detected a number of putative candidate genes involved in taxol biosynthesis. More importantly, most of taxol-related genes in Taxus have been cloned and characterized in vitro by functional expression in Escherichia coli or yeast using 'surrogate' substrates, and one taxol biosynthesis pathway in Taxus consisting of about 20 biosynthetic enzymes was depicted Walker and Croteau 2001;Croteau et al. 2006;Guo et al. 2006;Liu et al. 2016). These fruitful results gave us a relatively comprehensive knowledge of the taxol biosynthesis in Taxus. However, little was known about the fungal taxol biosynthesis pathway and taxol-related genes in fungi.
Recently, Yang et al. (2014) sequenced the genome of one taxol-producing fungus Penicillium aurantiogriseum NRRL 62431 isolated from Corylus avellana L., identified candidate genes involved in taxol biosynthesis, and found low sequence identities between the taxol biosynthetic candidate genes of P. aurantiogriseum NRRL 62431 and that of its host hazel as well as that of Taxus baccata. However, the molecular information on the taxol biosynthesis of Cladosporium cladosporioides MD2, one taxol-producing fungus isolated from T. x media (Zhang et al. 2009), was almost completely lacking. In this study, the transcriptome of C. cladosporioides MD2 was firstly sequenced by Illumina RNA-seq technology to generate 19,714,114 clean reads, that were de novo assembled with Trinity program (Grabherr et al. 2011) into 16,603 unigenes. All of the unigenes were annotated and analyzed by using BLAST against various protein and nucleotide databases, and the biological functions and metabolic pathways of these unigenes were further characterized through Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation. One potential and partial taxol biosynthesis pathway in C. cladosporioides MD2 was speculated, and dozens of unigenes were annotated to be involved in taxol biosynthesis. These results provided a valuable source for further gene discovery, functional genomic research and comparative transcriptome analysis of C. cladosporioides MD2.

Fungal material and RNA extraction
Cladosporium cladosporioides MD2 was identified by China Center for Type Culture Collection (CCTCC) and deposited under CCTCC no. 207063 (Zhang et al. 2009). One milliliter of the spore suspension (about 10 8 spores) of C. cladosporioides MD2 was cultivated in 100 mL of potato dextrose liquid medium at 28 °C and 150 rpm for 132 h (equal to five and a half days), and the mycelium pellets were collected by centrifugation for RNA extraction. According to the manufacturer's instruction, total RNA was extracted by using TRIzol ® Reagent (Invitrogen, California, USA), treated with DNase I (Invitrogen, California, USA), dissolved in RNase-free water, and stored in a − 80 °C freezer. The quality and integrity of total RNA were detected by using Nanodrop2000 spectrophotometer (Thermo Fisher Scientific, California, USA) and Bioanalyzer 2100 (Agilent, California, USA), respectively. The purity of total RNA was assessed by both A260/280 and A260/230 ratios.

cDNA synthesis and library sequencing
About 60 μg of total RNA at a concentration ≥ 600 ng/ μL, OD260/280 = 1.8 ~ 2.2 and RNA integrity number (RIN) ≥ 7.0 were used for construction of a cDNA sequencing library using a TruSeqTM RNA Sample Preparation Kit (Illumina, San Diego, California, USA). Briefly, poly-A mRNA samples were isolated from the total RNA using oligo (dT) attached magnetic beads and then interrupted into 200-500 bp fragments. Subsequently, the fragmented mRNAs as well as random hexamer primers and reverse transcriptase were used to synthesize first-strand cDNAs, that were used to synthesize secondstrand cDNAs with DNA polymerase I and RNase H. The cDNA products were purified with QIAquick PCR Purification Kit (Qiagen, California, USA), end-repaired and A-tailed using End Repair Mix (Qiagen, California, USA), and then ligated with sequencing index adapters (Illumina, San Diego, California, USA). The ligated products were isolated by gel electrophoresis to select suitable fragments and were subsequently amplified to generate the final cDNA libraries, that were sequenced using Illumina HiSeq ™ 2500 sequencing platform at Wuhan Bio-Broad Biotechnology Co., Ltd. (Wuhan, China).

De novo assembly by trinity
The raw reads were filtered to generate clean reads by removal of adaptors, ambiguous reads with unknown nucleotides larger than 10%, and low-quality reads where more than half of the bases' qualities were less than five (Quality score < 5). The clean reads were deposited in NCBI Sequence Read Archive (SRA) Sequence Database with accession number SRR6147052. Subsequently, the clean reads were de novo assembled by Trinity program (Grabherr et al. 2011) to generate non-redundant unigenes. In brief, all the clean reads were assembled into contigs with different k-mer values to assess optimal assembly range. The assembled transcripts were as reference sequences for transcript annotation and re-validated by aligning clean reads back to transcripts (Miller et al. 2012). Each of clean reads was re-mapped to reference sequences using Bowtie (Langmead 2010) with default parameters. The longest transcripts within each operationally assembled contig were defined as unigenes.

Transcriptome annotation
The annotation of unigenes were performed by using BLAST algorithm against NCBI non-redundant protein sequence (Nr) database, NCBI non-redundant nucleotide sequence (Nt) database, Swiss-Prot database, Protein family (Pfam) database with an E-value threshold of 1.0E−5, and the best aligning results were used to annotate unigenes and decide the sequence direction of each unigene. Additionally, some unigenes that had no alignment hit in the above databases were further estimated by using ESTScan (3.0.3) software (Iseli et al. 1999) to predict their protein coding sequences (CDS). With Nr and Swiss-Prot annotation, Blast2GO software (Conesa et al. 2005) was used to perform gene ontology (GO) annotation of unigenes to obtain cellular component, molecular function and biological process terms. After retrieving the associated GO terms, the possible functions of unigenes were predicted against eukaryotic Orthologous Group (KOG) database. Additionally, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways annotation of unigenes was performed by mapping the sequences obtained from Blast2GO to the contents of KEGG database (Kanehisa and Goto 2000).

Analysis of potential fungal taxol biosynthesis pathway
The peptide sequences of unigenes were searched on Carbohydrate-Active Enzymes (CAZy) database (Park et al. 2010) to find the enzymes involved in taxol biosynthesis pathway. In parallel, the reported gene sequences involved in the taxol biosynthesis from Taxus or other organisms were retrieved from GenBank. All the sequences were then manually aligned by BLAST search against the assembled contigs to predict potential genes involved in the taxol biosynthesis. All the hits with an E-value threshold of 1.0E−10 were used as queries to search KEGG database again for further identification, and the genes were kept if their subjects were annotated as the enzymes involved in the taxol biosynthesis.

Reverse-transcribed PCR validation of unigenes
Total RNA from C. cladosporioides MD2 was reversetranscribed by using SuperScript III Reverse Transcriptase (Invitrogen, California, USA) and oligo(dT)18. Ten of assembled unigenes were randomly selected for reverse-transcribed PCR (RT-PCR) validation, and the primer sequences of these assembled unigenes were listed in Additional file 1: Table S1.

Transcriptome sequencing and assembly
The transcriptome of C. cladosporioides MD2 was sequenced with Illumina sequencing technology to generate 19,776,638 raw reads consisting of 1,779,897,420 nucleotides (nt). After removing the adaptor sequences, ambiguous reads and low-quality reads, a total of 19,714,114 clean reads (approximate 1.77 Gbp) were obtained (GenBank Accession Number: SRR6147052), and were further de novo assembled into 16,603 unigenes with an average length of 1110 bp and an N 50 of 1894 bp (Table 1). Among them, 6976 unigenes, 2942 unigenes and 3819 unigenes were within the ranges of 200-500, 500-1000 and 1000-2000 bp, respectively, and 2866 unigenes were longer than or equal to 2000 bp (Fig. 1a). Additionally, ten of the assembled unigenes of C. cladosporioides MD2 were randomly selected for RT-PCR validation, and all of the 10 assembled unigenes got right amplifications (Additional file 1: Figure S1). These  (Fig. 1b). Similarity distribution analysis showed that 36.80% of the mapped sequences had a similarity higher than 80%, whereas 63.2% of the mapped sequences had a similarity ranging from 18 to 80% (Fig. 1c). Additionally, species distribution analysis revealed that C. cladosporioides MD2 had a number of homologous sequences with several species, and the genes from Baudoinia compniacensis had the highest similarity (19.90%) with that from C. cladosporioides MD2, followed by Dothistroma septosporum (14.50%), Pseudocercospora fijiensis (10.90%), Zymoseptoria brevis (8.40%), and Zymoseptoria tritici (7.10%). The low sequence similarity of the genes from C. cladosporioides MD2 to the genes from these above species suggested the impossibility of using the genomes of these relative species as references for the transcriptome analysis of C. cladosporioides MD2. Additionally, 3048 unigenes (18.35%), 8431 unigenes (50.78%), 7211 unigenes (43.43%), 8425 unigenes (50.74%), 4904 unigenes (29.53%) and 3350 unigenes (20.17%) were annotated in the databases of Nt, Pfam, Swiss-Prot, GO, KOG and KEGG, respectively, and the mapping rates were from 18 to 71% (Table 2). In total, Fig. 1 mRNA-seq and de novo assembly of C. cladosporioides MD2 transcriptome. a Distribution of unigene length. b E-value distribution of BLAST hits for each unique sequence against the Nr database. c Similarity distribution of the top BLAST hits for each sequence against the Nr database 12,479 unigenes (75.16%) could be annotated with at least one database, and 1593 unigenes (9.59%) could be annotated in all of seven queried databases, with relatively well defined functional annotations (Table 2). These annotations gave us a valuable resource for investigating specific pathways and function genes in C. cladosporioides MD2. Furthermore, all of the unigenes were aligned by BLASTX (E-value ≤ 1.0E−5) to all of seven queried databases to predict their coding sequences (CDS), and the correct reading frames of the nucleotide sequences of unigenes (5′-3′ direction) were defined by the highest rank in the BLAST results. In addition, the CDS of some unigenes, that could not be aligned to the above databases, were predicted by ESTScan software. In total, 16,578 unigenes were identified to harbor their CDS, where the CDS of 11,975 unigenes were identified by BLAST alignment and that of 4603 were estimated by the ESTScan.
Finally, the 10-deacetyl-2-debenzoylbaccatin III was catalyzed by TBT into 10-DAB. These unigenes homologous to the genes for TS, T5αH, T13αH and TBT provided candidates for further verification of the fungal taxol biosynthesis pathway in C. cladosporioides MD2.

Discussion
A fact was that successive replating of most of taxolproducing fungi in semisynthetic medium resulted in a decrease in taxol production (Flores-Bustamante et al. 2010;Venugopalan and Srivastava 2015). The original taxol yield and 10-DAB yield of C. cladosporioides MD2 were about 800 and 120 µg/L in 2009 (Zhang et al. 2009), however they were dramatically decreased to about 5-7 and 50 µg/L in 2014, respectively. Afterwards, the taxol yield and 10-DAB yield of C. cladosporioides MD2 basically remained unchanged up to the final test in July, 2016. This biological phenomenon of loss in taxol productivity with subculture was certainly related to the taxol biosynthesis mechanism of C. cladosporioides MD2. Here, the transcriptome of the successively subcultured C. cladosporioides MD2 with taxol production drawdown was investigated for the first time with a RNA-seq technology to explore gene-expressed characteristics in this situation. About 1.77 Gbp clean reads were generated from the transcriptome of C. cladosporioides MD2 and further assembled into 16,603 unigenes, that were annotated with seven protein and nucleic databases. Based on the KEGG analysis, 40 unigenes in the C. cladosporioides MD2 transcriptome were annotated to be related to the fungal taxol biosynthesis, and were homologous to the genes for TS, T5αH, T13αH, and TBT, respectively.
Additionally, an interesting and important finding was that unigenes homologous to those genes for enzymes functioning in the biosynthetic steps from 10-DAB to taxol, such as 10-deacetylbaccatin III-10-O-acetyltransferase (DBAT), C-13 phenylpropanoid side chain-CoA acyltransferase (BAPT) and 3′-N-debenzoyltaxol N-benzoyltransferase (DBTNBT), were not detected in the C. cladosporioides MD2 transcriptome. DBAT catalyzed the 10-DAB into Baccatin III, one immediate diterpenoid precursor of taxol (Walker and Croteau 2000b), BAPT catalyzed the selective 13-O-acylation of Baccatin III with β-phenylalanoyl-CoA as one acyl donor to form N-debenzoyl-2′-deoxytaxol (i.e. catalyzed the attachment of biologically important taxol side chain precursor) (Walker et al. 2002a), and DBTNBT catalyzed the final acylation reaction of taxol biosynthesis (Walker et al. 2002b). None of candidate unigenes homologous to those genes for DBAT, BAPT and DBTNBT might be a reason for the loss of taxol productivity in the subcultured C. cladosporioides MD2. This result clued us for future research that, (1) whether some genes homologous to the genes for DBAT, BAPT and DBTNBT exited in the genome but did not transcribed in the transcriptome of the subcultured C. cladosporioides MD2; (2) Or, some genes, that encoded products with similar function as DBAT, BAPT, DBTNBT but had low sequence identities to their genes, were not annotated and detected by sequence alignment in the transcriptome of C. cladosporioides MD2.
Collectively, this study not only provided an available transcriptome resource for further functional genomic research and comparative transcriptome analysis of C. cladosporioides MD2, but also revealed one potential and novel branch for the 10-deacetyl-2-debenzoylbaccatin III biosynthesis and valuable candidate genes for future investigation of the taxol biosynthesis mechanism in C. cladosporioides MD2.