Chitin degradation potential and whole-genome sequence of Streptomyces diastaticus strain CS1801

The aim of this study was to evaluate the chitin degradation potential and whole-genome sequence of Streptomyces diastaticus strain CS1801, which was screened in our previous work. The results of fermentation revealed that CS1801 can convert the chitin derived from crab shells, colloidal chitin and N-acetylglucosamine to chitooligosaccharide. Genome-wide analysis of CS1801 was also performed to explore the genomic basis for chitin degradation. The results showed that CS1801 possesses a chromosome of 5,611,479 bp (73% GC) and a plasmid of 1,388,284 bp (73% GC). The CS1801 genome contains 7584 protein-coding genes, 90 tRNA and 21 rRNA operons. In addition, genomic CAZyme analysis indicated that CS1801 carries 103 glycoside hydrolase family genes, which could encode the glycoside hydrolases that contribute to chitin degradation. The whole-genome information helps to explain the mechanism underlying the chitin degradation activity of CS1801: the strain carries a substantial number of genes encoding chitinases and the complete metabolic pathway of chitin, which gives it promising potential for chitooligosaccharide production.


Introduction
Chitooligosaccharide (COS) is a water-soluble polysaccharide obtained by treatment of chitin or chitosan with acid hydrolysis, enzymatic degradation or both (Einbu et al. 2007; Liu et al. 2009). Generally, COS is characterized by a degree of polymerization (DP) of less than 20 (Lee et al. 2002). COS performs many functions, such as antibacterial effects, antioxidant activity (Fang et al. 2015), and animal and plant growth promotion (Nandhini et al. 2017; Shenghe et al. 2017). COS has become a research topic of interest and has been widely used in medicine (Zhao et al. 2017; Zhou et al. 2018), agriculture (Swiatkiewicz et al. 2015; Lan et al. 2016) and food (Cao et al. 2018; Jiang et al. 2018).
Chitin is an important precursor for the production of COS and a polymer of N-acetylglucosamine (GlcNAc) linked by β-1,4-glycosidic bonds (Nguyen-Thi and Doucet 2016). Chitin is ubiquitous; its reserves are second only to those of cellulose, making it the second largest renewable resource on Earth, and approximately 10 billion tons of chitin is biosynthesized (Rinaudo 2006). The main raw material used in the industrial production of chitin is discarded shrimp and crab shells from aquatic processing plants. These shells contain more than 20% chitin (Hamdi et al. 2017). In China, due to the large number of lakes and the vast sea area, shrimp and crab resources are abundant. With the development of the Chinese seafood and farmed shrimp and crab industries, the amount of these wastes is increasing, causing serious environmental pollution and increasing the burden on businesses. However, these wastes are important resources.
*Correspondence: wlmqb@126.com; qibin65@126.com. †Tiantian Xu and Manting Qi contributed equally to this work and should be considered co-first authors. †Limei Wang and Bin Qi contributed equally to this work. 1 Research Center of Fermentation Engineering, Changshu Institute of Technology, Changshu 215500, China.
At present, COS preparation is performed largely via a chemical method. Enzyme-assisted hydrolysis is a better method to obtain COS than chemical methods, with higher purity and lower pollution, but the search for proper enzymes remains ongoing (Liang et al. 2018).
Chitinase and chitosanase are promising enzymes for the production of COS, as has been reported in the literature (Kidibule et al. 2018; Zhang et al. 2018; Guo et al. 2019). The gene encoding chitosanase from Streptomyces albolongus was cloned, sequenced and expressed in Escherichia coli, and the enzyme was shown to hydrolyze chitosan to primarily D-GlcN and chitobiose (Guo et al. 2019). To improve industrial chitosanase application, researchers have used carbohydrate-binding module fusion technology to efficiently immobilize GH46 chitosanase (Lin et al. 2019). The stability of the immobilized enzyme is superior to that of the natural enzyme, and three chitosan products with different molecular weights can be produced via the optimized reaction (Lin et al. 2019). However, chitosan has a general limitation as a substrate because it is not completely deacetylated, which results in a nonuniform degree of acetylation of the chitosanase hydrolysate. A single DP and a defined degree of acetylation of the product components are important for identifying the activity of the bioactive components of COS. Two partially acetylated chitotrioses (N-acetylchitotriose and N,N′-diacetylchitotriose) were produced to study the relationship between activity and acetylation. The antioxidant activities of the two partially acetylated chitotrioses and of unmodified chitotriose were further studied. N,N′-diacetylchitotriose, with a high degree of acetylation, has the highest antioxidant activity (Li et al. 2013). Furthermore, COS with a 50% degree of acetylation was the most effective at alleviating salt stress in wheat seedlings (Zou et al. 2015). These results indicated that the activity of COS was closely related to its degrees of acetylation and polymerization.
However, despite the presence of the chitinase gene in bacterial and fungal genomes, the ability to hydrolyze insoluble chitin has been identified in only a few species, such as Serratia mucilis and Bacillus, Pseudomonas and Streptomyces species (Hara et al. 2013; Sorokin et al. 2014; Durairaj et al. 2017; Ilangumaran et al. 2017; Moon et al. 2017). Chitinase is a specific hydrolase that directly hydrolyzes chitin to produce fully acetylated COS. Salinivibrio sp. BAO-1801 was isolated from the fermentation broth of salted shrimp, and its chitinase was characterized (Le and Yang 2018). The main product of this enzyme is acetylchitobiose. The binding of different domains to insoluble chitin was studied by NMR spectroscopy (Takashima et al. 2018). The CBM18 domain hydrolyzed insoluble chitooligosaccharide better than the GH19 domain.
Although the chitinases produced by microorganisms have made significant contributions to the transformation of shrimp and crab shell wastes, their molecular and ecological roles in industrial applications have not been fully explored (Nazari et al. 2011). To study the degradation mechanism of enzymes with specific effects produced by microorganisms, whole-genome sequencing technology is widely used. Bifidobacterium choerinum FMB-1, which can degrade resistant starch, was subjected to whole-genome analysis, and 11 protein-coding genes related to α-glucan degradation were found (Jung et al. 2018). Cellulase-producing Paenibacillus lautus BHU3 was subjected to whole-genome analysis, and 143 glycoside hydrolase (GH) genes were discovered. These genes may play a vital role in enhancing cellulolytic attributes (Yadav and Dubey 2018).
Currently, few draft genome sequencing data sets are available for chitinase-producing strains capable of degrading insoluble chitin (Sorokin et al. 2014). Therefore, improving the pool of information via whole-genome sequencing analyses of different strains that produce chitinase will aid in elucidating the genomic basis of chitin decomposition activity and the utilization of degraded chitin. In our previous study, we selected a strain from fermented shrimp paste that could decompose chitin to COS and identified it as Streptomyces diastaticus CS1801 (Xu et al. 2019). In this study, whole-genome sequencing of the chitinase-producing S. diastaticus strain CS1801 was performed to describe specific genomic information regarding chitin degradation activity. The possibility of directly degrading shrimp and crab shells to produce COS was also studied.

Determination of the ability of CS1801 to transform COS
The bacteria were isolated from shrimp paste and deposited in the China Center for Type Culture Collection (CCTCC, Wuhan, China). The isolate was classified as Streptomyces diastaticus CS1801, and the accession number is CCTCC No. M2018263. Using colloidal chitin, GlcNAc or Chinese mitten crab (Eriocheir sinensis) shell powder as the sole carbon source, the production of five kinds of COSs (DP = 1-5) was evaluated in the fermentation broth of CS1801 with ultra-high-performance liquid chromatography-mass spectrometry (UPLC-MS, Waters, Massachusetts, United States) after 5 days of fermentation. To prepare the colloidal chitin, flaked chitin was shredded, and 10 g of chitin powder was slowly added to 200 mL of concentrated hydrochloric acid and stirred quickly. After the chitin was dissolved completely, the impurities were removed by filtration through glass wool, and the solution was added to 1000 mL of distilled water. The precipitate was collected by centrifugation and washed with distilled water until neutral. The fermentation medium was prepared as follows: solution A consisted of 1.4 g/L K2HPO4, 0.6 g/L KH2PO4, 1 g/L MgSO4·7H2O, 10 g/L NaCl, and 20 g/L (NH4)2SO4 in 1000 mL of deionized water at pH 6.5, and solution B consisted of 10 g/L sole carbon source at pH 6.5. The two solutions were mixed in equal volumes before use.
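Because solutions A and B are mixed in equal volumes, each component ends up at half of its stock concentration in the final medium. The following Python sketch works through that arithmetic. It assumes that the volumes are additive and that no component is present in both solutions in a way that changes on mixing; the function name mix_equal_volumes is introduced here only for illustration.

```python
# Stock concentrations in g/L, taken from the medium description above.
SOLUTION_A = {
    "K2HPO4": 1.4,
    "KH2PO4": 0.6,
    "MgSO4·7H2O": 1.0,
    "NaCl": 10.0,
    "(NH4)2SO4": 20.0,
}
SOLUTION_B = {"sole carbon source": 10.0}


def mix_equal_volumes(solution_a, solution_b):
    """Return final concentrations (g/L) after mixing two solutions 1:1.

    Assumption: volumes are additive, so every component is diluted
    twofold. A component listed in both solutions is averaged.
    """
    final = {}
    for solution in (solution_a, solution_b):
        for component, concentration in solution.items():
            final[component] = final.get(component, 0.0) + concentration / 2
    return final


final_medium = mix_equal_volumes(SOLUTION_A, SOLUTION_B)
for component, concentration in final_medium.items():
    print(f"{component}: {concentration:g} g/L")
```

Under these assumptions the working medium contains the carbon source at 5 g/L, which is the value to keep in mind when comparing yields with media that are described by their final concentrations.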
Sequencing was performed using an Illumina HiSeq™ second-generation sequencer (Illumina, Inc., Delaware, United States), and the adapter sequences and low-quality bases in the reads were removed using Trimmomatic. Nonamplified long DNA fragments were sequenced using a PacBio RS II third-generation sequencer (PacBio, Pacific Biosciences of California, Inc., Delaware, United States). The adapter sequences and low-quality bases in these reads were removed after sequencing, and the data were pooled and analyzed until an estimated 40× coverage of the genome was obtained, based on the estimated genome size. Canu was used to assemble the third-generation single-molecule sequencing data, followed by the second-generation sequencing data. Gaps in the scaffolds were closed with GapFiller, and finally, the sequence was corrected with PrInSeS-G to fix base errors and small insertions and deletions introduced during assembly.
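The coverage target can be related to the amount of sequence data through the usual definition, coverage = total sequenced bases / genome size. The sketch below illustrates the calculation. The genome size of 7 Mb is a rounded estimate and the mean read length of 10 kb is a hypothetical value chosen for illustration; neither is a parameter reported for the sequencing runs.

```python
import math


def bases_needed(genome_size_bp, target_coverage):
    """Total number of sequenced bases required for a given coverage."""
    return genome_size_bp * target_coverage


def reads_needed(genome_size_bp, target_coverage, mean_read_length_bp):
    """Number of reads required, rounded up to a whole read."""
    total = bases_needed(genome_size_bp, target_coverage)
    return math.ceil(total / mean_read_length_bp)


# Assumed values for illustration only.
ESTIMATED_GENOME_SIZE_BP = 7_000_000   # rounded genome size estimate
TARGET_COVERAGE = 40                   # the 40x target mentioned above
MEAN_READ_LENGTH_BP = 10_000           # hypothetical long-read length

total_bases = bases_needed(ESTIMATED_GENOME_SIZE_BP, TARGET_COVERAGE)
n_reads = reads_needed(ESTIMATED_GENOME_SIZE_BP, TARGET_COVERAGE, MEAN_READ_LENGTH_BP)
print(f"bases required: {total_bases:,}")
print(f"reads required: {n_reads:,}")
```

With these assumed inputs, about 280 Mb of filtered long-read data corresponds to 40× coverage; a different genome size estimate scales the requirement linearly.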

Homologous gene alignment
The common genes and unique genes of CS1801 and its closely related strains were obtained with the pan-genome analysis pipeline (PGAP) for further analysis. The genomic information and gene sequences of the closely related strains are shown in Table 1. First, PGAP was used to collect the protein sequences of all strains for BLAST alignment. According to the BLAST alignment results, the similarity between different proteins was determined, and similar genes were assigned to the same ortholog cluster. A homologous gene present in all samples was defined as a core gene. After the core genes were removed, the remaining genes were nonconsensus genes, and a specific gene was a nonconsensus gene possessed only by a single sample. All nonconsensus genes combined with the core genes constituted the pan-genome. A phylogenetic tree was constructed based on the neighbor-joining clustering results from the homologous gene and pan-genome analysis.
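The classification of ortholog clusters into core, nonconsensus and strain-specific genes can be expressed with simple set operations. The sketch below is not the PGAP implementation; it only illustrates the definitions given above on toy data. The cluster identifiers and the strain names strainA and strainB are hypothetical.

```python
def classify_clusters(clusters, strains):
    """Split ortholog clusters into core, nonconsensus and specific sets.

    clusters: dict mapping a cluster id to the set of strains that
              contain at least one member of that cluster.
    strains:  iterable of all strain names in the comparison.
    """
    all_strains = set(strains)
    # Core: present in every strain.
    core = {cid for cid, members in clusters.items() if members == all_strains}
    # Nonconsensus: everything that is not core.
    nonconsensus = set(clusters) - core
    # Specific: nonconsensus clusters found in exactly one strain.
    specific = {
        strain: {cid for cid in nonconsensus if clusters[cid] == {strain}}
        for strain in all_strains
    }
    # Pan-genome: core plus nonconsensus clusters.
    pan = core | nonconsensus
    return core, nonconsensus, specific, pan


# Toy example with hypothetical clusters and strains.
strains = ["CS1801", "strainA", "strainB"]
clusters = {
    "c1": {"CS1801", "strainA", "strainB"},
    "c2": {"CS1801", "strainA"},
    "c3": {"CS1801"},
    "c4": {"strainB"},
}
core, nonconsensus, specific, pan = classify_clusters(clusters, strains)
print("core:", sorted(core))
print("nonconsensus:", sorted(nonconsensus))
print("specific to CS1801:", sorted(specific["CS1801"]))
print("pan-genome size:", len(pan))
```

In a real analysis the cluster membership would come from the BLAST-based clustering step rather than from a hand-written dictionary.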

Genome prediction and annotation
Prokka (rapid prokaryotic genome annotation) was used to predict the genes in the assembled genome, and the predicted genes were compared against the Clusters of Orthologous Groups (COG) of proteins (Tatusov et al. 2000), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto 2000) databases to obtain functional annotation information.

CAZy carbohydrase analysis
The protein sequences inferred from the whole-genome sequence were aligned with the carbohydrate-active enzyme (CAZy) database (http://www.cazy.org/) using HMMER3 to obtain the corresponding carbohydrate-active enzyme annotation information (Lombard et al. 2014). The screening condition was an E-value < 1e-5.
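The screening step amounts to keeping only hits whose E-value is strictly below the cutoff and then counting hits per CAZy family. The sketch below shows this on a small hand-written hit list. The gene names, families and E-values are invented for illustration, and the tuple layout is an assumption that does not reproduce the HMMER3 output format.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-5  # screening condition stated above

# Hypothetical hits as (gene id, CAZy family, E-value).
hits = [
    ("gene_0001", "GH18", 3.2e-40),
    ("gene_0002", "GH19", 1.0e-12),
    ("gene_0003", "CBM5", 2.0e-3),   # too weak, will be discarded
    ("gene_0004", "GH18", 9.9e-6),
    ("gene_0005", "CE4", 1e-5),      # equal to the cutoff, not below it
]


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits with an E-value strictly below the cutoff."""
    return [hit for hit in hits if hit[2] < cutoff]


def count_families(hits):
    """Count the retained hits per CAZy family."""
    return Counter(family for _, family, _ in hits)


kept = filter_hits(hits)
family_counts = count_families(kept)
for family, count in sorted(family_counts.items()):
    print(f"{family}: {count}")
```

Whether a hit exactly at the threshold is kept depends on how the comparison is written; the strict inequality here follows the "<" in the stated condition.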

Drug resistance functional annotation
The protein sequences inferred from the whole-genome sequence were compared with the Comprehensive Antibiotic Resistance Database (CARD) by BLAST (McArthur et al. 2013), and the annotation information for each gene and its corresponding drug resistance function was combined to obtain the annotation result.

NCBI registration number
The whole-genome sequence data of S. diastaticus CS1801 (6.9 Mb) were deposited in the NCBI database with the accession number SUB5461644.

Complete genome of Streptomyces diastaticus CS1801
Genome-wide analysis was performed to decipher the full set of genes involved in chitin degradation. The complete genome of S. diastaticus CS1801 is composed of two separate circular sequences: one is a 5,611,479-bp chromosome with a 73% GC content, and the other is a 1,388,284-bp plasmid with a 73% GC content (Fig. 1). The genome consists of 7584 protein-coding genes and 90 tRNA and 21 rRNA operons. The whole-genome sequence of Streptomyces diastaticus CS1801 was compared with those of closely related strains, and the number of homologous genes was counted (Fig. 2a). Based on a comparative analysis of the pan-genome, which consists of core genes and nonconsensus genes, the genome has 2898 unique genes, a much higher number than that of the closely related strains. Phylogenetic tree analysis showed that S. diastaticus CS1801 is closely related to the compared strains and that the strains are derived from a common ancestor (Fig. 2b). A total of 2633 unigenes were annotated in the COG database and were assigned to 14 functional groups (Fig. 3a). Among the groups, transcription (10.22%), amino acid transport and metabolism (8.31%), and carbohydrate transport and metabolism (7.42%) were the most abundant, while the least enriched functional group was RNA processing and modification (0.03%). The genes in the carbohydrate transport and metabolism functional group are closely related to the degradation of chitin to COS. In the GO annotation, a total of 9865 unigenes were classified into the biological process category, 6963 unigenes were classified into the cellular component category, and 6656 unigenes were classified into the molecular function category (Fig. 3b). The largest functional groups in the biological process category were metabolic process and cellular process. In the cellular component category, the largest functional groups were cell and cell part, and the largest functional groups in the molecular function category were catalytic activity and binding.
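The replicon sizes reported above can be combined into a total genome size and a length-weighted GC content. The short sketch below does this arithmetic; the GC fractions are the rounded 73% values from the text, so the weighted result carries the same rounding.

```python
# Replicon sizes (bp) and GC fractions as reported in the text.
replicons = {
    "chromosome": (5_611_479, 0.73),
    "plasmid": (1_388_284, 0.73),
}

# Total genome size is the sum of both circular sequences.
total_bp = sum(size for size, _ in replicons.values())

# Length-weighted GC content across replicons.
weighted_gc = sum(size * gc for size, gc in replicons.values()) / total_bp

# Share of the genome carried by the plasmid.
plasmid_fraction = replicons["plasmid"][0] / total_bp

print(f"total genome size: {total_bp:,} bp")
print(f"weighted GC content: {weighted_gc:.2%}")
print(f"plasmid share of genome: {plasmid_fraction:.1%}")
```

The total comes to 6,999,763 bp, so the plasmid accounts for roughly one fifth of the genome.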
A total of 2667 unigenes were annotated in the KEGG database (Fig. 3c), of which 411 unigenes were classified as carbohydrate metabolism genes, 37 unigenes were classified as glycan biosynthesis genes, and 126 unigenes were classified as lipid metabolism genes.
N-Acetylglucosaminidase plays an important role in the degradation of chitin and is broadly distributed in GH3, GH20, GH83 and GH116 (Ferrara et al. 2014). This scenario is also true for CS1801, which harbors 7 genes from GH18 and 2 genes from GH19, including chitinases. Among GTs, the major subcategory was GT87, followed by GT43. Among CBMs, the major category was CBM13, followed by CBM5, CBM12 and CBM32. Among CEs, the major subcategory was CE7, followed by CE10 and CE4. Chitin deacetylase converts chitin to chitosan and is a member of the carbohydrate esterase family 4 (CE4) defined in the CAZy database (Park et al. 2010). In the CS1801 genome, CBM2, CBM5, CBM12, CBM37 and CBM50, which have affinity for chitin, were also detected (http://www.cazy.org/Carbohydrate-Binding-Modules.html; Park et al. 2010).

CARD analysis
A total of 63 antibiotic resistance (AR) proteins were predicted in S. diastaticus CS1801, including DNA gyrase subunit B, the daunorubicin/doxorubicin resistance ATP-binding protein DrrA, a β-lactamase precursor, virginiamycin B lyase, the quaternary ammonium compound-resistance protein SugE and other antibiotic resistance proteins. In addition, CS1801 has two kinds of resistance proteins, aminoglycoside/hydroxyurea antibiotic resistance kinase (EC 2.7.1.72) and UDP-GlcNAc 1-carboxyvinyltransferase (EC 2.5.1.7), which may be closely related to COS inhibition in the late fermentation stage.

Analysis of enzymes involved in the metabolism of chitin
Both the fermentation products and the whole-genome data of CS1801 aid in elucidating the chitin degradation, metabolism and synthesis mechanisms. The most important phenotypic property of the CS1801 strain is its ability to efficiently hydrolyze chitin and utilize it as a growth substrate. CS1801 contains various genes encoding specific enzymes with chitin hydrolysis activity, such as 9 chitinases, 5 N-acetylglucosaminidases, 10 chitin deacetylases, 4 β-galactosidases, 7 β-glucosidases and 1 chitosanase. In addition, CS1801 contains a variety of enzymes that are thought to be involved in chitin synthesis. Details of the enzymes are listed in Table 4.

Discussion
The results of fermentation revealed that S. diastaticus CS1801 can convert the chitin derived from crab shells, colloidal chitin and N-acetylglucosamine to COS. To explore the mechanism of the chitin degradation process, we performed a genome-wide analysis of the CS1801 strain and found a large number of genes related to the enzymatic hydrolysis of chitin to COS, the most important of which encode chitinases. The predicted chitin degradation and synthesis pathways of CS1801 are shown in Fig. 4. CS1801 degrades chitin through three predicted pathways: (i) endochitinase hydrolyzes the insoluble form of chitin to water-soluble oligomers, especially (GlcNAc)2, and (GlcN)2 is then generated from (GlcNAc)2 by chitin deacetylase; (ii) GlcNAc is produced from chitin by exochitinase and is transformed to GlcN by chitin deacetylase; and (iii) chitosan is generated from chitin by chitin deacetylase and is then transformed to COS by chitosanase (Seki et al. 2019). All of the (GlcN)2 molecules are transformed to GlcN by β-galactosidase or β-glucosidase, which split the β-(1,4)-glycosidic bond of (GlcN)2 (Chinchetru et al. 1989; Sorokin et al. 2014). Then, GlcN is transformed to GlcN-6-phosphate by glucokinase (Sorokin et al. 2014), GlcN-6-phosphate deaminase (EC 3.5.99.6) transforms GlcN-6-phosphate to fructose-6-phosphate, and finally, CO2 and H2O are formed through the EMP, HMS and TCA pathways. CS1801 can also synthesize chitin. First, GlcN-fructose-6-phosphate aminotransferase (isomerizing) transforms fructose-6-phosphate to GlcN-6-phosphate (Leriche et al. 1997). Then, GlcN-6-phosphate is transformed to GlcN-1-phosphate by phosphoglucosamine mutase (Shimazu et al. 2012). GlcN-1-phosphate N-acetyltransferase then transforms GlcN-1-phosphate to N-GlcNAc-1-phosphate (Zhang et al. 2015).
Next, UDP-N-GlcNAc is formed by UDP-N-GlcNAc pyrophosphorylase, which transforms N-GlcNAc-1-phosphate to UDP-N-GlcNAc (Ullrich and van Putten 1995). Finally, chitin is produced by poly-β-1,6-N-acetyl-d-GlcN synthase (chitin synthase) (Sburlati and Cabib 1986). CS1801 can hydrolyze colloidal chitin to (GlcN)2, (GlcN)3, (GlcN)4 and (GlcN)5. It can also synthesize (GlcN)3, (GlcN)4 and (GlcN)5 from GlcNAc. According to these findings, CS1801 has the complete metabolic pathway for chitin, including hydrolysis and synthesis. Streptomyces coelicolor A3(2) is a well-studied Streptomyces strain with a known whole-genome sequence (Bentley et al. 2002) containing 13 chitinase genes (some of which are putative). By comparison, CS1801 has 9 chitinases, 1 chitosanase and 7 potential chitinases (PROKKA00833, 02567, 02568, 05526, 05528, 06351, and 07303) with binding domains associated with chitin degradation. Chitinivibrio alkaliphilus gen. nov., sp. nov. is a novel extremely haloalkaliphilic anaerobe that has fewer than 5 chitinases (Sorokin et al. 2014). Counting the potential chitinases, CS1801 has more chitinases than these strains and a more complete chitin metabolism pathway. CS1801 can directly convert untreated crab shell waste to COS with a molecular weight less than 1000 Da, which has good commercial application value (Chen et al. 2003, 2005). To further study the mechanism of CS1801-mediated chitin degradation, the enzymes in the reaction pathway will be cloned and expressed in subsequent work, and directed evolution can be used to control the degree of chitin hydrolysis. Ultimately, COS with a single DP and a single degree of acetylation could be obtained with improved purity. In general, CS1801 is a very interesting strain, and CS1801 or its enzymes have the potential to produce COSs. This study lays a foundation for industrial COS production and for increasing the added value of waste such as crab shells.
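The molecular weights of the oligomers discussed here follow from the monomer masses, because each glycosidic bond releases one water molecule: M(DP) = DP × M(monomer) − (DP − 1) × M(H2O). The sketch below applies this formula to DP 1-5. The average molar masses are standard values computed from the molecular formulas (GlcNAc, C8H15NO6; GlcN, C6H13NO5) and are not taken from the measurements in this study; salt forms, adduct ions and partially acetylated species are not considered.

```python
WATER_MASS = 18.015  # g/mol, average molar mass of H2O

# Average molar masses (g/mol) from the molecular formulas.
MONOMER_MASS = {
    "GlcNAc": 221.21,  # C8H15NO6
    "GlcN": 179.17,    # C6H13NO5
}


def oligomer_mass(monomer, dp):
    """Molar mass of a linear homo-oligomer with the given DP.

    One water is lost per glycosidic bond, i.e. dp - 1 waters in total.
    """
    if dp < 1:
        raise ValueError("dp must be at least 1")
    return dp * MONOMER_MASS[monomer] - (dp - 1) * WATER_MASS


for monomer in ("GlcN", "GlcNAc"):
    for dp in range(1, 6):
        mass = oligomer_mass(monomer, dp)
        print(f"({monomer}){dp}: {mass:.2f} g/mol")
```

Under these assumptions, (GlcN)5 has a mass of about 824 g/mol and (GlcNAc)4 about 831 g/mol, whereas (GlcNAc)5 is about 1034 g/mol.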