A GH89 human α-N-acetylglucosaminidase (hNAGLU) homologue from gut microbe Bacteroides thetaiotaomicron capable of hydrolyzing heparosan oligosaccharides

Carbohydrate-Active enZYme (CAZy) GH89 family enzymes catalyze the cleavage of terminal α-N-acetylglucosamine from glycans and glycoconjugates. Although structurally and mechanistically similar to the human lysosomal α-N-acetylglucosaminidase (hNAGLU) in GH89, which is involved in the degradation of heparan sulfate in the lysosome, the bacterial GH89 enzymes characterized so far have no or low activity toward α-N-acetylglucosamine-terminated heparosan oligosaccharides, the preferred substrates of hNAGLU. We cloned and expressed several soluble and active recombinant bacterial GH89 enzymes in Escherichia coli. Among these enzymes, a truncated recombinant α-N-acetylglucosaminidase from the gut symbiotic bacterium Bacteroides thetaiotaomicron, ∆22Bt3590, was found to catalyze the cleavage of the terminal α1–4-linked N-acetylglucosamine (GlcNAc) from a heparosan disaccharide with high efficiency. Heparosan oligosaccharides with lengths up to decasaccharide were also suitable substrates. This bacterial α-N-acetylglucosaminidase could be a useful catalyst for heparan sulfate analysis.

To look for a bacterial homologue of hNAGLU that can efficiently catalyze the cleavage of the terminal α-linked GlcNAc in heparosan oligosaccharides, we tested the activity of CpGH89 and its loop-truncated mutant, and we cloned and examined the activities of three other GH89 enzymes: Bf0576 from Bacteroides fragilis as well as Bt0438 and Bt3590 from Bacteroides thetaiotaomicron. Among these, Bt3590 was shown to be a highly active α-N-acetylglucosaminidase that can catalyze the hydrolysis of terminal GlcNAc from the nonreducing end of heparosan oligosaccharides of varied lengths. It is a promising candidate for chemoenzymatic sequencing of heparin/HS oligosaccharides (Merry et al. 1999; Turnbull 2001; Turnbull et al. 1999).

Cloning of full-length and truncated α-N-acetylglucosaminidases from B. fragilis, B. thetaiotaomicron, and C. perfringens
The genes encoding the full-length Bf0576 from B. fragilis, Bt0438 and Bt3590 from B. thetaiotaomicron, and CpGH89 from C. perfringens were amplified by polymerase chain reactions (PCRs) from the corresponding genomic DNAs. Genes for truncated proteins ∆17Bf0576 (residues 18-718), ∆24Bt0438 (residues 25-730), and ∆22Bt3590 (residues 23-732) were amplified by PCR from the corresponding plasmids containing the full-length genes (see below for cloning). The DNA sequence encoding loop (residues 680-686)-truncated CpGH89 (tCpGH89) was amplified from the plasmid containing the full-length CpGH89 (see below for cloning) using the Q5® Site-Directed Mutagenesis Kit. The corresponding primers used are listed in Table 1. PCRs were performed in a 50 μL reaction mixture containing 20 ng of genomic DNA or 4 ng of plasmid as the template DNA, 1 μM each of forward and reverse primers, 5 μL of 10× Phusion® HF buffer, 1 mM dNTP mixture, and 5 units (1 μL) of Phusion® HF DNA polymerase. The reaction mixtures were subjected to 35 cycles of amplification with an annealing temperature of 55 °C for Bf0576, Bt0438, and Bt3590, 62 °C for CpGH89, and 65 °C for tCpGH89. For cloning the full-length genes and the genes encoding the N-terminal truncated recombinant proteins, the resulting PCR products were digested with the corresponding restriction enzymes introduced in the primers, purified, and ligated with predigested pET15b vector. For cloning tCpGH89, the resulting PCR products were purified and ligated with the KLD Enzyme Mix included in the Q5® Site-Directed Mutagenesis Kit. The ligated product was transformed into chemically competent E. coli DH5α cells. Positive plasmids were sequenced and subsequently transformed into homemade BL21 (DE3) chemically competent cells. Selected clones were grown for protein expression.

Protein expression, purification, and quantification
The plasmid-bearing E. coli cells were cultured in 1 L Luria-Bertani (LB) rich medium (10 g L−1 tryptone, 5 g L−1 yeast extract, and 10 g L−1 NaCl) supplemented with 100 μg mL−1 ampicillin at 37 °C with shaking. Generally, overexpression of the target protein was achieved by inducing the E. coli culture with 0.1 mM of isopropyl-1-thio-β-D-galactopyranoside (IPTG) at OD600 nm = 0.8-1.0 and incubating at 20 °C for 20 h with vigorous shaking at 250 rpm in a C25KC incubator shaker (New Brunswick Scientific, Edison, NJ). Cells were collected by centrifugation at 5000×g and 4 °C for 30 min. The cell pellet was resuspended in Tris-HCl buffer (100 mM, pH 8.0) and then lysed with a homogenizer. Cell debris was removed by centrifugation at 8000×g and 4 °C for 30 min, and the enzymes were purified from the supernatant using a Bio-Scale Mini Nuvia IMAC Cartridge following the manufacturer's instructions. Eluted fractions were pooled and loaded onto a Bio-Scale™ Mini Bio-Gel® P-6 Desalting Cartridge to remove imidazole and then redissolved in Na2HPO4-NaH2PO4 buffer (0.1 M, pH 6.5). The expression of the recombinant proteins was examined by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) performed in 12% Tris-glycine gels, and the protein concentration was determined with a NanoDrop Lite spectrophotometer from Fisher Scientific (Tustin, CA, USA).

Enzyme assays of α-N-acetylglucosaminidases using GlcNAcαMU (1) as the substrate
Enzymatic assays (20 μL total reaction volume) were performed in duplicate in Na2HPO4-NaH2PO4 buffer (0.1 M, pH 6.5) containing GlcNAcαMU (1, 1 mM). An enzyme selected from Δ17Bf0576 (0.49 μM), Δ24Bt0438 (0.048 μM), Δ22Bt3590 (0.012 μM), CpGH89 (0.38 μM), or tCpGH89 (0.38 μM) was added and the reactions were allowed to proceed at 37 °C for 20 min or 20 h and stopped by adding 40 μL methanol. Samples were centrifuged and the supernatants were analyzed at 315 nm by an Agilent ultra-high performance liquid chromatography (UHPLC) system equipped with a membrane on-line degasser, a temperature control unit (set at 30 °C), and a diode array detector using an EclipsePlusC18 RRHD column (2.1 × 50 mm I.D., 1.8 μm particle size; Agilent). Mobile phase A was 0.1% trifluoroacetic acid (TFA) in water, and mobile phase B was acetonitrile. The system was pre-equilibrated with a running mobile phase composed of mobile phase A and mobile phase B (95/5, v/v) at a flow rate of 0.25 mL/min. After injection of the sample, compound separation was carried out with two-phase gradient elution steps (starting at 95% A + 5% B at 0 min to 50% A + 50% B at 4 min, then back to 95% A + 5% B at 5 min with the run stopped at 5.1 min).
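The UHPLC gradient described above can be written down as a small table of breakpoints. The following is a minimal sketch, not the instrument method itself: it assumes the ramps between the stated time points are linear, and the names `UHPLC_GRADIENT`, `percent_b`, and `percent_a` are introduced here for illustration only.

```python
from bisect import bisect_right

# (time in min, % mobile phase B) breakpoints taken from the method text:
# 5% B at 0 min, 50% B at 4 min, back to 5% B at 5 min, run stopped at 5.1 min.
UHPLC_GRADIENT = [(0.0, 5.0), (4.0, 50.0), (5.0, 5.0), (5.1, 5.0)]


def percent_b(t_min, program=UHPLC_GRADIENT):
    """Return % B at time t_min, assuming linear ramps between breakpoints."""
    times = [t for t, _ in program]
    if t_min < times[0] or t_min > times[-1]:
        raise ValueError("time outside the programmed run")
    i = bisect_right(times, t_min) - 1
    if i == len(program) - 1:  # exactly at the final breakpoint
        return program[-1][1]
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def percent_a(t_min, program=UHPLC_GRADIENT):
    """% A is the remainder of the binary mixture."""
    return 100.0 - percent_b(t_min, program)
```

The same function can describe the other gradients in this section by passing a different list of breakpoints as `program`.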

Enzyme assays of α-N-acetylglucosaminidases using GlcNAcα1-4GlcAβProNHFmoc (2) as the substrate
Enzymatic assays (20 μL total reaction volume) were performed in duplicate in Na2HPO4-NaH2PO4 buffer (0.1 M, pH 6.5) containing GlcNAcα1-4GlcAβProNHFmoc (2, 1 mM). An enzyme selected from Δ17Bf0576 (0.15 mM), Δ24Bt0438 (0.036 mM), Δ22Bt3590 (0.003 mM), CpGH89 (0.11 mM), or tCpGH89 (0.11 mM) was added and the reactions were allowed to proceed at 37 °C for 1 h or 20 h and stopped by adding 40 μL methanol. Samples were centrifuged and the supernatants were analyzed at 254 nm by a Shimadzu LC-2010A high-performance liquid chromatography (HPLC) system equipped with a membrane on-line degasser, a temperature control unit (set at 30 °C), and a diode array detector using an XBridge® BEH Amide column (4.6 × 250 mm I.D., 5 μm particle size, Waters) protected with a C18 guard column cartridge. Mobile phase A was 0.1% formic acid in water, and mobile phase B was acetonitrile. The system was pre-equilibrated with a running mobile phase composed of mobile phase A and mobile phase B (20/80, v/v) at a flow rate of 0.8 mL/min. After injection of the sample, compound separation was carried out in a four-phase procedure with an isocratic condition of 20% A + 80% B during 0-5 min, a gradient to 45% A + 55% B during 5.0-5.5 min, a gradient back to 20% A + 80% B during 5.5-6.0 min, followed by a 2 min isocratic condition until the run was stopped at 8 min.

Thermostability assays for Δ22Bt3590
Δ22Bt3590 dissolved in citric acid-sodium citrate buffer (0.1 M, pH 5.0) was incubated at 25, 30, or 37 °C for 1 h, 4 h, 8 h, or 24 h. After incubation, enzymatic assays (20 μL total reaction volume) were performed in duplicate at 37 °C in a mixture containing disaccharide 2 (1 mM) and the pre-incubated Δ22Bt3590 (0.35 μM). Reactions were allowed to proceed for 20 min and stopped by adding 40 μL methanol to each reaction mixture. Samples were centrifuged and then analyzed by HPLC as described above.

Effects of divalent metal cations, EDTA, and a reducing reagent DTT on the activity of Δ22Bt3590
Enzymatic assays were carried out in duplicate at 37 °C for 20 min in a total volume of 20 μL in citric acid-sodium citrate buffer (0.1 M, pH 5.0) containing disaccharide 2 (1 mM), Δ22Bt3590 (0.39 μM), and 10 mM of CaCl2, CuSO4, MgCl2, MnCl2, NiSO4, ZnCl2, ethylenediaminetetraacetic acid (EDTA), or dithiothreitol (DTT). Reactions without metal ions, EDTA, or DTT were used as controls. The reactions were quenched by adding 40 μL methanol. Samples were centrifuged and then analyzed by HPLC as described above.

Kinetic studies of Δ22Bt3590
To obtain apparent kinetic parameters with GlcNAcαMU (1) as the substrate, Δ22Bt3590 (0.001 μM) was incubated with various concentrations (0.005, 0.0066, 0.008, 0.01, 0.0125, 0.02, 0.04, 0.1, and 0.2 mM) of GlcNAcαMU (1) in duplicate at 30 °C for 10 min (conversion was kept below 25%) in a total volume of 40 μL in citric acid-sodium citrate buffer (0.1 M, pH 5.0). The reactions were quenched by adding 40 μL methanol followed by incubation in an ice bath. Samples were centrifuged and analyzed by UHPLC as described above.
To obtain apparent kinetic parameters with GlcNAcα1-4GlcAβProNHFmoc (2) as the substrate, Δ22Bt3590 (0.086 μM) was incubated with various concentrations (0.05, 0.1, 0. Samples were centrifuged and analyzed by HPLC as described above. The apparent kinetic parameters were obtained by fitting the experimental data (the average values of duplicate assay results) into the Michaelis-Menten equation using Grafit 5.0.
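For readers without access to GraFit, the idea behind the fit can be illustrated with a short, dependency-free sketch. It does not reproduce the nonlinear fit used in this work: it swaps in the Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax, solved by ordinary least squares. The rates are synthetic, generated from hypothetical parameters (`TRUE_VMAX`, `TRUE_KM`) that are not measured values; only the substrate concentrations are taken from the GlcNAcαMU (1) series above.

```python
def michaelis_menten(s, vmax, km):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def hanes_woolf_fit(s_values, v_values):
    """Estimate (Vmax, Km) from the linear form [S]/v = [S]/Vmax + Km/Vmax."""
    x = list(s_values)
    y = [s / v for s, v in zip(s_values, v_values)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # slope = 1 / Vmax
    intercept = mean_y - slope * mean_x  # intercept = Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Substrate concentrations (mM) listed in the text for GlcNAcαMU (1).
S_MM = [0.005, 0.0066, 0.008, 0.01, 0.0125, 0.02, 0.04, 0.1, 0.2]

# Hypothetical parameters, used only to generate synthetic demo rates.
TRUE_VMAX, TRUE_KM = 1.0, 0.02
rates = [michaelis_menten(s, TRUE_VMAX, TRUE_KM) for s in S_MM]

vmax_est, km_est = hanes_woolf_fit(S_MM, rates)
```

With noise-free synthetic data the linearized fit recovers the input parameters. With real duplicate averages, a nonlinear least-squares fit of the untransformed equation is generally preferred because linearization distorts the error structure.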

Substrate specificity studies of Δ22Bt3590
All reactions were carried out in duplicate at 30 °C or 37 °C in citric acid-sodium citrate buffer (0.1 M, pH 5.0) containing GlcNAcαMU (1) or one of the heparosan oligosaccharides GlcNAcα1-4GlcAβ1-(4GlcNAcα1-4GlcAβ1-)nProNHFmoc (n = 0-4) (2-6) (1 mM). Reactions at 37 °C used 33 μg mL−1 Δ22Bt3590, and aliquots of samples were taken at 20 min, 4 h, and 24 h and stopped by adding 40 μL methanol. Reactions at 30 °C used 29 μg mL−1 Δ22Bt3590 for 1 h reactions and 217 μg mL−1 Δ22Bt3590 for 24 h reactions. Reactions were stopped by adding 40 μL methanol and centrifuged, and the supernatants were analyzed by the UHPLC method (for reactions using GlcNAcαMU 1 as the substrate) or the HPLC method (for reactions using heparosan disaccharide 2 as the substrate) described above. For samples using a heparosan oligosaccharide selected from 3-6 as the substrate, the UHPLC system was used with an AdvanceBioGlycan column (1.8 μm particle, 2.1 × 150 mm, Agilent Technologies, CA) and monitored at 254 nm. Mobile phase A was 0.1% trifluoroacetic acid (TFA) in water, and mobile phase B was acetonitrile. The system was pre-equilibrated with a running mobile phase composed of mobile phase A and mobile phase B (10/90, v/v) at a flow rate of 0.5 mL/min. After injection of the sample, compound separation was carried out in a three-phase procedure with a gradient starting from 10% A + 90% B at 0 min to 30% A + 70% B at 9 min, followed by another gradient back to 10% A + 90% B during 9-9.5 min, then an isocratic period until the run was stopped at 12.5 min.

Cloning and expression of bacterial CAZy GH89 α-N-acetylglucosaminidases
Protein structure-based alignment using UCSF Chimera (Pettersen et al. 2004) and structural overlay using PyMOL (Yuan et al. 2016) of CpGH89 (GenBank accession number ABG84150.1) and hNAGLU reveal an extra loop in CpGH89 (residues 680-686) containing a tryptophan (W685) residue, which was suggested to be important for the recognition of the GlcNAcα1-4Gal motif of its substrate (Fig. 1; Ficko-Blean and Boraston 2012). This loop was hypothesized to restrict the type of substrate that can enter the binding pocket and to cause the high substrate selectivity of CpGH89 by preventing the binding of heparan sulfate-type substrates containing a terminal GlcNAc α-linked to a β-D-glucuronic acid or α-L-iduronic acid (Birrane et al. 2019; Ficko-Blean and Boraston 2012). Therefore, a truncated CpGH89 (tCpGH89) with this extra loop deleted was designed and cloned.
To identify potential bacterial hNAGLU homologues that can efficiently use HS as the substrate, the protein sequence of hNAGLU (GenBank Accession Number AEE60931.1) was used to search for candidates in gut microbes that are known for their capability of using host HS as the major source of nutrients (Cartmell et al. 2017; Martens et al. 2008). Bf0576 from B. fragilis (GenBank Accession Number CAH06355.1) as well as Bt0438 (GenBank Accession Number AAO75545.1) and Bt3590 (GenBank Accession Number AAO78695.1) from B. thetaiotaomicron were identified. Protein sequence alignment of hNAGLU, CpGH89, Bf0576, Bt0438, and Bt3590 using the online server Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) showed that Bf0576, Bt0438, and Bt3590 share 34.6%, 32.8%, and 34.7% protein sequence identity with hNAGLU and 37.4%, 28.7%, and 30% sequence identity with CpGH89, respectively. The models of Bf0576, Bt0438, and Bt3590 generated by the online server SWISS-MODEL (https://swissmodel.expasy.org/) were used for further structure-based sequence alignment with hNAGLU (PDB ID: 4XWH) and CpGH89 (PDB ID: 2VCC) using UCSF Chimera (Pettersen et al. 2004). The Trp-containing extra loop present in CpGH89 could also be found in the Bf0576 structural model but is absent from the structural models of Bt0438 and Bt3590 (Fig. 2).
Figure caption (partial): An extra loop in CpGH89 (residues 680-686) containing W685, the key residue for the recognition of the GlcNAcα1-4Gal motif in CpGH89 (Birrane et al. 2019; Ficko-Blean and Boraston 2012), is shown in red in a cartoon representation. The figures were generated with PyMOL.
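As a rough illustration of what a pairwise percent-identity figure measures, the sketch below counts identical residues over the aligned columns in which neither sequence has a gap. This is one common convention and is an assumption here; other tools may use a different denominator, and the sketch makes no claim about how Clustal Omega computes its values. The two alignment rows are hypothetical toy strings, not GH89 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment rows, for illustration only.
row_1 = "MKT-AYLLGW"
row_2 = "MRTQAY-LGF"
identity = percent_identity(row_1, row_2)  # 6 identical of 8 compared columns
```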

Activity assays of bacterial CAZy GH89 α-N-acetylglucosaminidases
The activities of recombinant bacterial α-N-acetylglucosaminidases were assayed using a commercially available fluorophore-tagged substrate, 4-methylumbelliferyl α-N-acetylglucosaminide (GlcNAcαMU, 1) (Fig. 4), in a quantitative ultra-high performance liquid chromatography (UHPLC) assay with a diode array detector. As shown in Table 2, all recombinant α-N-acetylglucosaminidases tested were able to catalyze the cleavage of GlcNAcαMU at pH 6.5, with the highest efficiency observed for ∆22Bt3590, followed by ∆24Bt0438 with a medium relative catalytic efficiency. CpGH89, tCpGH89, and ∆17Bf0576 had similar relative catalytic efficiencies, with 24.9-25.5% yields in 20 min when 0.38 μM (for CpGH89 or tCpGH89) or 0.49 μM (for ∆17Bf0576) of enzyme was used. In comparison, ∆24Bt0438 had a higher yield of 36.6 ± 1.8% in 20 min when it was used at an eight- to ten-fold lower enzyme concentration (0.048 μM). ∆22Bt3590 had the highest efficiency, with a yield of 38.7 ± 0.3% when 0.012 μM of enzyme (32- to 41-fold less) was used. All reactions went to completion when the reaction time was extended to 20 h.
Taking advantage of a previously synthesized fluorophore-labeled heparosan disaccharide GlcNAcα1-4GlcAβProNHFmoc (2) (Na et al. 2020), the activities of the recombinant enzymes in catalyzing the cleavage of the terminal α1-4-linked GlcNAc were assayed at pH 6.5. As shown in Table 3, although all enzymes were active and more than 91% of the substrate could be cleaved in 20 h, the concentrations of CpGH89 (0.11 mM), tCpGH89 (0.11 mM), and ∆17Bf0576 (0.15 mM) used were extremely high. When ∆24Bt0438 was used at 0.036 mM, which was also a relatively high concentration, a low yield of 12.2 ± 0.8% was achieved. In comparison, ∆22Bt3590 was able to catalyze the cleavage quite efficiently. When it was used at 0.003 mM, a concentration that was 12-fold lower than that of ∆24Bt0438 and 37- to 50-fold lower than those of the other enzymes, a yield of 42.9 ± 1.2% was achieved, which was about 3.5-fold higher than that of ∆24Bt0438. These results indicated that among the five recombinant enzymes, ∆22Bt3590 was the most efficient in catalyzing the cleavage of the terminal α1-4-linked GlcNAc from the heparosan disaccharide GlcNAcα1-4GlcAβProNHFmoc (2) at pH 6.5.
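The fold differences quoted in this section follow directly from the enzyme concentrations and yields given in the text. The short sketch below reproduces that arithmetic. The dictionary and function names are introduced here for illustration, and only mean values stated above are used.

```python
# Enzyme concentrations quoted in the text (GlcNAcαMU assay, in μM).
MU_ASSAY_UM = {
    "CpGH89": 0.38,
    "tCpGH89": 0.38,
    "Δ17Bf0576": 0.49,
    "Δ24Bt0438": 0.048,
    "Δ22Bt3590": 0.012,
}

# Enzyme concentrations quoted in the text (disaccharide 2 assay, in mM).
DISACCHARIDE_ASSAY_MM = {
    "CpGH89": 0.11,
    "tCpGH89": 0.11,
    "Δ17Bf0576": 0.15,
    "Δ24Bt0438": 0.036,
    "Δ22Bt3590": 0.003,
}

# Mean yields (%) quoted in the text for the disaccharide 2 assay.
DISACCHARIDE_YIELD = {"Δ24Bt0438": 12.2, "Δ22Bt3590": 42.9}


def fold_lower(concentrations, reference):
    """Ratio of each enzyme's concentration to that of the reference enzyme."""
    ref = concentrations[reference]
    return {name: c / ref for name, c in concentrations.items() if name != reference}


mu_vs_bt3590 = fold_lower(MU_ASSAY_UM, "Δ22Bt3590")            # about 32 to 41
mu_vs_bt0438 = fold_lower(MU_ASSAY_UM, "Δ24Bt0438")            # about 8 to 10 for CpGH89 and Δ17Bf0576
di_vs_bt3590 = fold_lower(DISACCHARIDE_ASSAY_MM, "Δ22Bt3590")  # 12 and about 37 to 50
yield_ratio = DISACCHARIDE_YIELD["Δ22Bt3590"] / DISACCHARIDE_YIELD["Δ24Bt0438"]
```

These ratios compare enzyme loading and yield only. They are not kinetic constants, since the yields are single time-point conversions.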

pH profile of ∆22Bt3590 activity
Using GlcNAcα1-4GlcAβProNHFmoc (2) as the substrate, ∆22Bt3590 was further characterized for its pH profile. It preferred an acidic pH and the optimal activity was at pH 5.0 in sodium citrate buffer (Fig. 5A). Its activity decreased dramatically when the pH fell below 4.0 or rose above 6.0.

Effect of divalent metal cations, ethylenediaminetetraacetic acid (EDTA), and dithiothreitol (DTT) on the activity of ∆22Bt3590
The effects of various metal ions, the chelating reagent EDTA, and the reducing reagent DTT on the enzyme activity of ∆22Bt3590 were examined at pH 5.0. Reactions without metal ions, EDTA, or DTT were used as controls. As shown in Fig. 5B, a divalent metal cation was not required for the catalytic activity of ∆22Bt3590, as 10 mM of EDTA had no effect. Nevertheless, the presence of 10 mM CuCl2 decreased the reaction yields of ∆22Bt3590 slightly, and the addition of MnCl2 or ZnCl2 almost abolished its activity.

Temperature profile studies of ∆22Bt3590
∆22Bt3590 was shown to have optimal activities in the temperature range of 35-40 °C (Fig. 6A) and about 90% of the optimal activity was observed at 45 °C. Its activity decreased dramatically when the temperature reached 50 °C or higher. About 50% of the optimal activity was observed at 30 °C and the activity decreased with the decrease of the temperature. Minimal activity was observed at 10 °C.

Substrate specificity studies of ∆22Bt3590
Using GlcNAcαMU (1) and synthetic α-GlcNAc-terminated fluorophore-tagged heparosan oligosaccharides of varied lengths (2-6, Fig. 4) (Na et al. 2020) as substrates, substrate specificity studies of ∆22Bt3590 at 37 °C showed that longer heparosan oligosaccharides were poorer substrates than heparosan disaccharide 2 (Fig. 7A) and that the yield of the catalytic reactions, in general, decreased as the substrate length increased. In agreement with the thermostability study results, ∆22Bt3590 lost most of its activity after 4 h of incubation at 37 °C, as no further yield improvement was observed for the reactions with 24 h incubation compared to those with 4 h incubation.
When the reaction temperature was decreased to 30 °C where ∆22Bt3590 was more stable (Fig. 6), an incubation time of 24 h was able to improve the reaction yields to reach more than 95% completion for all substrates tested (Fig. 7B).

Discussion
Bacteroides thetaiotaomicron is a Gram-negative gut symbiotic bacterium that is well known for containing a large number of glycoside hydrolases and for its capability of using different polysaccharides as nutrients (Cartmell et al. 2017). The complete 6.26-Mb genome sequence of B. thetaiotaomicron strain VPI-5482 (ATCC#29148) (Comstock and Coyne 2003) was predicted by the PULDB database (http://www.cazy.org/PULDB/) (Terrapon et al. 2015, 2018) to encode more than 100 glycoside hydrolases responsible for breaking down a wide variety of polysaccharides. Nevertheless, among enzymes in B. thetaiotaomicron that are predicted to be responsible for glycosaminoglycan degradation (Ahn et al. 1998; Hooper et al. 2002; Ndeh et al. 2018, 2020), only polysaccharide lyases (PLs) and a GH88 ∆4,5-unsaturated uronyl hydrolase (Bt4658) have been biochemically characterized for using heparin and HS as high-priority nutrient sources by B. thetaiotaomicron (Cartmell et al. 2017; Dong et al. 2012; Han et al. 2009; Luo et al. 2007; Ulaganathan et al. 2017). Although B. thetaiotaomicron hNAGLU homologues in the CAZy GH89 family were predicted to be α-N-acetylglucosaminidases that are involved in HS degradation based on deduced protein sequences from the B. thetaiotaomicron genomic sequence (Comstock and Coyne 2003; Martens et al. 2008), none have been characterized so far. Here we provide evidence that Bt0438 and Bt3590 from B. thetaiotaomicron VPI-5482 (ATCC#29148) as well as Bf0576 from Bacteroides fragilis NCTC 9343 (ATCC#25285) are α-N-acetylglucosaminidases. While their full-length proteins did not express well in E. coli BL21(DE3) in a pET15b vector, N-terminal truncation led to the successful expression of the recombinant proteins ∆17Bf0576 (170 mg/L culture), ∆24Bt0438 (9 mg/L culture), and ∆22Bt3590 (136 mg/L culture) as soluble and active enzymes.
Among these three, ∆22Bt3590 was the most efficient in catalyzing the cleavage of the terminal α-GlcNAc from commercially available GlcNAcαMU (1) at pH 6.5. ∆22Bt3590 was also shown to be able to use synthetic heparosan oligosaccharides (2-6) with an α-GlcNAc at the non-reducing end as the substrates.
A W685-containing loop in CpGH89 (Ficko-Blean et al. 2008; Yogalingam et al. 2000) that is absent in hNAGLU (Birrane et al. 2019) was suggested to be critical for the recognition of the specific GlcNAcα1-4GalβOR-type substrate by CpGH89. The presence of this loop introduces an extra tryptophan residue (Trp685) in the substrate-binding pocket of CpGH89, which is absent in hNAGLU (Fig. 8). Such a loop is also present in the structural model of Bf0576 but is absent from the structural models of Bt0438 and Bt3590. The loop-truncated version of CpGH89 (tCpGH89) showed a twofold higher activity in using GlcNAcα1-4GlcAβProNHFmoc (2) as the substrate compared to CpGH89.
While ∆22Bt3590 was the most reactive at 37 °C and pH 5.0 (Fig. 6A), it lost most of its activity after 4 h of incubation under this condition (Fig. 6B). In comparison, although ∆22Bt3590 performed at only 50% of its optimal activity at 30 °C (Fig. 6A), it was more stable and retained 38% activity even after 8 h of incubation under this condition (Fig. 6B). Indeed, ∆22Bt3590 was shown to catalyze almost complete cleavage of the terminal α-GlcNAc from heparosan disaccharide (2), tetrasaccharide (3), hexasaccharide (4), octasaccharide (5), and decasaccharide (6) at 30 °C within 24 h (Fig. 7B). In comparison, the cleavage of the terminal α-GlcNAc from heparosan oligosaccharides that was tetrasaccharide
Fig. 8 Structural overlay of the active sites of hNAGLU (PDB ID: 4XWH, golden), CpGH89 (cyan) in complex with GlcNAc (green) (PDB ID: 2VCA), and the Bt3590 model (magenta). The GlcNAc ligand (green) and the key residues in the active sites (labeled in black for hNAGLU, cyan for CpGH89, and magenta for Bt3590) are shown in stick representations. The figure was generated with UCSF Chimera (Pettersen et al. 2004)