Codon harmonization reduces amino acid misincorporation in bacterially expressed P. falciparum proteins and improves their immunogenicity

Codon usage frequency influences protein structure and function. The frequency with which codons are used potentially impacts primary, secondary and tertiary protein structure. Poor expression, loss of function, insolubility, or truncation can result from species-specific differences in codon usage. “Codon harmonization” more closely aligns native codon usage frequencies with those of the expression host particularly within putative inter-domain segments where slower rates of translation may play a role in protein folding. Heterologous expression of Plasmodium falciparum genes in Escherichia coli has been a challenge due to their AT-rich codon bias and the highly repetitive DNA sequences. Here, codon harmonization was applied to the malarial antigen, CelTOS (Cell-traversal protein for ookinetes and sporozoites). CelTOS is a highly conserved P. falciparum protein involved in cellular traversal through mosquito and vertebrate host cells. It reversibly refolds after thermal denaturation making it a desirable malarial vaccine candidate. Protein expressed in E. coli from a codon harmonized sequence of P. falciparum CelTOS (CH-PfCelTOS) was compared with protein expressed from the native codon sequence (N-PfCelTOS) to assess the impact of codon usage on protein expression levels, solubility, yield, stability, structural integrity, recognition with CelTOS-specific mAbs and immunogenicity in mice. While the translated proteins were expected to be identical, the translated products produced from the codon-harmonized sequence differed in helical content and showed a smaller distribution of polypeptides in mass spectra indicating lower heterogeneity of the codon harmonized version and fewer amino acid misincorporations. Substitutions of hydrophobic-to-hydrophobic amino acid were observed more commonly than any other. 
CH-PfCelTOS induced significantly higher antibody levels compared with N-PfCelTOS; however, no significant differences in either IFN-γ or IL-4 cellular responses were detected between the two antigens.


Introduction
Escherichia coli expression systems have been widely used for the expression and manufacturing of various malarial antigens owing to their ease of use and advantages in cost and scale despite protein expression and folding obstacles. Common causes cited for poor expression of recombinant genes in heterologous hosts are the species-specific disparities in codon usage. Codon usage frequencies can potentially impact a protein's function, solubility, and length (Khan et al. 2012).
In E. coli, protein folding can occur "co-translationally" at the ribosome (Komar 2009; Kramer et al. 2009; Nissley et al. 2016) and variable translation rates are thought to affect tertiary structure (Nissley et al. 2016). A recent study using genome-wide analysis provided evidence of evolutionary selection for co-translational folding, underscoring the importance of translation kinetics in protein folding (Jacobs and Shakhnovich 2017). More frequently used codons are often found in well-ordered structural elements such as alpha helices, while low usage frequency codons often occur within link/end segments (Thanaraj and Argos 1996). These observations suggest that codon usage frequency plays an inherent role in co-translational folding.
*Correspondence: evelina.angov.civ@mail.mil; Malaria Biologics Branch, Walter Reed Army Institute of Research, Silver Spring, MD 20910, USA
Based on these concepts, we developed a strategy to "recode" target gene sequences for heterologous expression by substituting native codons with synonymous alternates with identical or similar usage frequencies in the expression host. This approach has been termed "codon harmonization", and applies a two-pronged approach. First, "best fit" codon usage frequency of the native gene is applied to that of the heterologous host. Second, putative link/end segments are identified and recoded to re-establish regions benefitted by slower translation (Angov et al. 2008).
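The first prong of this strategy, matching each native codon to the synonymous host codon with the closest usage frequency, can be sketched as follows. This is only an illustrative sketch and not the authors' algorithm: the usage frequencies are made-up numbers rather than real P. falciparum or E. coli tables, the function names are hypothetical, and the second prong (identifying and recoding link/end segments) is not modeled.

```python
# Toy sketch of frequency-matched synonymous codon replacement.
# ASSUMPTION: all usage fractions below are invented for illustration;
# they are NOT real codon usage tables for any organism.

# Fraction of use among synonymous codons, per amino acid.
NATIVE_USAGE = {
    "K": {"AAA": 0.82, "AAG": 0.18},
    "L": {"TTA": 0.60, "TTG": 0.15, "CTT": 0.10, "CTG": 0.15},
}
HOST_USAGE = {
    "K": {"AAA": 0.30, "AAG": 0.70},
    "L": {"TTA": 0.11, "TTG": 0.14, "CTT": 0.13, "CTG": 0.62},
}

# Reverse lookup: codon -> amino acid (only for the codons in the toy tables).
CODON_TO_AA = {codon: aa for aa, table in NATIVE_USAGE.items() for codon in table}


def harmonize_codon(codon):
    """Return the synonymous host codon whose usage is closest to the native usage."""
    aa = CODON_TO_AA[codon]
    native_freq = NATIVE_USAGE[aa][codon]
    host_table = HOST_USAGE[aa]
    return min(host_table, key=lambda c: abs(host_table[c] - native_freq))


def harmonize(sequence):
    """Recode an in-frame DNA sequence codon by codon."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]
    return "".join(harmonize_codon(c) for c in codons)


if __name__ == "__main__":
    print(harmonize("AAATTA"))
```

Because every replacement is drawn from the same amino acid's codon table, the encoded protein sequence is unchanged; only the expected usage frequency in the host is altered.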
We previously reported that single base changes for synonymous codon replacement (i.e., FMP003 protein) can increase soluble protein yields by a factor of approximately ten compared with native sequence yields (Angov et al. 2008). This protein was produced under cGMP conditions, resulting in a highly immunogenic and efficacious product tested against malaria challenge in an Aotus monkey study (Darko et al. 2005). "Harmonizing" all of the codons throughout the gene sequence (i.e., FMP010 protein) yielded an additional 60-fold increase in expression level (Angov et al. 2008). Furthermore, application of this approach to alternative alleles of MSP1 42 protein yielded similarly high levels of expression. These successes are notable because some native P. falciparum gene sequences expressed in E. coli produce little or no recombinant protein (Angov et al. 2008). Here we applied codon harmonization to improve protein expression, yield, and quality of a novel malaria vaccine candidate, PfCelTOS.
"Cell-traversal protein for ookinetes and sporozoites" (CelTOS) is an essential protein in malaria parasites that is required for cell traversal in both mammalian and insect hosts (Kariu et al. 2006). In mice, recombinant PfCelTOS in Montanide ISA 720 elicited potent humoral and cellular immune responses as well as sterile protection against heterologous challenge with Plasmodium berghei sporozoites (Bergmann-Leitner et al. 2010). This was corroborated using an alternative recombinant PfCelTOS in glucopyranosyl lipid adjuvant-stable emulsion (GLA-SE) or glucopyranosyl lipid adjuvant-liposome-QS21 (GLA-LSQ) adjuvant (Espinosa et al. 2017). In addition, monoclonal antibodies raised against PfCelTOS inhibited oocyst development of P. falciparum, and P. berghei expressing PfCelTOS, in Anopheles gambiae mosquitoes. Notwithstanding these findings, CelTOS is an attractive target for immunization as it is conserved across plasmodial species (Kariu et al. 2006).
We developed a recombinant protein vaccine candidate based on PfCelTOS in E. coli. On characterization of the protein product, we observed that PfCelTOS reversibly refolds after thermal denaturation; this property may be valuable for cold-chain storage or use in temperate climates. To address the impact of codon harmonization on primary, secondary, and tertiary structure, recombinant protein was produced using the native gene sequence (N-PfCelTOS) and compared with protein produced from the codon-harmonized sequence (CH-PfCelTOS). We utilized circular dichroism (CD), mass spectrometry (MS), and size-exclusion chromatography to characterize the two proteins and identified differences in mass and heterogeneity. A deeper analysis by liquid chromatography-tandem mass spectrometry (LC-MS/MS) revealed amino acid misincorporations in both proteins. Interestingly, despite these detectable changes in primary and secondary structure, both proteins reversibly refolded after heat denaturation. The potential impact of these changes on the proteins as immunogens was also evaluated in vivo in Balb/cJ mice. Antibody fine specificities were assessed against full length PfCelTOS or subunit fragments reflecting the N-terminus or C-terminus to better define any differences. CH-PfCelTOS induced significantly higher antibody levels compared with N-PfCelTOS; however, no significant differences in cellular responses were detected.

Sequences
Proteins were produced in E. coli using either the native or codon harmonized DNA sequences. Gene inserts were synthesized and cloned into the pET(K) expression plasmids (DNA 2.0, currently ATUM, Newark, CA) and transformed into B834 (DE3) E. coli. Both native and codon harmonized sequences (GenBank Accession # KH833194) encoded the same 174 amino acids and included a 16 amino acid linker containing an N-terminal 6-Histidine tag (Bergmann-Leitner et al. 2010). A Histidine tag-free PfCelTOS was expressed in E. coli as above and, like the N- and CH-PfCelTOS proteins, was expressed without the native PfCelTOS signal sequence. To generate the Histidine tag-free PfCelTOS clone from the CH-PfCelTOS clone (N-terminal Histidine tagged protein), XbaI and KpnI were used to replace the nucleotide sequences with the annealed primers XbaI-NdeI KpnI 5′-CTA GAA ATA ATT TTG TTT AAC TTT AAG AAG GAG ATA TAC ATA TGG GTA C-3′ and KpnI NdeI-XbaI 5′-CCA TAT GTA TAT CTC CTT CTT AAA GTT AAA CAA AAT TATTT-3′. His-tag free PfCelTOS was 161 amino acids long, including two non-native amino acid residues introduced by cloning, GT. All full length clones were initiated at amino acids F R G… and contained 158 of PfCelTOS amino acids. An N-terminal protein fragment of PfCelTOS (natural residues numbering #25-149) and a C-terminal fragment of PfCelTOS (residues #85-182) were expressed under conditions identical to those used for the full length PfCelTOS and were used as reagents to assess fine specificities of immune responses.

Expression of N-PfCelTOS and CH-PfCelTOS
To investigate the effect of codon usage on PfCelTOS expression, cultures were grown in the presence of 40 µg/mL kanamycin (Sigma Aldrich, St. Louis, MO) in 1 L Difco Terrific Broth (BD Biosciences, San Jose, CA) at 30 °C. Protein expression was induced by adding 0.1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) (Sigma Aldrich) at an OD 600 of ~ 0.8-1.0. Cell samples were collected every hour from the time of induction for 3 consecutive hours for analysis by SDS-PAGE (Invitrogen, Waltham, MA). Subunit fragments representing the N-terminus and C-terminus of PfCelTOS were expressed using essentially the same conditions as for full length PfCelTOS.

Solubility of N-PfCelTOS and CH-PfCelTOS
Cell paste (3 g) was homogenized (Ultra Turrax T-25, Cole Palmer, Vernon Hills, IL) in 60 mL of lysis buffer (PBS; pH 7.4) (Quality Biological, Gaithersburg, MD). Cells were subjected to microfluidization (Microfluidics Corporation, Model M-110 Y, Westwood, MA) and the cell lysates were divided equally into four parts (by volume). Each part was treated with 1% Tween 80 (v/v) (Fisher Scientific), 1% deoxycholate (v/v) (Fisher Scientific, Rockville, MD) or 1% sarkosyl (v/v) (Sigma-Aldrich, St. Louis, MO) and one part was left "untreated". Detergent extractions were carried out at 30 °C for 1 h in an incubator shaker at ~ 50 rpm. Treated lysates were centrifuged at 12,000 rpm for 1 h at 4 °C to separate soluble supernatants and insoluble pellet fractions. Samples from each fraction were prepared for SDS-PAGE/Coomassie Blue staining (Bio-Rad, Philadelphia, PA).

Purification of N-PfCelTOS and CH-PfCelTOS
Cell paste (4 g) for both clones (N-PfCelTOS and CH-PfCelTOS) was homogenized in 60 mL of lysis buffer (10 mM NaH 2 PO 4 , 50 mM NaCl, 10 mM imidazole, 2 mM MgCl 2 , pH 7.4) and lysed by microfluidization. To adjust the final salt concentration, 5 M NaCl was added to each lysate. Protein was extracted at 30 °C for 30 min by addition of 1% (v/v) sarkosyl. Extracted lysates were centrifuged at 12,000 rpm at 4 °C for 1 h to isolate soluble proteins.

Stability at different temperatures
Purified N-PfCelTOS and CH-PfCelTOS were subjected to stability analysis at different temperatures. A 1 μg/10 μL aliquot of each protein was incubated at either 37 °C or 65 °C for 1, 4 and 24 h. Samples were analyzed by SDS-PAGE/Coomassie Blue staining and western blotting.

Membrane lipid strip assay
Membrane lipid strips (Echelon Biosciences Inc., Salt Lake City, UT) were blocked in blocking buffer (pH 7.2) 1× PBS, 0.1% Tween 20, 3% bovine serum albumin (BSA) for 1 h. The strips were probed with 2 µg/mL of N-and CH-PfCelTOS for 1 h. The proteins were pre-treated at either 37 °C or 65 °C for 1, 4 and 24 h. The strips were washed three times with 5 mL of wash buffer (1× PBS; 0.1% Tween 20) at 5 min intervals and then probed with PfCelTOS-specific rabbit polyclonal serum diluted in blocking buffer (1:5000) for 1 h as the primary antibody and anti-mouse HRP conjugate (1:5000) (KPL, Gaithersburg, MD) as the secondary antibody. The strips were developed with Pierce ECL Western Blotting detection kit (Thermo Scientific, Rockford, IL) for 1 min and imaged using VersaDoc (Bio-rad, Hercules, CA). Temperature-untreated proteins (T0) were used as the binding controls for the membrane lipid strip assay.

Circular dichroism spectroscopy
Thermal denaturation was monitored (2 °C per minute) from 20 to 95 °C using a Jasco 810 circular dichroism spectropolarimeter (Jasco Inc., Japan) fitted with a Peltier temperature control unit. The melting temperature was determined from a four-parameter fit of the ellipticity at 220 nm. A protein concentration of 13 μM (N-PfCelTOS) and 10 μM (CH-PfCelTOS) and a 1 mm cuvette was used for CD analysis. Machine units were converted to molar ellipticity to account for differences in protein concentrations.
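The conversion to molar ellipticity and the shape of a four-parameter melting curve can be illustrated with a short sketch. The conversion uses the standard relation [θ] = θ(mdeg) / (10 × c × l), with c in mol/L and l in cm. The specific sigmoid form, the function names, and the raw signal values are assumptions for illustration; the exact fitting model used with the instrument is not stated in the text.

```python
import math


def molar_ellipticity(theta_mdeg, conc_molar, path_cm):
    """Convert raw ellipticity (mdeg) to molar ellipticity (deg cm^2 dmol^-1)."""
    return theta_mdeg / (10.0 * conc_molar * path_cm)


def four_param_sigmoid(temp_c, folded, unfolded, tm, width):
    """Four-parameter sigmoid (ASSUMED form): two plateaus, midpoint Tm, transition width."""
    return unfolded + (folded - unfolded) / (1.0 + math.exp((temp_c - tm) / width))


if __name__ == "__main__":
    # Hypothetical raw signal of -20 mdeg in a 1 mm (0.1 cm) cuvette at the
    # two protein concentrations quoted above (13 uM and 10 uM).
    for label, conc in (("N-PfCelTOS", 13e-6), ("CH-PfCelTOS", 10e-6)):
        print(label, molar_ellipticity(-20.0, conc, 0.1))
    # At T = Tm the sigmoid sits midway between the folded and unfolded plateaus.
    print(four_param_sigmoid(65.0, -20.0, -5.0, 65.0, 3.0))
```

Normalizing by concentration in this way is what allows spectra recorded at 13 μM and 10 μM to be compared on the same scale.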

Mass spectrometry and peptide analysis
Proteins in 200 mM ammonium bicarbonate were directly injected into a triple-TOF 5600 high resolution mass spectrometer (Sciex, Foster City, CA), and full protein TOF spectra were acquired using Analyst TF software (Version 1.7, Sciex, Foster City, CA). Spectra were analyzed and overlaid using Peakview software (Sciex, Foster City, CA). Molecular weights were calculated from spectra using a BioToolKit application for Peakview software. The calculated values were MW = 19,027.03 g/mol, pI = 5.15, and ɛ = 9970 M⁻¹ cm⁻¹ for each protein, N-PfCelTOS and CH-PfCelTOS.
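For orientation, the relationship between a protein's neutral mass and the m/z of a given charge state in an electrospray spectrum can be sketched as below. This is generic charge-state arithmetic, not the algorithm used by the vendor software; the function names are hypothetical, and the 21+ charge state is chosen only because it is the one shown in the spectra inset.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_for_charge(neutral_mass, charge):
    """Expected m/z of an [M + zH]^z+ ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Back-calculate the neutral mass from an observed m/z and its charge."""
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    calculated_mw = 19027.03  # calculated MW quoted in the text
    mz = mz_for_charge(calculated_mw, 21)
    print(round(mz, 3))
    print(round(neutral_mass_from_mz(mz, 21), 2))
```

A broader or shifted peak at a given charge state therefore maps directly to a broader or shifted distribution of neutral masses.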

LC-MS/MS Trypsin digest/peptide extraction
Proteins were run on 4-20% Tris-glycine Invitrogen gels. Protein bands were cut from the gel and placed in individual 1.5 mL tubes with 300 µL 50% acetonitrile (ACN) in 25 mM ammonium bicarbonate until fully destained, followed by alkylation in 30 µL 50 mM iodoacetamide for 30 min at RT. Bands were washed with 500 µL of 100 mM ammonium bicarbonate for 10 min and dehydrated in 600 µL 100% ACN, followed by drying for 3 min in a SpeedVac. Gel bands were treated with 8 µL of 1 µg/µL trypsin and 292 µL of 50 mM ammonium bicarbonate. Samples were incubated at 37 °C for 15-18 h at 450 rpm. Reactions were stopped by addition of 2 µL of 5% formic acid followed by addition of 100 µL HPLC-grade water. Samples were allowed to incubate at RT for 10 min followed by centrifugation for 5 min at 13,000 rpm. Supernatant was aliquoted into labeled tubes containing extraction solution (5% of 50% ACN, 5% formic acid). Peptides were extracted by adding 400 µL of extraction solution, vortexing, and allowing the samples to sit for 15 min, followed by centrifugation for 5 min; extraction was repeated three times.

Size-exclusion chromatography
PBS and the gel-filtration column calibration standards were purchased from Sigma. PD-10 and Superdex G-200 columns were purchased from G.E. Healthcare. A Superdex G-200 column was equilibrated with PBS, pH 7.4. The column was first calibrated using three protein standards and a 0.5 mL loading loop (0.5 mL/min, 4 °C). A calibration curve was made by plotting the calculated MW vs. the elution volume corresponding to the peak maximum. The data were fit to a single exponential decay equation using Grafit 5.0.13 (Erithacus Software Limited). Each protein sample was run in the same fashion. Three proteins were loaded: (1) N-PfCelTOS; (2) CH-PfCelTOS; (3) tag-free CH-PfCelTOS.

N-PfCelTOS and CH-PfCelTOS mouse immunogenicity
Six to seven week-old female Balb/cJ mice (The Jackson Laboratories, Sacramento, CA) were purchased and housed under pathogen-free conditions. Ten mice per group were immunized three times at 3-week intervals by the intramuscular route in the thigh muscle with 10 µg N-PfCelTOS/Montanide ISA-720 (Seppic Inc. New Jersey, NY) or 10 µg CH-PfCelTOS/Montanide ISA-720 in 100 µL; 50 µL per side. Blood samples were collected before every immunization for evaluating humoral responses. Two weeks after the third immunization, splenocytes were collected for evaluating cellular responses. The adjuvant control group (mice vaccinated with ISA 720) was shared with a concurrent study.

ELISA
Blood samples were collected from lateral tail veins prior to every immunization. PfCelTOS-specific antibodies were analyzed by enzyme-linked immunosorbent assay (ELISA). Briefly, 2HB Immulon plates (Thermo Scientific, Rochester, NY) were coated with 100 µL/well of each codon-harmonized antigen (25 ng PfCelTOS, 15 ng N-terminal PfCelTOS or 15 ng C-terminal PfCelTOS) in PBS, pH 7.4 (Quality Biological, Gaithersburg, MD) and incubated overnight at 4 °C in a humidified chamber. After blocking with PBS, 1% BSA (VWR, Chicago, IL) at 22 °C for 1 h, individual samples prepared at single dilutions were added to the plate. Antibody concentration was determined by establishing a standard curve (run in parallel with each assay) with purified mouse IgG (Invitrogen, Rochester, NY). For each serum tested, we determined a concentration that was within the linear portion of the reaction curve and used this dilution to extrapolate the actual antibody concentration in the assay wells. Plates were incubated for 2 h at 37 °C in a humidity chamber followed by addition of 100 µL/well AP-conjugated anti-mouse (Promega, Madison, WI). The plates were incubated at 22 °C in a humidity chamber for 1 h followed by addition of Blue Phos substrate (Sera Care, KPL, Gaithersburg, MD). Development was arrested by addition of 2× AP Stop solution (Sera Care, KPL) after 15 min. The plates were read at an absorbance of 630 nm on a SpectraMax M2 (Molecular Devices, Downingtown, PA). The concentration of PfCelTOS-specific (full-length, N- or C-terminal) antibodies (µg/mL) was calculated from the linear portion of the mouse-IgG standard curve.
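The interpolation step described above, reading a sample's absorbance against the linear portion of the mouse-IgG standard curve and correcting for the serum dilution, can be sketched as follows. The standard concentrations, absorbance values, dilution factor, and function names are all hypothetical and serve only to show the arithmetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def antibody_concentration(absorbance, slope, intercept, dilution_factor):
    """Invert the standard curve and scale by the serum dilution factor."""
    return (absorbance - intercept) / slope * dilution_factor


if __name__ == "__main__":
    # HYPOTHETICAL standard curve points within the linear range.
    std_conc_ug_ml = [0.0, 0.1, 0.2, 0.4]
    std_a630 = [0.05, 0.25, 0.45, 0.85]
    slope, intercept = fit_line(std_conc_ug_ml, std_a630)
    # HYPOTHETICAL serum sample read at a 1:1000 dilution.
    print(antibody_concentration(0.65, slope, intercept, 1000))
```

Only absorbances that fall inside the range spanned by the standards should be inverted this way, which is why each serum is read at a dilution within the linear portion of the curve.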

Statistical analysis
Statistical significance of serological and cellular responses was evaluated using parametric two-tailed, unpaired t-tests and multiple t-tests, respectively (GraphPad Prism, v 6.07, San Diego, CA); p < 0.05 was considered significant. For multiple t-tests, statistical significance was determined using the Holm-Sidak method.
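For readers unfamiliar with the Holm-Sidak correction, the standard step-down adjustment can be sketched as below. This is a generic textbook version and not a reproduction of the GraphPad Prism implementation; the function name and the input p-values are hypothetical.

```python
def holm_sidak(p_values):
    """Return Holm-Sidak step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Sidak correction over the (m - rank) hypotheses still in play.
        adj = 1.0 - (1.0 - p_values[idx]) ** (m - rank)
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, adj)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


if __name__ == "__main__":
    # HYPOTHETICAL raw p-values from three comparisons.
    print(holm_sidak([0.01, 0.04, 0.03]))
```

A comparison is then called significant when its adjusted p-value is below the chosen threshold (0.05 in this study).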

Features of the expressed proteins
Previously, we observed a significant increase in the expression of MSP1 42 -FVO proteins from codon harmonized sequences (Khan et al. 2012). Protein from either the native or the codon harmonized gene sequence of PfCelTOS expressed well in E. coli (data not shown). However, the average yields using an identical purification process for N-PfCelTOS and CH-PfCelTOS were 1.35 and 2.57 mg/g of wet cell paste, respectively, i.e., approximately twofold higher for the CH sequence. Protein solubility is an important property of recombinant proteins and is dependent on the pH and pI of a protein, ionic strength, temperature, the presence of various solvent additives, and the amino acids on the protein surface. Based on amino acid sequence, PfCelTOS is predicted to be highly soluble. PRO-Sol analysis predicted a scaled solubility value of 0.864 (pI = 5.21), which is higher than the population average of soluble E. coli proteins with a scaled solubility value of 0.45 (Hebditch et al. 2017). Experimentally, both PfCelTOS proteins partitioned into the soluble phase after lysis in phosphate buffered saline (PBS), pH 7.4 under all detergent treatment conditions (1% v/v each, Tween 80, β-octylglucopyranoside, sarkosyl), as well as in the absence of detergent (data not shown).
To address differences in protein stability, PfCelTOS proteins were incubated at A: 37 °C and B: 65 °C, for 1, 4 and up to 24 h (Fig. 1). High molecular weight aggregates were observed for both proteins during the extended incubations at 37 °C when analyzed by SDS-PAGE/Western blotting (Fig. 1). Notably, this was not observed for the same proteins stored for 24 h at 65 °C, a temperature near the mid-point of the thermal denaturation curve. When probed with highly sensitive anti-His antibodies, N-PfCelTOS exhibited a doublet (Fig. 1b). The weak upper band was not detected by the C-terminal PfCelTOS mAb, 3D11.D4 (Fig. 1c), and was present at nominally low levels since this band was not detected by total protein staining (Fig. 1a). These results suggest that an alternative form of the protein with a slightly higher molecular weight was produced from the native sequence. This upper band was not present in the CH-PfCelTOS protein. A band (~ 40 kDa) detected by western blotting in both protein preparations suggests a dimeric form of the protein that may be more resistant to denaturation. These data suggest a 'stickiness' that allows for some non-covalent stabilization of multimer forms. The detection of this band by both its N-terminal Histidine tag and C-terminal epitope recognizing 3D11.D4 monoclonal antibody verified that it is a form of PfCelTOS and not an E. coli contaminant.
To assess the specific-binding characteristics of the two CelTOS proteins to cell membrane phospholipids, lipid subset spotted-arrays were evaluated. Both proteins bound to phosphatidylinositol (4,5)-diphosphate [PtdIns(4,5)P2] and phosphatidylinositol (3,4,5)-triphosphate [PtdIns(3,4,5)P3], phospholipids residing in the plasma membrane, and to phosphatidic acid, which is present within the inner leaflet of the plasma membrane. Incubation at 37 °C for 24 h yielded significant loss of binding of N-PfCelTOS to phosphatidic acid and PtdIns(4,5)P2 compared to T = 0 (Fig. 1d), while incubation at 65 °C for 24 h completely abrogated lipid binding for both proteins (data not shown). In contrast, CH-PfCelTOS retained the same lipid binding characteristics as at T = 0 for 24 h at 37 °C and showed no significant functional loss in phospholipid binding. Differences in lipid binding characteristics observed at 37 °C for T = 24 h for N- and CH-PfCelTOS may be attributed to the differences in their alpha helical content, the types of amino-acid misincorporations, and their propensities to aggregate after unfolding (Fig. 1d). This suggests that the CH-protein is less prone to irreversibly aggregate in solution than the N-protein as its binding site is still accessible to phospholipids at 37 °C. However, at 65 °C, a temperature near the melting temperature of the proteins, no phospholipid binding was observed for either protein at the 24 h time point (data not shown). These observations were corroborated by the western blot analysis in that at 65 °C and 24 h, the N-protein showed some level of aggregates, whereas the CH-protein had no dimers or high molecular weight multimeric aggregates (Fig. 1c).

Protein sequence and structure analysis
Protein secondary structure was examined using CD spectroscopy. The structure of CelTOS from Plasmodium vivax (PvCelTOS) shows that the protein is predominantly helical. CD scans showed that the N-PfCelTOS protein had less alpha helical content than the CH-PfCelTOS. The minima at 222 nm for the N-PfCelTOS was 82% of that of CH-PfCelTOS and correlates with the alpha helical content of the proteins (Fig. 2a). If the proteins produced from the N and CH sequences were identical no difference in alpha helical content would be expected; thus, the result is consistent with amino acid misincorporations and differences in protein primary structure. Nonetheless, despite differences in alpha helical content both proteins reversibly refolded after thermal denaturation (Fig. 2b). In contrast, a recombinant CelTOS derived from the P. berghei (PbCelTOS) sequence showed evidence of irreversible thermal denaturation (Bergmann-Leitner et al. 2011), suggesting that only some variants of the highly conserved CelTOS protein may retain this feature and refold. For PbCelTOS it is unknown as to how many amino acid misincorporations occur during expression or whether amino acid misincorporations affect its fold.
Mass spectrometry techniques were applied to identify and quantitate the putative amino acid misincorporations in the recombinant CelTOS protein products. CH-PfCelTOS had an average mass of 19,000 Da that compared well with the calculated MW of 19,027 g/mol. In contrast, the N-PfCelTOS had an average mass of 19,140 Da and a broader mass/charge envelope, suggesting greater heterogeneity and a higher number of amino acid misincorporations compared with CH-PfCelTOS (Fig. 3). This difference in the distribution of protein masses may account for the differences in secondary structure detected by CD spectroscopy. These findings indicate that the two proteins are indeed structurally different. To resolve the differences in mass detected by MS, we applied high resolution LC-MS/MS. Interestingly, amino-acid misincorporations were found in both proteins. However, non-synonymous misincorporations seen in the N-PfCelTOS were more varied and numerous compared with CH-PfCelTOS (Table 1). As a general observation, the most common substitutions were hydrophobic to hydrophobic followed by hydrophobic to positively charged amino acids. Hydrophobic to negatively charged amino acid substitutions or substitutions of positively or negatively charged amino acids to neutral ones were rarely observed (Table 2).
Fig. 2 CH-PfCelTOS contains more helical content than N-PfCelTOS. a CD spectra of N-PfCelTOS (blue) or CH-PfCelTOS (cyan) proteins, overlaid with the CD spectra of the refolded N-PfCelTOS (red) or refolded CH-PfCelTOS (pink) proteins. b Thermal denaturation curves of the N-PfCelTOS (blue) and CH-PfCelTOS (cyan) and the reversed N-PfCelTOS (dashed blue) and reversed CH-PfCelTOS (dashed cyan). Curves are significantly overlaid
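The grouping of substitutions by residue class used in this comparison can be illustrated with a small tally. The class assignments below follow one common convention and are an assumption, since the text does not list which residues were counted in each class; the example substitutions are hypothetical and are not taken from Table 1 or Table 2.

```python
from collections import Counter

# ASSUMED residue classes (one common convention; not specified in the text).
HYDROPHOBIC = set("AVLIMFW")
POSITIVE = set("KRH")
NEGATIVE = set("DE")


def residue_class(aa):
    """Map a one-letter amino acid code to a coarse physicochemical class."""
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in POSITIVE:
        return "positive"
    if aa in NEGATIVE:
        return "negative"
    return "neutral"


def classify_substitution(expected, observed):
    """Describe a substitution as 'class to class'."""
    return f"{residue_class(expected)} to {residue_class(observed)}"


if __name__ == "__main__":
    # HYPOTHETICAL (expected, observed) residue pairs, for illustration only.
    substitutions = [("L", "I"), ("V", "K"), ("I", "V"), ("F", "L")]
    tally = Counter(classify_substitution(e, o) for e, o in substitutions)
    for kind, count in tally.most_common():
        print(kind, count)
```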
Fig. 3 Electrospray mass spectrometry. Mass spectra of N-PfCelTOS (red) and CH-PfCelTOS (blue) overlaid with respective replicates (burgundy and cyan) (inset: zoomed in on 21+ charged ions)
To determine if codon usage affected quaternary structure, a calibrated gel-filtration column was used. The column was calibrated using three standard proteins (66, 29, 12.4 kDa) and blue dextran to estimate the void volume. The data were fit to a single exponential decay equation ($A = A_0 e^{-kr}$) where v = elution volume of the protein. From a fit to our data, we obtained $A_0$ = 38,342 and k = 3.6245 and were able to calculate the MW of the PfCelTOS proteins produced using the native or codon-harmonized sequence. For this analysis, we compared three proteins:

Table 1 Nonsynonymous amino acid substitutions observed by LC-MS/MS for N-PfCelTOS and CH-PfCelTOS
(1) N-PfCelTOS; (2) CH-PfCelTOS; and (3) a histidine tag-free version of the CH-PfCelTOS, all produced in E. coli. The N-PfCelTOS eluted as a 59 kDa (14.3 mL) protein while the CH-PfCelTOS eluted at 70 kDa (13.9 mL). The histidine tag-free CH-PfCelTOS eluted at 73 kDa (13.8 mL). Notably, in the histidine tag-free CH-PfCelTOS chromatogram, a second, relatively broad peak eluted between 12 and 13 mL, correlating with a MW range of ~ 104-166 kDa. This peak may correspond to a histidine tag-free hexamer (104 kDa) or octamer (139 kDa). The ~ 70-73 kDa peaks overlaid well with the 66 kDa standard and may indicate a tetramer (70-76 kDa) depending upon the relative globularity between the multimer and standards (Fig. 4). N-PfCelTOS eluted near where a trimer was expected (57 kDa). Notably, monomers were not observed in either sample. Interestingly, the quaternary structure of a C-terminal His-tagged P. vivax CelTOS (PvCelTOS) was previously reported to be a homodimer based upon analytical ultracentrifugation data (Jimah et al. 2016). However, three molecules of the protein are found within the asymmetric unit of the crystal, and symmetry-related molecules in the crystal lattice pack as hexamers with no density for the N-terminal residues prior to Ser-46 or for the C-terminal residues beyond Tyr-175 (PDB 5TSZ) (PvCelTOS amino acids 36-196) suggesting that they are either disordered or absent (i.e., degraded or proteolyzed) (Jimah et al. 2016) (Fig. 5). Thus, other quaternary structures cannot be excluded. Since epitopes can form at interfacial locations in multimeric proteins, we next evaluated immunogenicity in mice.
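The apparent masses and oligomer assignments above can be checked with a short calculation using the fitted constants A0 = 38,342 and k = 3.6245. One important assumption is made here: the exponent variable r is taken to be the elution volume divided by a void volume of 8.0 mL. Neither this definition nor the void volume is stated in the text; the value was chosen because it reproduces the reported masses, so it should be treated as illustrative. Function names are hypothetical.

```python
import math

A0 = 38342.0            # fitted pre-exponential term from the text (kDa scale)
K = 3.6245              # fitted decay constant from the text
VOID_VOLUME_ML = 8.0    # ASSUMPTION: back-calculated, not reported in the text
MONOMER_KDA = 19.027    # calculated MW of His-tagged PfCelTOS from the text


def apparent_mw_kda(elution_volume_ml):
    """Apparent MW from the exponential calibration, with r = v / void volume (assumed)."""
    r = elution_volume_ml / VOID_VOLUME_ML
    return A0 * math.exp(-K * r)


def nearest_oligomer(apparent_kda, monomer_kda=MONOMER_KDA):
    """Closest whole number of monomers for a given apparent mass."""
    return max(1, round(apparent_kda / monomer_kda))


if __name__ == "__main__":
    for name, volume in (("N-PfCelTOS", 14.3), ("CH-PfCelTOS", 13.9)):
        mw = apparent_mw_kda(volume)
        print(name, round(mw, 1), "kDa, nearest n-mer:", nearest_oligomer(mw))
```

Under these assumptions the two elution volumes map to roughly 59 and 71 kDa, consistent with the trimer-like and tetramer-like assignments discussed above. Size-exclusion masses depend on molecular shape as well as mass, so such assignments remain approximate.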

Immunity in mice
To evaluate immune responses induced by N-PfCelTOS and CH-PfCelTOS, inbred Balb/cJ mice were immunized either with 10 µg of N-PfCelTOS or CH-PfCelTOS in Montanide ISA 720 (n = 10/group). Antibody concentrations were determined using a standard ELISA. Serum antibodies were tested against full-length CH-PfCelTOS, N-terminal CH-PfCelTOS and C-terminal CH-PfCelTOS recombinant proteins to assess antibody fine specificities. Both N-PfCelTOS and CH-PfCelTOS immunogens induced robust antibody responses after three immunizations. Antibodies to the full-length and the C-terminal portion of PfCelTOS were detected in high proportions, and statistically higher for the CH-PfCelTOS immunogen (Unpaired t-test, p = 0.0214 and p = 0.0393, respectively), while N-terminal PfCelTOS recognition was marginal for both groups (Fig. 6b). For cellular responses, the frequencies of IL-4 and IFN-γ-producing splenocytes were measured by ELISpot (Bergmann-Leitner et al. 2010). PfCelTOS-specific IFN-γ secreting splenocytes were detected in high numbers, indicating a Th1 skew for both groups (Fig. 6c), with significant ex vivo stimulation using the C-terminal PfCelTOS and PfCelTOS peptide pool compared to the full-length or N-terminal CH-PfCelTOS proteins. Differences in stimulation indices for the shorter C-terminal PfCelTOS protein and the peptide pool compared with the full length PfCelTOS may be partially explained by better processing and improving antigen presentation and T-cell activation. The lack of stimulation with the full length protein for detection of IFN-γ secreting splenocytes was nominally unexpected (Bergmann-Leitner et al. 2010) and may reflect the quality of the stimulating antigen following multiple freeze thaw cycles.

Discussion
Single synonymous codon substitutions within a coding region can significantly alter substrate specificities (Kimchi-Sarfaty et al. 2007) or enzymatic activities (Komar et al. 1999) of proteins demonstrating that even subtle changes can have a significant impact on final protein structure/function. These subtle differences in coding sequence may serve to regulate protein folding kinetics. While we and others have observed that codon substitutions can alter protein expression, primary, secondary, and quaternary structure, the impact of these changes has not been fully assessed for vaccine immunogens. For example, altered peptide ligands have been intentionally used to modulate T-cell responses to immunotoxin therapeutics (Candia et al. 2016;Castelletti and Colombatti 2005). The so-called Hoskins effect (also known as "original antigenic sin") refers to a phenomenon where prior exposure to an antigen can alter immune responses to a second similar antigen and reduce an immune response. Thus, we hypothesized that variability in amino acid sequence could potentially impact immune responses. Here, PfCelTOS was "codon harmonized" to optimize its expression. It was then systematically compared with protein expressed from the native DNA sequence to evaluate the impact of codon usage on the integrity of the final product. Our findings suggest that the two proteins are biophysically different in primary, secondary, and quaternary structure.
In comparing N-PfCelTOS and CH-PfCelTOS, we expected to find differences in expression levels, solubility, and yield for the two products as previously observed for MSP1 42 (Khan et al. 2012). While both proteins expressed well in E. coli and were highly soluble, we found them to be different in mass, homogeneity (number of misincorporation events), and secondary structure. Differences in quaternary structure also were detectable in size exclusion chromatograms. Notably, both PfCelTOS proteins reversibly refolded after thermal denaturation; thus, any differences in secondary structure did not significantly affect reversible refolding.
Fig. 6 N-PfCelTOS and CH-PfCelTOS induce comparable antibody and TH1 biased cellular responses. a Kinetic of PfCelTOS specific antibody responses, reported as antigen specific concentrations (µg/mL). Geometric means and 95% confidence interval are reported. b Scatter plot of individual mouse responses at 2 weeks post 3rd immunization reported as the mean and standard deviation. Statistical significance of differences in antibody responses induced by N-PfCelTOS and CH-PfCelTOS was determined using Parametric, two tailed, unpaired T-test (**p = 0.0214, *p = 0.00393 against PfCelTOS and C-terminus of PfCelTOS proteins, respectively). Splenocytes from individual mice were tested for secretion of IL-4 and IFN-γ cytokines by ELISpot. Data are reported as the mean plus standard error of the mean of cytokine producing cells (spot forming cells (SFC)/10⁶ splenocytes) of individual mice (c) IL-4 and (d) IFN-γ
We found that the mass of the codon-harmonized protein was closer to the theoretical molecular weight than that of the native protein, which exhibited a greater mass. The broader distribution of masses in mass spectra suggested that the protein produced from the native DNA sequence was more heterogeneous. An attempt to identify the cause of these differences in mass revealed a significantly higher number of amino acid misincorporations in protein expressed from the native DNA sequence. Although amino acid misincorporation is reported to occur at low levels in E. coli, it is dependent upon many factors including the expression system and cell lines (Bouadloun et al. 1983; Edelmann and Gallant 1977; Ellis and Gallant 1982; Kramer and Farabaugh 2007; Laughrea et al. 1987; Loftfield 1963; Loftfield and Vanderjagt 1972; Manickam et al. 2014; Stansfield et al. 1998; Yu et al. 2009; Zhang et al. 2013). Generally, it is estimated that anywhere between 10 and 50% of misincorporations affect protein function (Drummond and Wilke 2008, 2009). These sequence variations can cause protein heterogeneity and altered catalytic activity (Kramer and Farabaugh 2007; Stansfield et al. 1998), disrupt ligand and substrate binding, and affect protein folding, leading to aggregation (Drummond and Wilke 2008; Lee et al. 2006). In some cases, high levels of sequence variants can cause undesired immune responses (Drummond and Wilke 2008, 2009; Katsara et al. 2008). Interestingly, misincorporation at rarely used codons does not occur more frequently than at commonly used codons; rather, misincorporation errors may be context-dependent (Parker 1992). In the current study, we observed a larger number of sequence variations in N-PfCelTOS than in CH-PfCelTOS. The most common substitutions were hydrophobic to hydrophobic amino acids and hydrophobic to positively charged amino acids. Interestingly, hydrophobic to negatively charged amino acid substitutions and positively charged to neutral substitutions were not observed.
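The grouping of substitutions by physicochemical class described above can be illustrated with a short Python sketch. The assignment of residues to classes is an assumption (one common convention; the scheme actually used in the study is not stated here), the function names are hypothetical, and the example substitutions are invented for illustration and are not data from this work.

```python
from collections import Counter

# Assumed grouping of the 20 standard amino acids (one common convention;
# not necessarily the scheme used in the study).
RESIDUE_CLASSES = {
    "hydrophobic": "AVLIMFWPG",
    "neutral": "STNQCY",   # uncharged polar residues
    "positive": "KRH",
    "negative": "DE",
}

# Reverse lookup: one-letter residue code -> class name.
CLASS_OF = {aa: name for name, members in RESIDUE_CLASSES.items() for aa in members}


def classify_substitution(expected, observed):
    """Return (class of encoded residue, class of observed residue)."""
    try:
        return CLASS_OF[expected.upper()], CLASS_OF[observed.upper()]
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


def tally_substitutions(substitutions):
    """Count substitutions by (expected class, observed class)."""
    return Counter(classify_substitution(e, o) for e, o in substitutions)


# Hypothetical (expected, observed) pairs, for illustration only.
example = [("L", "I"), ("V", "A"), ("F", "K"), ("S", "T")]
counts = tally_substitutions(example)
for (src, dst), n in counts.most_common():
    print(f"{src} -> {dst}: {n}")
```

In practice the list of (expected, observed) pairs would come from peptide-level mass spectrometry assignments; a tally of this kind makes it easy to see which class transitions dominate and which are absent.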
These substitutions likely account for the differences in alpha helical content and suggest that codon usage impacts the misincorporation of amino acids. A factor not extensively discussed is the influence of the histidine tag and its location on protein tertiary and quaternary structure, as well as its influence on immunogenicity. A codon-optimized, C-terminal His-tagged PvCelTOS (P. vivax CelTOS) was previously shown to form a homodimer (Jimah et al. 2016); our results suggest that N-terminal His-tagged or tag-free, codon-harmonized PfCelTOS forms multimers larger than a dimer (possibly trimers or tetramers). These predictions of higher order structures are of particular interest with regard to the role of PfCelTOS in host cell traversal. It has been shown that CelTOS binds to phosphatidic acid on the inner leaflet of the cell membrane and functions to disrupt the plasma membrane by assembling a pore on the cytoplasmic face to enable the exit of parasites from invaded host cells during cell traversal (Jimah et al. 2016). This is a biologically relevant in vivo phenomenon that requires protein functionality, nominally at an elevated temperature of 37 °C (Fig. 1d).
Quaternary structure can affect the immunogenicity of subunit vaccines in cases where conformational epitopes are present at protein-protein interfaces. For example, antibodies against viral envelope proteins can bridge adjacent epitopes to prevent conformational changes and can ultimately inhibit entry and egress of enveloped viruses (Fox et al. 2015). With respect to immunogens, the presentation of mixtures of different but related protein sequences, such as site-directed mutants, can reduce humoral immune responses and can impact the protective efficacy of an immunogen (Candia et al. 2016). These immunosuppressive effects are sequence-dependent, as only some peptides bind MHC molecules and T-cell receptors. Here, we did not observe a significant difference in cellular responses; however, antibody levels induced by CH-PfCelTOS were significantly higher than those induced by N-PfCelTOS. Accurate detection of the effects of low-abundance mistranslated proteins and peptides, against a background of wild-type protein, is challenging. Our findings suggest that the impact of these misincorporations on the immunogenicity of these protein products may be relatively low. Nevertheless, the impact of these sequence variants on the final product quality is difficult to estimate. Differences in the loss of phospholipid binding affinity were observed between N-PfCelTOS and CH-PfCelTOS after the proteins were subjected to a 24 h incubation at 37 °C, suggesting a reduced propensity for the CH-protein to irreversibly aggregate. It should be noted that the recombinant PfCelTOS constructs used here do not have cysteine residues. The CH-protein better retained its ability to bind phospholipids after the 24 h incubation at 37 °C, suggesting a level of structural integrity. High molecular weight multimers that withstood SDS-PAGE separation were also observed for both proteins in western blots after 24 h incubation at 37 °C.
After incubation at 65 °C for 24 h, a temperature near the mid-point of the thermal denaturation curve, fewer dimers and high molecular weight multimers were observed for the CH-protein in western blots, suggesting fewer non-specific aggregates. Aggregation and protein precipitation are typically irreversible processes. Thus, these differences in thermostability may affect long-term storage. While heterologous expression in E. coli may be facile and economical, our results show the potential impact of codon usage on the fidelity of protein synthesis and protein homogeneity.