Deep RNA-Seq profile reveals biodiversity, plant–microbe interactions and a large family of NBS-LRR resistance genes in walnut (Juglans regia) tissues


Deep RNA-Seq profiling, a revolutionary method used for quantifying transcriptional levels, often includes non-specific transcripts from other co-existing organisms in spite of stringent protocols. Using the recently published walnut genome sequence as a filter, we present a broad analysis of the RNA-Seq derived transcriptome profiles obtained from twenty different tissues to extract the biodiversity and possible plant–microbe interactions in the walnut ecosystem in California. Since the residual nature of the transcripts being analyzed does not provide sufficient information to identify the exact strain, inferences made are constrained to the genus level. The pathogenic oomycete Phytophthora was detected in the root through the presence of a glyceraldehyde-3-phosphate dehydrogenase. Cryptococcus, the causal agent of cryptococcosis, was found in the catkins and vegetative buds, corroborating previous work indicating that the plant surface supports the sexual cycle of this human pathogen. The RNA-Seq profile revealed several species of endophytic nitrogen-fixing Actinobacteria. Another bacterial species, Methylibium petroleiphilum, which is implicated in aerobic biodegradation of methyl tert-butyl ether, was also found in the root. RNA encoding proteins from the pea aphid was found in the leaves and vegetative buds, while a serine protease from mosquito in the vegetative bud, with significant homology to a female reproductive tract protease from Drosophila mojavensis, suggests egg-laying activities. The comprehensive analysis of the RNA-Seq data presented here also revealed detailed, tissue-specific information on ~400 transcripts encoded by the largest family of resistance (R) genes (NBS-LRR), which possibly rationalizes the resistance of the specific walnut plant to the pathogens detected. Thus, we elucidate the biodiversity and possible plant–microbe interactions in several walnut (Juglans regia) tissues in California using deep RNA-Seq profiling.


Rapid detection of pathogens in plants is becoming increasingly necessary to prevent loss of productivity and quality (Dandekar et al. 2010; Fletcher et al. 2006). The wide variety of diseases and pathogens (Asiatic citrus canker: Xanthomonas axonopodis; sudden oak death: Phytophthora ramorum; Pierce's disease of grapevine: Xylella fastidiosa; etc.) necessitates a broad detection system. Traditionally, real-time PCR has been used extensively for plant disease diagnostics (Schaad and Frederick 2002). However, these diagnostic tools are biased, since they can only detect pathogens with a known nucleic acid template. RNA-Seq, a high-throughput sequencing method, has revolutionized the field of gene discovery (Wang et al. 2009; Flintoft 2008). RNA-Seq can detect transcripts with very low expression levels, in contrast to traditional methods like RNA:DNA hybridization (Clark et al. 2002) and short sequence-based approaches (Kodzius et al. 2006). An RNA-Seq derived transcriptome from an organism with a known genome, prepared with a selection protocol for polyadenylated mRNA, enables detection of mRNA from extraneous eukaryotes like fungi and pests. Although such protocols are designed so that only polyadenylated mRNA is analyzed, some bacterial mRNA does leak through. Thus, RNA-Seq presents an unbiased method of diagnosing the presence of a wide range of prokaryotic and eukaryotic organisms (Moretti et al. 2007; Janse 2010). Such a study can also guide downstream PCR diagnostics to determine the exact species/strain of a pathogen. We have recently used an RNA-Seq methodology to derive the transcriptome of walnut (Juglans regia) from twenty different tissue types, with selection for polyadenylated mRNA, in the course of obtaining the walnut genome sequence (WGS) (manuscript submitted). First, we excluded transcripts that aligned to the WGS or the E. coli genome.
Expression counts enabled the determination of the localization, although the residual nature of the transcripts being analyzed did not provide sufficient information to identify the exact species/strain. Thus, inferences made were constrained to the genus. These counts were not normalized, since there were no comparisons of absolute or relative expression levels. Some non-polyadenylated bacterial mRNA leaked through the RNA-Seq analysis. We detected several well-known pathogens, fungi, endophytic bacteria, and pests. The detection of these pathogenic agents in an otherwise healthy plant can be ascribed to the presence and activity of resistance (R) genes that specifically recognize pathogens, which contain complementary avirulence genes (Staskawicz 2001).

The oomycete Phytophthora, a pathogen responsible for destructive diseases in a wide variety of crop plants, was found localized in the root (Fletcher et al. 2006; Belisario et al. 2012; Nowicki et al. 2012). Although sequence homology indicated the presence of several species of Phytophthora (nicotianae, infestans, parasitica), the similarity among these species did not allow for an exact enumeration of the individual species. For example, one enzyme detected from this pathogen is a glyceraldehyde-3-phosphate dehydrogenase (GAPDH) with 97 % identity to the GAPDH from P. parasitica and P. infestans. Cryptococcosis in humans and animals is caused by Cryptococcus neoformans and C. gattii, and has been exacerbated in recent times in immuno-compromised individuals (Mitchell and Perfect 1995). The plant surface is a conducive environment for the sexual cycle of Cryptococcus (Xue et al. 2007). Here, we detect prolyl-isomerases and ADP/ATP translocases from Cryptococcus in catkins and in vegetative buds, corroborating these findings. Endophytic Actinobacteria are present extensively in the inner tissues of living plants, and are a source of important secondary metabolites related to the defense response, growth and environmental stress (Ventura et al. 2007). Based on the top BLAST score, we detected several species in the phylum Actinobacteria spread out across all tissues. Methylibium petroleiphilum, which is capable of using methyl tert-butyl ether as a sole source of carbon, was also found in the root (Nakatsu et al. 2006). The ribosomal L37 protein from the pea aphid, a pest, was found in the leaves and vegetative buds. Interestingly, a serine protease from the mosquito (Isoe et al. 2009) in the vegetative bud, with significant homology to a female reproductive tract protease from Drosophila mojavensis (Kelleher and Markow 2009), suggested egg-laying activities by these pests.

Materials and methods


Fifteen samples of walnut tissue (Table 1) were gathered from Chandler trees in the UC Davis field facilities located in Davis, California. Three additional samples were taken from Chandler plant material maintained in tissue culture. The root sample was taken from potted Chandler trees in the greenhouse/lath house. Several grams of leaf and root tissue from each plant were frozen in liquid nitrogen immediately after harvest and then transferred to a −80 °C freezer. RNA was isolated from each sample using the hot borate method (Wilkins and Smart 1996) followed by purification and DNAse treatment using an RNA/DNA Mini Kit (Qiagen, Valencia, CA) per the manufacturer's protocol. High quality RNA was confirmed by running an aliquot of each sample on an Experion Automated Electrophoresis System (Bio-Rad Laboratories, Hercules, CA). The cDNA libraries were constructed following the Illumina mRNA-sequencing sample preparation protocol (Illumina Inc., San Diego, CA). Final elution was performed with 16 µL RNase-free water. The quality of each library was determined using a BioRad Experion (BioRad, Hercules, CA). Each library was run as an independent lane on a Genome Analyzer II (Illumina, San Diego, CA) to generate paired-end sequences of 85 bp in length from each cDNA library. In total, over a billion reads were obtained. Prior to assembly, all reads underwent quality control for paired-end reads and trimming using Sickle (Joshi and Fass 2011). The minimum read length was 45 bp with a minimum Sanger quality score of 35. The quality controlled reads were de novo assembled with Trinity v2.0.6 (Grabherr et al. 2011). Standard parameters were used and the minimum contig length was 300 bp. Individual assemblies for each library and a combined assembly of all tissues were performed (Chakraborty et al. 2015).

Table 1 Walnut tissue sources used for RNAseq analysis

In silico analysis

The NCBI database provides several resources for the ‘curated classification and nomenclature of all of organisms in the public sequence databases. This currently represents about 10 % of the described species of life on the planet.’ There were ~111 k transcripts in total. Of these, ~5 k did not align to the walnut genome and were set aside for further analysis (Chakraborty et al. 2015). Of these ~5 k, ~4 k transcripts had significant homology to E. coli genomes and were excluded. The remaining ~1 k transcripts were the subject of analysis in the current manuscript, under the assumption that they were derived from extraneous organisms, pathogenic or commensal, inhabiting the twenty different tissues. The species names were derived from the best BLAST match to the ‘nt’ database. A bitscore (BLASTSCORE) cutoff of 150 was used (E-value ~1E−33). The numerical identifier (tax ID) was obtained from the species name using the site identifier.cgi. For example, Arthrobacter has the tax ID 1663. These numerical IDs were then used to obtain the complete lineage. The first classification of all organisms was into Eukaryota or Bacteria. We used the second classification field to cluster the organisms discussed here. The expression counts are not normalized, since we do not make any inferences on the absolute or relative abundance of the transcripts.
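The filtering and clustering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: BLAST hits are taken to be available as (transcript, species, bitscore) tuples, lineages are taken to be already retrieved, and the function names, transcript IDs and toy lineage lists are hypothetical rather than part of the original pipeline.

```python
from collections import defaultdict

BITSCORE_CUTOFF = 150  # cutoff stated in the text


def best_hits(hits, cutoff=BITSCORE_CUTOFF):
    """Keep, for each transcript, the highest-scoring hit at or above the cutoff."""
    best = {}
    for transcript, species, bitscore in hits:
        if bitscore < cutoff:
            continue  # hit too weak to assign a species name
        if transcript not in best or bitscore > best[transcript][1]:
            best[transcript] = (species, bitscore)
    return best


def group_by_second_rank(best, lineages):
    """Cluster transcripts by the second field of each species' lineage."""
    groups = defaultdict(list)
    for transcript, (species, _) in best.items():
        groups[lineages[species][1]].append(transcript)
    return dict(groups)


# Toy input (hypothetical IDs and scores, for illustration only).
hits = [
    ("T1", "Arthrobacter sp.", 210.0),
    ("T1", "Streptomyces sp.", 180.0),
    ("T2", "Phytophthora parasitica", 400.0),
    ("T3", "Aedes aegypti", 120.0),  # below the cutoff, so discarded
]
# Placeholder lineages: first field is the domain, second is the clustering field.
lineages = {
    "Arthrobacter sp.": ["Bacteria", "Actinobacteria"],
    "Streptomyces sp.": ["Bacteria", "Actinobacteria"],
    "Phytophthora parasitica": ["Eukaryota", "Stramenopiles"],
}

best = best_hits(hits)
groups = group_by_second_rank(best, lineages)
```

In this toy run, T3 is dropped because its only hit falls below the bitscore cutoff, and T1 is assigned to its single best-scoring match before clustering.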

The iterative gene finding method described in YeATS (Chakraborty et al. 2015) was used to identify the homologous set of genes of the nucleotide-binding site (NBS)-leucine-rich repeat (LRR) class. An initial BLAST bitscore of 100 (E-value ~1E−20), incremented by 20 in each iteration, was used as the homology threshold. Raising the threshold in each iteration ensures that the resultant proteins do not diverge far from the initially chosen protein. We used 600 as a lower threshold for the length of NBS-LRR proteins. Additionally, we excluded transcripts whose leucine content was below the 10 % frequency of leucine residues seen in plant proteomes. The excluded transcripts are probably fragments that were not fully assembled by Trinity (Chakraborty et al. 2015).
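One plausible reading of this procedure is sketched below in Python. It is not the YeATS implementation: the `search` callable, the toy score table and the function names are hypothetical, and the length threshold of 600 is assumed here to be in amino acid residues, which the text does not state explicitly.

```python
def leucine_fraction(protein):
    """Fraction of residues that are leucine (one-letter code L)."""
    return protein.count("L") / len(protein) if protein else 0.0


def passes_filters(protein, min_length=600, min_leucine=0.10):
    """Apply the length and leucine-content filters (length unit is an assumption)."""
    return len(protein) >= min_length and leucine_fraction(protein) >= min_leucine


def iterative_search(seeds, search, start=100, step=20, max_iter=10):
    """Grow a gene family, raising the bitscore threshold on every round.

    `search(query, threshold)` is a hypothetical callable that returns the IDs
    of sequences matching `query` with a bitscore of at least `threshold`.
    """
    found = set(seeds)
    frontier = set(seeds)
    threshold = start
    for _ in range(max_iter):
        new = set()
        for query in frontier:
            new |= set(search(query, threshold)) - found
        if not new:
            break  # no further homologs at this stringency
        found |= new
        frontier = new
        threshold += step  # stricter threshold limits drift from the seed
    return found


# Toy pairwise bitscores (hypothetical) standing in for BLAST results.
scores = {("A", "B"): 130, ("B", "C"): 125, ("B", "D"): 110, ("C", "E"): 150}


def toy_search(query, threshold):
    matches = []
    for (left, right), score in scores.items():
        if score >= threshold:
            if left == query:
                matches.append(right)
            elif right == query:
                matches.append(left)
    return matches


family = iterative_search({"A"}, toy_search)
```

In the toy example, D is linked to B with a bitscore of 110, which would pass the initial threshold of 100 but not the threshold of 120 in force by the time B is used as a query, so D stays outside the family.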


The analysis of ~1 k transcripts obtained from 20 different tissues of walnut detected a range of extraneous organisms, pathogenic or commensal (Fig. 1). Several transcripts (N = 260) were associated with Phytophthora, mostly localized in the root (Fig. 2a). Among a sample of these transcripts and the putative proteins they encode, C43181_G1_I1 encodes a 293 amino acid long ORF, with a predicted molecular weight of 30 kDa, that is homologous to glyceraldehyde-3-phosphate dehydrogenase (GAPDH) (Tables 2, 3a). This GAPDH has 97 % identity to a GAPDH from P. parasitica and P. infestans, and 96 % identity to the GAPDH from P. sojae (Fig. 3a). The 3D structure of PhyGAPDH1 was modeled using SWISSMODEL (Arnold et al. 2006). The structural superimposition of PhyGAPDH1 on the structure of the human placental GAPDH (PDBid:1U8F, chain O) reveals the structural conservation of this gene across different species (Fig. 3b). In this study, Cryptococcus was also detected in the catkins and the vegetative buds (Table 3b). We identified a cyclophilin A (peptidyl-prolyl cis–trans isomerase) (Table 4a) associated with Cryptococcus. In addition, we detected several transcripts associated with the endophytic Actinobacteria (EndAct) in several tissues of walnut (Fig. 2b, Table 5). Most putative proteins from these transcripts have significant homologs in the BLAST ‘nr’ database, although most of them are uncharacterized. Other interesting results are the presence of Methylibium petroleiphilum in the roots (3442 cumulative counts, Fig. 2c), Acyrthosiphon pisum (the pea aphid) with 1256 cumulative counts in the leaves (Fig. 2d) and Aedes aegypti (the yellow fever mosquito) in the vegetative bud of walnut, with a total of 428 cumulative counts of transcripts (Fig. 2e).
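As a rough consistency check on the ORF length and predicted mass reported for the Phytophthora GAPDH, the short Python sketch below uses the common rule of thumb of about 110 Da per amino acid residue. The constant and the function name are assumptions introduced for illustration; the calculation indicates that a length of 293 is compatible with ~30 kDa only if it is counted in residues rather than nucleotides.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass per residue (assumption)


def approx_mass_kda(n_residues):
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Reading 293 as a residue count gives a mass near the reported ~30 kDa.
mass_if_residues = approx_mass_kda(293)

# Reading 293 as nucleotides gives only 97 codons and a much smaller protein.
mass_if_nucleotides = approx_mass_kda(293 // 3)
```

Under this approximation the residue reading gives about 32 kDa, while the nucleotide reading gives about 11 kDa.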

Fig. 1

Diseases, pests and pathogens affecting walnut: Images obtained from and used under the Creative Commons Attribution 2.5 Generic license

Fig. 2

Localization of transcripts. The panels show the cumulative counts (not normalized) of transcripts assigned to each genus. a Phytophthora: localized in the root. b Actinobacteria: present in all tissues. c Methylibium: localized in the root. d Acyrthosiphon: localized in early leaves and the vegetative bud. e Aedes aegypti: localized in the vegetative bud
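The cumulative counts plotted in these panels amount to summing raw (non-normalized) expression counts over all transcripts assigned to a genus, separately for each tissue. A minimal Python sketch follows; the transcript IDs, tissue names and counts are invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def cumulative_counts(transcript_genus, counts):
    """Sum raw counts per genus and tissue; transcripts without a genus are skipped."""
    totals = defaultdict(lambda: defaultdict(int))
    for transcript, per_tissue in counts.items():
        genus = transcript_genus.get(transcript)
        if genus is None:
            continue
        for tissue, n in per_tissue.items():
            totals[genus][tissue] += n
    return {genus: dict(tissues) for genus, tissues in totals.items()}


# Toy data (hypothetical transcript IDs and counts).
transcript_genus = {
    "tx_a": "Phytophthora",
    "tx_b": "Phytophthora",
    "tx_c": "Methylibium",
}
counts = {
    "tx_a": {"root": 40, "leaf": 0},
    "tx_b": {"root": 15, "leaf": 2},
    "tx_c": {"root": 7},
    "tx_d": {"root": 99},  # no genus assigned, so ignored
}

totals = cumulative_counts(transcript_genus, counts)
```

No normalization step appears in the sketch, matching the statement that the counts are not compared across samples in absolute or relative terms.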

Table 2 Proteins from Phytophthora
Table 3 Expression counts of selected transcripts
Fig. 3

Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from Phytophthora: C43181_G1_I1 encodes a 293 amino acid long ORF with a predicted molecular weight of 30 kDa, and has 97 % identity to the GAPDH from P. parasitica and P. infestans, and 96 % identity to the GAPDH from P. sojae. Thus, although the presence of a pathogen from the Phytophthora genus is almost certain, it is not possible to determine the exact strain of this pathogen. The Phytophthora GAPDH also shares 70 % identity with the GAPDH in human placenta. a Multiple sequence alignment of the GAPDHs obtained using ENDscript 2.x (Robert and Gouet 2014). b Structural superimposition of the C43181_G1_I1 GAPDH on the structure of the human placental GAPDH (PDBid:1U8F, chain O). The structure of the C43181_G1_I1 GAPDH was modelled using SWISSMODEL

Table 4 Transcripts from fungi
Table 5 Species from Actinobacteria


High-throughput mRNA sequencing (RNA-Seq) has transformed how the transcriptome is profiled, enhancing gene discovery. Although some protocols are designed to capture exclusively polyadenylated eukaryotic mRNA, prokaryotic mRNA can be inadvertently included in the analysis, especially highly abundant transcripts like those encoding ribosomal proteins. This presents an opportunity to identify extraneous transcripts residing in various tissues, provided the genome of the host organism is known. Expression counts are low due to the residual nature of the analysis. Yet, as we observed for the pea aphid, even very low counts were sufficient to accurately identify the L37 ribosomal protein, which shares 88 % identity with the L37 protein from Drosophila melanogaster (Anger et al. 2013).

Phytophthora: causal agent of potato blight

The oomycete Phytophthora is a pathogen responsible for destructive diseases in a wide variety of crop plants, including tomato, potato (Nowicki et al. 2012) and walnut (Belisario et al. 2012) (Fig. 1). Although the presence of a pathogen from the Phytophthora genus is almost certain, it is not possible to determine the exact strain of this pathogen. GAPDH is involved in glycolysis and in non-metabolic processes (Tarze et al. 2007), and is a well-known housekeeping gene (Eisenberg and Levanon 2013). The Phytophthora GAPDH also shares 70 % identity with the GAPDH in human placenta (Jenkins and Tanner 2006).


Cryptococcus: causal agent of cryptococcosis in humans

These fungi are mostly localized in the catkins and the vegetative bud in walnut, corroborating previous results about their sexual cycle (Xue et al. 2007). Cryptococcosis is a disease of the respiratory system in humans and animals, caused by Cryptococcus neoformans and C. gattii, and is exacerbated in patients infected with the AIDS virus (Mitchell and Perfect 1995). Plants are known to host a large number of commensal fungi (Schmit and Mueller 2007). An interesting ecological experiment demonstrated that the plant surface is a conducive environment, stimulating the sexual cycle of Cryptococcus (Xue et al. 2007). Myo-inositol and the plant growth hormone IAA were shown to act synergistically as ‘strong aphrodisiacs’ (Xue et al. 2007). Two homologous cyclophilin A genes (KGB74193 and KGB74187) have been shown to influence cell growth, mating and virulence (Wang et al. 2001). A peptidyl-prolyl cis–trans isomerase from Cryptococcus (Ess1), non-homologous to the above two genes, is required only for virulence (Ren et al. 2005). Another pathogenic fungus, Pyrenophora (teres/tritici-repentis), the causal agent of the disease ‘tan spot’ (Liu et al. 2011), has been identified in several tissues (Tables 3c, 4b).

Actinobacteria: nitrogen fixing bacterial diazotrophs

Bacterial mRNA is non-polyadenylated, and most should be excluded by the RNA-Seq library preparation method, but some mRNA invariably leaks through. EndAct are present extensively in the inner tissues of living plants, and are a source of important secondary metabolites related to the defense response, growth and environmental stress (Qin et al. 2011; Palaniyandi et al. 2013). The significant homology of the putative proteins from these transcripts to known sequences in the BLAST results highlights the ability of the current methodology to detect a genus with fair precision. A clear example is the transcript C54818_G4_I1, which encodes a 68 bp long ORF and matches a ‘MULTISPECIES: hypothetical protein’ from Streptomyces with 82 % identity. The nitrogen fixing diazotroph, Paenibacillus polymyxa P2b-2R, was found to enhance the growth of the important oilseed crop canola (Puri et al. 2016). EndAct obtained from healthy wheat tissue were shown to ‘prime’ the systemic acquired resistance (SAR) and the jasmonate/ethylene (JA/ET) pathways in Arabidopsis thaliana when infected with the bacterial pathogen Erwinia carotovora subsp. carotovora or the fungal pathogen Fusarium oxysporum, respectively (Conn et al. 2008). The importance of EndAct in biodiversity was established in a study of a tropical rainforest native plant, which identified a total of 312 Actinobacteria associated with the order Actinomycetales (Qin et al. 2012). Recently, the genome sequence of Arthrobacter koreensis 5J12A, a desiccation-tolerant strain, was obtained (Manzanera et al. 2015). Teicoplanin, an antibiotic active against Gram-positive bacteria like methicillin-resistant Staphylococcus aureus and Enterococcus faecalis, was obtained from the fermentation broth of a strain of Actinoplanes teichomyceticus (Jung et al. 2008). Another EndAct (Streptomyces) was shown to produce lipase, β-1-3-glucanase and chitinase (defence related enzymes), and to aid plant growth (Gopalakrishnan et al. 2013).
Micrococcus sp NII-0909, isolated from the Western Ghat forest soil in India, had a demonstrable ability to enhance soil fertility and promote plant growth (Dastager et al. 2010). EndAct, for example Corynebacteria, can also be associated with plant pathogenicity (Vidaver 1982). In addition, juglone has been found to have inhibitory effects on some of these nitrogen fixing bacteria (Dawson and Seymour 1983).

Methylibium petroleiphilum: involved in aerobic biodegradation of methyl tert-butyl ether

Methylibium petroleiphilum, which is capable of using methyl tert-butyl ether as a sole source of carbon (Nakatsu et al. 2006), was found in the root (Fig. 2c). Unlike Phytophthora, there are only two transcripts for Methylibium. One transcript is 461 nt long and has 92 % identity to the M. petroleiphilum PM1 genome (Accession: CP000555.1). The ORF from this transcript has 79 % identity to part of a protein (Accession: ABM53545.1) from uncultured beta proteobacterium CBNPD1 BAC clone 578, which was obtained in a metagenomic analysis of a freshwater toxic cyanobacteria bloom (Pope and Patel 2008). Although the PCR-generated 75-clone, 16S rRNA gene library in that study had confirmed the presence of Proteobacteria, the data here associate this protein with M. petroleiphilum. Interestingly, this transcript or ORF has no match in the new draft sequence of the Methylibium sp. strain T29, a fuel oxygenate-degrading bacterial isolate from Hungary (Szabó et al. 2015). This is explained by the differences observed: ‘unlike M. petroleiphilum PM1 our isolate does not harbor the mega plasmid which carries the genes for MTBE-degradation’ (Szabó et al. 2015).

Acyrthosiphon pisum: the pea aphid pest

While both Methylibium and Phytophthora are mostly localized in the root, the pea aphid A. pisum, which is a pest of importance in agriculture (Van Emden and Harrington 2007), was found in the leaves (Fig. 2d). One transcript of Acyrthosiphon (C58762_G1_I1) encodes a 91 amino acid long ORF having a 99 % match to the ribosomal L37 protein (Accession: NP_001129424.1). That such a low-count transcript could still be matched with high identity illustrates the sensitivity of the RNA-Seq technology (Table 3d).

Aedes aegypti: yellow fever mosquito

The presence of the yellow fever mosquito in the vegetative bud was not expected (Fig. 2e), and it had not previously been reported in Northern California. The transcripts found there include proteases (both serine and metallo), ribosomal RNA and an elongation factor (Table 6). Among the serine proteases, C40984_G1_I1 encodes a trypsin that has significant similarity to a female reproductive tract protease from Drosophila mojavensis (Uniprot id: C5IB51) (Kelleher and Markow 2009), suggesting that the mosquito had been using the vegetative buds for reproductive purposes. The importance of the serine proteases in the egg-formation abilities of mosquitoes was established using an RNAi knockdown method (Isoe et al. 2009). Another interesting development has been the recent monitoring of Aedes aegypti and Aedes albopictus by the California Department of Public Health.

Table 6 Transcripts from Aedes aegypti, the yellow fever mosquito

Their detection sites are updated regularly. However, their detection method is not known to us.

Nucleotide-binding site (NBS) and leucine-rich repeats (LRR) in walnut

The detection of several well-characterized plant pathogens in the current study raises the question of what innate mechanism in walnut could provide resistance to these virulent agents. While it is possible that the strains of these pathogens are non-virulent, it is also possible that this plant encodes and transcribes the resistance genes needed to counter a virulent attack from any one of these pathogens. Plants possess two distinct kinds of defence mechanisms: pathogen-associated molecular pattern (PAMP)-triggered immunity (PTI) and effector-triggered immunity (ETI), the latter encoded by resistance (R) genes. PTI is analogous to the first line of defense (innate immunity) in vertebrates, and is bypassed or disrupted by pathogen effector molecules that downregulate PTI, making the cell vulnerable to pathogen attack (Nicaise et al. 2009). R genes have evolved in this ensuing “evolutionary warfare” in plants, akin to mammalian adaptive immunity, to recognize pathogens which contain complementary avirulence genes (DeYoung and Innes 2006). However, unlike the mammalian adaptive immune system, which is enforced through specialized cells, R genes are active in all plant cells. The majority of R genes encode proteins comprising a nucleotide-binding site (NBS) and leucine-rich repeats (LRRs). NBS-LRR proteins recognize and neutralize specialized pathogen avirulence (Avr) proteins, leading to the upregulation of PTI and thus providing plants with resistance to the attack (Hayashi et al. 2010; Ernst et al. 2002; Borhan et al. 2004; Zhang et al. 2010). It has been hypothesized that the distribution and diversity of NBS-LRR sequences are a direct consequence of extensive duplication and random rearrangements, endowing plants with the ability to recognize diverse molecules arising from dynamically changing biotic challenges (Meyers et al. 2003).
Here, we briefly describe the transcripts and expression levels of the NBS-LRR genes in walnut. Two specific examples of NBS-LRR genes conferring resistance to plants are the blast-resistance gene Pb1 NBS-LRR from rice (Uniprot: E3WF10) (Hayashi et al. 2010) and the cyst nematode resistance gene from tomato (Uniprot: Q8GT46) (Ernst et al. 2002). In addition to being leucine rich, these two proteins are also rich in the negatively charged glutamic acid (Fig. 4a). Although these specific NBS-LRRs are ~1300 amino acids long, the typical length of an NBS-LRR varies from a few hundred to fewer than 2000 amino acids (McHale et al. 2006). Using these proteins as initial search entities, we have identified ~400 NBS-LRR transcripts (excluding splice variants, denoted by transcripts having the same prefix) in walnut through the ‘findgene’ algorithm described earlier by us (Chakraborty et al. 2015) (Fig. 4b). This is comparable to the 374 NBS-LRR genes that were identified in a genome wide study of Chinese chestnut (Castanea mollissima) resistant to Chestnut Blight Disease (Zhong et al. 2015). The tissue-specific expression pattern allows the discrimination of the truly critical genes in this large family (Table 7). The tissue-specific nature of certain genes is exemplified by C54426_G7_I1, which has significantly higher expression in the catkins and hull, and shares 78 % identity with a Strubbelig-receptor family (SRF) protein from Malus domestica. SRFs are receptor-like kinases (Eyüboglu et al. 2007), and are involved in tissue morphogenesis (Vaddepalli et al. 2011) and immune response (Alcázar et al. 2010). As corroboration, we chose one NBS-LRR protein from each of the two major subfamilies of NBS-LRR proteins (McHale et al. 2006), TIR-NBS-LRR (Uniprot: Q6QX58; Borhan et al. 2004) and CC-NBS-LRR (Uniprot: Q56YM8; Meyers et al. 2003), as initial search entities, and obtained the same number of transcripts encoded by NBS-LRR genes.
Thus, we demonstrate that transcriptomic data that has revealed the biodiversity in different tissues of walnut simultaneously provides insights into the ability of the plant to negate the threat posed by some of these potentially destructive pathogens.

Fig. 4

Nucleotide-binding site (NBS) and leucine-rich repeats (LRR) in walnut. a Amino acid frequency for two NBS-LRRs. The blast-resistance gene Pb1 NBS-LRR from rice (Uniprot: E3WF10) is in red, while the cyst nematode resistance gene from tomato (Uniprot: Q8GT46) is in green. While both these proteins are, as expected, leucine rich, we also observe a large proportion of negatively charged glutamic acid. b Phylogenetic tree for the ~400 identified NBS-LRR genes in walnut obtained by Neighbor Joining/UPGMA phylogeny implemented in MAFFT (Katoh et al. 2002) and drawn with FigTree v1.4.2
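The per-residue frequencies shown in panel a can be computed with a simple counting routine. The Python sketch below is illustrative only: the function name is hypothetical and the short toy peptide is invented, not taken from either of the two proteins in the figure.

```python
from collections import Counter


def amino_acid_frequency(protein):
    """Return the fraction of each residue (one-letter code) in a protein sequence."""
    protein = protein.upper()
    if not protein:
        return {}
    counts = Counter(protein)
    total = len(protein)
    return {aa: counts[aa] / total for aa in sorted(counts)}


# Toy peptide (hypothetical), rich in leucine (L) and glutamic acid (E).
freq = amino_acid_frequency("LLEELKAL")
```

For a real comparison, the same function would be applied to each full-length sequence and the resulting fractions plotted side by side.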

Table 7 Expression counts of ten highly expressed transcripts from the NBS-LRR family

In summary, high conservation of some proteins within a genus does not allow the proper characterization of the species. Thus, although we can state with a great degree of certainty the presence of the genus Phytophthora, it is not possible to identify the exact species/strain. No viruses have been detected using the current methodology. Also, since the root samples were derived from a sterile sample, we did not detect root lesion nematodes (Pratylenchus vulnus), a major source of concern for the California walnut industry (Walawage et al. 2013). The detection of specific proteins from pathogens can serve as a target for therapeutics. The methodology described here presents an unbiased rapid tool to extract the metagenome from an RNA-Seq profile that can be used to develop diagnostics. In this study, the profile represented twenty different tissues from walnut, and the extracted metagenome from all of these tissue types presents a vivid picture of the biodiversity in its surroundings in California.


  • Alcázar R, García AV, Kronholm I, de Meaux J, Koornneef M, et al. Natural variation at strubbelig receptor kinase 3 drives immune-triggered incompatibilities between Arabidopsis thaliana accessions. Nat Genet. 2010;42:1135–9.

  • Anger AM, Armache JP, Berninghausen O, Habeck M, Subklewe M, Wilson DN, Beckmann R. Structures of the human and Drosophila 80S ribosome. Nature. 2013;497:80–5. doi:10.1038/nature12104.

  • Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling. Bioinformatics. 2006;22:195–201. doi:10.1093/bioinformatics/bti770.

  • Belisario A, Luongo L, Galli M, Vitale S. First report of Phytophthora megasperma associated with decline and death of common walnut trees in Italy. Plant Dis. 2012;96:1695. doi:10.1094/PDIS-05-12-0470-PDN.

  • Borhan MH, Holub EB, Beynon JL, Rozwadowski K, Rimmer SR. The Arabidopsis TIR-NB-LRR gene RAC1 confers resistance to Albugo candida (white rust) and is dependent on EDS1 but not PAD4. Mol Plant Microbe Interact. 2004;17:711–9.

  • Chakraborty S, Britton M, Wegrzyn J, Butterfield T, Martínez-García PJ, Rao BJ, Leslie CA, Aradhaya M, Neale D, Woeste K, Dandekar AM. YeATS - a tool suite for analyzing RNA-seq derived transcriptome identifies a highly transcribed putative extensin in heartwood/sapwood transition zone in black walnut [version 2; referees: 1 approved]. F1000Research. 2015;4:155. doi:10.12688/f1000research.6617.1.

  • Clark TA, Sugnet CW, Ares M. Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science. 2002;296:907–10. doi:10.1126/science.1069415.

  • Conn V, Walker A, Franco C. Endophytic actinobacteria induce defense pathways in Arabidopsis thaliana. MPMI. 2008;21:208–18. doi:10.1094/MPMI-21-2-0208.

  • Dandekar AM, Martinelli F, Davis C, Bhushan A, Zhao W, Fiehn O, Skogerson K, Wohlgemuth G, D’Souza R, Roy S, Reagan RL, Lin D, Cary RB, Pardington P, Gupta G. Analysis of early host responses for asymptomatic disease detection and management of specialty crops. Crit Rev Immunol. 2010;30(3):277–89. doi:10.1615/CritRevImmunol.v30.i3.50.

  • Dastager SG, Deepa C, Pandey A. Isolation and characterization of novel plant growth promoting Micrococcus sp NII-0909 and its interaction with cowpea. Plant Physiol Biochem. 2010;48:987–92. doi:10.1016/j.plaphy.2010.09.006.

  • Dawson J, Seymour P. Effects of juglone concentration on growth in vitro of Frankia ArI3 and Rhizobium japonicum strain 71. J Chem Ecol. 1983;9:1175–83. doi:10.1007/BF00982220.

  • DeYoung BJ, Innes RW. Plant NBS-LRR proteins in pathogen sensing and host defense. Nature Immunol. 2006;7:1243–9.

  • Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29:569–74. doi:10.1016/j.tig.2013.05.010.

  • Ernst K, Kumar A, Kriseleit D, Kloos DU, Phillips MS, et al. The broad-spectrum potato cyst nematode resistance gene (Hero) from tomato is the only member of a large gene family of NBS-LRR genes with an unusual amino acid repeat in the LRR region. Plant J. 2002;31:127–36.

  • Eyüboglu B, Pfister K, Haberer G, Chevalier D, Fuchs A, et al. Molecular characterisation of the STRUBBELIG-RECEPTOR FAMILY of genes encoding putative leucine-rich repeat receptor-like kinases in Arabidopsis thaliana. BMC Plant Biol. 2007;7:16.

  • Fletcher J, Bender C, Budowle B, Cobb W, Gold S, Ishimaru CA, Luster D, Melcher U, Murch R, Scherm H, Seem RC, Sherwood JL, Sobral BW, Tolin SA. Plant pathogen forensics: capabilities, needs, and recommendations. MMBR. 2006;70:450–71. doi:10.1128/MMBR.00022-05.

  • Flintoft L. Transcriptomics: digging deep with RNA-Seq. Nat Rev Genet. 2008;9:568. doi:10.1038/nrg2423.

  • Gopalakrishnan S, Srinivas V, Vidya MS, Rathore A. Plant growth-promoting activities of Streptomyces spp. in sorghum and rice. SpringerPlus. 2013;2:574. doi:10.1186/2193-1801-2-574.

  • Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52. doi:10.1038/nbt.1883.

  • Hayashi N, Inoue H, Kato T, Funao T, Shirota M, et al. Durable panicle blast-resistance gene Pb1 encodes an atypical CC-NBR-LRR protein and was generated by acquiring a promoter through local genome duplication. Plant J. 2010;64:498–510.

  • Isoe J, Rascón AA, Kunz S, Miesfeld RL. Molecular genetic analysis of midgut serine proteases in Aedes aegypti mosquitoes. Insect Biochem Molec Biol. 2009;39:903–12. doi:10.1016/j.ibmb.2009.10.008.

  • Janse J. Diagnostic methods for phytopathogenic bacteria of stone fruits and nuts in COST 873. EPPO Bulletin. 2010;40:68–85. doi:10.1111/j.1365-2338.2009.02356.x.

  • Jenkins JL, Tanner JJ. High-resolution structure of human d-glyceraldehyde-3-phosphate dehydrogenase. Acta Crystallogr Sect D Biol Crystallogr. 2006;62:290–301. doi:10.1107/S0907444905042289.

  • Joshi N, Fass J. Sickle. A sliding-window, adaptive, quality-based trimming tool for fastq files. 2011. (version 1.33)[software].

  • Jung HM, Kim SY, Prabhu P, Moon HJ, Kim IW, Lee JK. Optimization of culture conditions and scale-up to plant scales for teicoplanin production by Actinoplanes teichomyceticus. Appl Microbiol Biotechnol. 2008;80:21–7. doi:10.1007/s00253-008-1530-2.

  • Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–66.

  • Kelleher ES, Markow TA. Duplication, selection and gene conversion in a Drosophila mojavensis female reproductive protein family. Genetics. 2009;181:1451–65. doi:10.1534/genetics.108.099044.

  • Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, Sasaki D, Imamura K, Kai C, Harbers M, Hayashizaki Y, Carninci P. CAGE: cap analysis of gene expression. Nat Methods. 2006;3:211–22. doi:10.1038/nmeth0306-211.

  • Liu Z, Ellwood SR, Oliver RP, Friesen TL. Pyrenophora teres: profile of an increasingly damaging barley pathogen. Mol Plant Pathol. 2011;12:1–19. doi:10.1111/j.1364-3703.2010.00649.x.

  • Manzanera M, Narváez-Reinaldo JJ, García-Fontana C, Vílchez JI, González-López J. Genome sequence of Arthrobacter koreensis 5J12A, a plant growth-promoting and desiccation-tolerant strain. Genome Announc. 2015;3:e00648. doi:10.1128/genomeA.00648-15.

  • McHale L, Tan X, Koehl P, Michelmore RW. Plant NBS-LRR proteins: adaptable guards. Genome Biol. 2006;7:212.

  • Meyers BC, Kozik A, Griego A, Kuang H, Michelmore RW. Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. Plant Cell. 2003;15:809–34.

  • Mitchell TG, Perfect JR. Cryptococcosis in the era of AIDS–100 years after the discovery of Cryptococcus neoformans. Clin Microbiol Rev. 1995;8:515–48. doi:10.1016/S0190-9622(97)80336-2.

  • Moretti C, Silvestri F, Rossini E, Natalini G, Buonaurio R. A protocol for rapid identification of Brenneria nigrifluens among bacteria isolated from bark cankers in Persian walnut plants. J Plant Pathol. 2007;89:211–8. doi:10.4454/jpp.v89i3.766.

  • Nakatsu CH, Hristova K, Hanada S, Meng XY, Hanson JR, Scow KM, Kamagata Y. Methylibium petroleiphilum gen. nov., sp. nov., a novel methyl tert-butyl ether-degrading methylotroph of the Betaproteobacteria. Int J Syst Evol Microbiol. 2006;56:983–9. doi:10.1099/ijs.0.63524-0.

  • Nicaise V, Roux M, Zipfel C. Recent advances in PAMP-triggered immunity against bacteria: pattern recognition receptors watch over and raise the alarm. Plant Physiol. 2009;150:1638–47.

  • Nowicki M, Foolad MR, Nowakowska M, Kozik EU. Potato and tomato late blight caused by Phytophthora infestans: an overview of pathology and resistance breeding. Plant Dis. 2012;96:4–17. doi:10.1094/PDIS-05-11-0458.

  • Palaniyandi SA, Yang SH, Zhang L, Suh JW. Effects of actinobacteria on plant disease suppression and growth promotion. Appl Microbiol Biotechnol. 2013;97:9621–36. doi:10.1007/s00253-013-5206-1.

  • Pope PB, Patel BK. Metagenomic analysis of a freshwater toxic cyanobacteria bloom. FEMS Microbiol Ecol. 2008;64:9–27. doi:10.1111/j.1574-6941.2008.00448.x.

  • Puri A, Padda KP, Chanway CP. Evidence of nitrogen fixation and growth promotion in canola (Brassica napus L.) by an endophytic diazotroph Paenibacillus polymyxa P2b–2R. Biol Fertil Soils. 2016;52:119–25. doi:10.1007/s00374-015-1051-y.

  • Qin S, Chen HH, Zhao GZ, Li J, Zhu WY, Xu LH, Jiang JH, Li WJ. Abundant and diverse endophytic actinobacteria associated with medicinal plant Maytenus austroyunnanensis in Xishuangbanna tropical rainforest revealed by culture-dependent and culture-independent methods. Environ Microbiol Rep. 2012;4:522–31. doi:10.1111/j.1758-2229.2012.00357.x.

  • Qin S, Xing K, Jiang JH, Xu LH, Li WJ. Biodiversity, bioactive natural products and biotechnological potential of plant-associated endophytic actinobacteria. Appl Microbiol Biotechnol. 2011;89:457–73. doi:10.1007/s00253-010-2923-6.

  • Ren P, Rossettini A, Chaturvedi V, Hanes SD. The Ess1 prolyl isomerase is dispensable for growth but required for virulence in Cryptococcus neoformans. Microbiol. 2005;151:1593–605. doi:10.1099/mic.0.27786-0.

  • Robert X, Gouet P. Deciphering key features in protein structures with the new ENDscript server. Nucl Acids Res. 2014;42(W1):W320–4.

  • Schaad NW, Frederick RD. Real-time PCR and its application for rapid plant disease diagnostics. Can J Plant Pathol. 2002;24:250–8. doi:10.1080/07060660209507006.

  • Schmit JP, Mueller GM. An estimate of the lower limit of global fungal diversity. Biodivers Conserv. 2007;16:99–111. doi:10.1007/s10531-006-9129-3.

  • Staskawicz BJ. Genetics of plant-pathogen interactions specifying plant disease resistance. Plant Physiol. 2001;125:73–6.

  • Szabó Z, Gyula P, Robotka H, Bató E, Gálik B, Pach P, Pekker P, Papp I, Bihari Z. Draft genome sequence of Methylibium sp. strain T29, a novel fuel oxygenate-degrading bacterial isolate from Hungary. Stand Genomic Sci. 2015;10:1–10. doi:10.1186/s40793-015-0023-z.

  • Tarze A, Deniaud A, Le Bras M, Maillier E, Molle D, Larochette N, Zamzami N, Jan G, Kroemer G, Brenner C. GAPDH, a novel regulator of the pro-apoptotic mitochondrial membrane permeabilization. Oncogene. 2007;26:2606–20. doi:10.1038/sj.onc.1210074.

  • Vaddepalli P, Fulton L, Batoux M, Yadav RK, Schneitz K. Structure-function analysis of STRUBBELIG, an Arabidopsis atypical receptor-like kinase involved in tissue morphogenesis. PLoS One. 2011;6(5):e19730.

  • Van Emden HF, Harrington R. Aphids as crop pests. CABI. 2007.

  • Ventura M, Canchaya C, Tauch A, Chandra G, Fitzgerald GF, Chater KF, van Sinderen D. Genomics of actinobacteria: tracing the evolutionary history of an ancient phylum. MMBR. 2007;71:495–548. doi:10.1128/MMBR.00005-07.

  • Vidaver AK. The plant pathogenic corynebacteria. Ann Rev Microbiol. 1982;36:495–517. doi:10.1146/annurev.mi.36.100182.002431.

  • Walawage SL, Britton MT, Leslie CA, Uratsu SL, Li Y, Dandekar A. Stacking resistance to crown gall and nematodes in walnut rootstocks. BMC Genom. 2013;14:668. doi:10.1186/1471-2164-14-668.

  • Wang P, Cardenas ME, Cox GM, Perfect JR, Heitman J. Two cyclophilin A homologs with shared and distinct functions important for growth and virulence of Cryptococcus neoformans. EMBO Rep. 2001;2:511–8. doi:10.1093/embo-reports/kve109.

  • Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi:10.1038/nrg2484.

  • Wilkins TA, Smart LB. Isolation of RNA from Plant Tissue. In: Krieg PA, editor. A laboratory guide to RNA: isolation, analysis, and synthesis. New York: Wiley-Liss Inc; 1996. p. 21–41.

  • Xue C, Tada Y, Dong X, Heitman J. The human fungal pathogen Cryptococcus can complete its sexual cycle during a pathogenic association with plants. Cell Host Microbe. 2007;1:263–73. doi:10.1016/j.chom.2007.05.005.

  • Zhang J, Li W, Xiang T, Liu Z, Laluk K, et al. Receptor-like cytoplasmic kinases integrate signaling from multiple plant immune receptors and are targeted by a Pseudomonas syringae effector. Cell Host Microbe. 2010;7:290–301.

  • Zhong Y, Li Y, Huang K, Cheng ZM. Species-specific duplications of NBS-encoding genes in Chinese chestnut (Castanea mollissima). Sci Rep. 2015;5.

Authors’ contributions

Dandekar and Britton contributed to the design of the RNA-Seq experiment; Dandekar and Chakraborty designed the metagenome extraction; Chakraborty performed the in silico analysis that revealed the metagenome; and Martínez-García contributed to the validation against the walnut genome sequence. Chakraborty wrote the first draft, and the remaining authors contributed to subsequent revisions.


Grant information: The authors wish to acknowledge support from the California Walnut Board and UC Discovery program.

Accession number: Sequence data from this article can be found in the NCBI Sequence Read Archive under BioProject PRJNA232394.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Abhaya M. Dandekar.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Chakraborty, S., Britton, M., Martínez-García, P.J. et al. Deep RNA-Seq profile reveals biodiversity, plant–microbe interactions and a large family of NBS-LRR resistance genes in walnut (Juglans regia) tissues. AMB Expr 6, 12 (2016).
