RNA sequencing (RNA-seq) is a particularly effective technology for gene discovery in a given biological process, especially in non-model species for which reference genome sequences are not available. Although the main application of RNA-seq analyses is the identification of differentially expressed genes (DEGs), this technology is also very useful for identifying expressed transcripts of genes involved in metabolic pathways of interest. In this context, comprehensive genomic or transcriptomic resources are still lacking for other bromeliad subfamilies, and especially for V. carinata. To facilitate gene discovery in this species, we generated a de novo reference transcriptome for V. carinata and provided a general gene annotation, focusing on the identification of putative transcripts encoding a diversity of CysPs, ranging from legumains, metacaspases, calpains and pyroglutamyl peptidases to the more abundant papain-like CysPs.
Materials and methods
Discussion
Transcriptome sequencing is a high-throughput and cost-effective method for generating genetic resources in non-model organisms that lack genomic information. Within the past few years, next-generation sequencing (NGS) technology has begun generating data with 98% accuracy, which, in combination with multiple bioinformatic approaches, can be used for the efficient de novo assembly of transcriptomes (Johnson et al., 2012). The V. carinata leaf transcriptome was analyzed because the leaf is the main site of water and nutrient absorption, which occurs through specialized structures called trichomes. This system allows epiphytes to grow where little water is available to the roots (Benzing, 2000). This is the first report of a large-scale transcriptome sequencing analysis in V. carinata, apart from a previous study by our group that identified a set of 35 pre-miRNAs and their targets (Guzman et al., 2013).
In this study, a total of 65,825,224 reads generated on the Illumina platform were assembled into 43,232 non-redundant contigs using multiple k-mer sizes, an approach that improves sensitivity, especially for lowly expressed genes (Gruenheit et al., 2012). In addition, several studies have merged assemblies obtained with distinct k-mers and eliminated contigs that are perfect subsequences of longer transcripts (Chiara et al., 2013; Haznedaroglu et al., 2012). Our assembly statistics are congruent with those of previous transcriptome studies in the Bromeliaceae that used the same software. The number of non-redundant contigs exceeded the 23,669 to 31,809 contigs observed in Pitcairnia spp. (Palma-Silva et al., 2016) and the 41,052 contigs of A. comosus var. bracteatus (Ma et al., 2015), but was smaller than the 86,609 contigs reported for Aechmea fasciata (Li et al., 2016). The N50 of the V. carinata assembly (1829 bp) was higher than those of the A. fasciata (1656 bp) and A. comosus var. bracteatus (1520 bp) de novo transcriptomes.
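Two of the assembly operations mentioned above can be illustrated with a small sketch: removing contigs that are exact subsequences of longer ones when pooling multi-k-mer assemblies, and computing the N50 statistic used to compare assemblies. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, sequences are assumed to be uppercase ACGTN strings, and containment is checked in both orientations with a naive quadratic scan that dedicated clustering tools would replace at real scale.

```python
from typing import List

# Complement table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGTN sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained_contigs(contigs: List[str]) -> List[str]:
    """Drop contigs whose sequence, in either orientation, occurs within a contig already kept.

    Hypothetical helper for illustration; both strands are checked because a
    short contig may come from the opposite strand of a longer transcript.
    """
    kept: List[str] = []
    # dict.fromkeys removes exact duplicates while preserving input order;
    # longest-first order means every potential container is already kept
    # by the time a shorter contig is checked.
    for seq in sorted(dict.fromkeys(contigs), key=len, reverse=True):
        rc = reverse_complement(seq)
        if not any(seq in longer or rc in longer for longer in kept):
            kept.append(seq)
    return kept


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


merged = remove_contained_contigs(["ATGCCGTA", "GCCG", "CGGC", "TTTT"])
print(merged)                     # ['ATGCCGTA', 'TTTT']
print(n50([100, 200, 300, 400]))  # 300
```

In the usage example, "GCCG" is removed because it occurs inside the longer contig, and "CGGC" is removed because its reverse complement does. The N50 example also shows why N50 reflects contiguity only: it depends solely on contig lengths, never on whether the sequences are correct.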
Although these parameters are commonly used for assembly evaluation, they may be of limited value for evaluating transcriptome assemblies (Wang et al., 2014). In fact, N50 measures the contiguity of contigs, but not their accuracy (O'Neil and Emrich, 2013). Hence, in addition to those parameters, the completeness and accuracy of the transcriptome assembly were evaluated on the basis of conserved orthologous groups (OGs), using the BUSCO and OrthoFinder software. Most of the Embryophyta and Eukaryota orthologous groups were identified in the V. carinata transcriptome, with a high number of complete genes. The high number of duplicated genes can be explained by the assembly of alternative transcripts per gene, since contig sequences rather than ORFs were used, and by strong heterozygosity, as reported in previous studies (Haak et al., 2018; Visser et al., 2015). Similarly, most of the predicted proteins were assigned to OGs shared with A. comosus, O. sativa and S. bicolor. The few OGs identified only in V. carinata were incomplete or composed of hypothetical proteins. In both cases, the results provide important validation of the accuracy and completeness of the assembly.
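To make explicit why duplicated genes do not lower the completeness score, the sketch below formats category counts in the BUSCO-style one-line notation, where complete genes (C) are the sum of single-copy (S) and duplicated (D) genes, alongside fragmented (F) and missing (M) genes out of n benchmark genes. The class name and the counts are hypothetical and do not reproduce any result from this study; the formatting only approximates the notation printed by BUSCO.

```python
from dataclasses import dataclass


@dataclass
class BuscoCounts:
    """Counts of benchmark genes per BUSCO category (hypothetical helper)."""
    single: int      # complete, found once
    duplicated: int  # complete, found more than once (e.g. alternative transcripts)
    fragmented: int  # only partially recovered
    missing: int     # not found

    @property
    def total(self) -> int:
        return self.single + self.duplicated + self.fragmented + self.missing

    def summary(self) -> str:
        """Format counts as a BUSCO-style one-line summary of percentages."""
        if self.total == 0:
            raise ValueError("no benchmark genes counted")

        def pct(count: int) -> str:
            return f"{100.0 * count / self.total:.1f}%"

        complete = self.single + self.duplicated  # C = S + D
        return (f"C:{pct(complete)}[S:{pct(self.single)},D:{pct(self.duplicated)}],"
                f"F:{pct(self.fragmented)},M:{pct(self.missing)},n:{self.total}")


# Hypothetical counts, not results from this study.
example = BuscoCounts(single=120, duplicated=60, fragmented=12, missing=8)
print(example.summary())  # C:90.0%[S:60.0%,D:30.0%],F:6.0%,M:4.0%,n:200
```

Because duplicated genes count toward C, an assembly built from contigs that include alternative transcripts can show both high completeness and a high proportion of duplicated genes.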