David T. Pride and Thomas Schoenfeld
Abstract
Background: Metagenomic analysis provides a rich source of biological information for otherwise
intractable viral communities. However, study of viral metagenomes has been hampered by its
nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to
develop algorithms for examination of viral metagenomes to identify the origin of sequences
independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs,
Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial
populations where comparatively large contigs were obtained. Thermal spring metagenomes have
high proportions of sequences without significant GenBank homology, which has hampered
identification of viruses and their linkage with hosts. To analyze each metagenome, we developed
a method to classify DNA fragments using genome signature-based phylogenetic classification
(GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures
for all previously sequenced Bacteria, Archaea, and viruses.
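
The comparison step described above can be illustrated with a minimal sketch. It is not the authors' GSPC implementation: it assumes a signature is a normalized k-mer (here tetranucleotide) frequency vector and that similarity is measured by Euclidean distance, neither of which is specified in this abstract. The function names `signature`, `distance`, and `classify` and the toy reference sequences are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def signature(seq, k=4):
    """Normalized k-mer frequency vector over the ACGT alphabet.

    Assumption: k-mers containing other symbols (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}


def distance(sig_a, sig_b):
    """Euclidean distance between two signatures (an assumed metric)."""
    return math.sqrt(sum((sig_a[km] - sig_b[km]) ** 2 for km in sig_a))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest."""
    frag_sig = signature(fragment, k)
    return min(
        references,
        key=lambda name: distance(frag_sig, signature(references[name], k)),
    )


if __name__ == "__main__":
    # Toy stand-ins for a reference database; real use would load genomes.
    references = {
        "at_rich_reference": "ATATTTAATTATAAAT" * 50,
        "gc_rich_reference": "GCGGCCGCGGGCCCGC" * 50,
    }
    print(classify("TTATATAAATTTATAT" * 5, references))
```

A practical version would precompute the reference signatures once and would need to handle both DNA strands and short fragments, where frequency estimates become noisy.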
Results: From both Bear Paw and Octopus hot springs, each assembled contig had more similarity
to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis,
suggesting a genome signature common to each of these extreme environments. While viral
metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each
locale are largely unique. GSPC using a microbial database predicts most of the Octopus
metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw, a finding
consistent with GenBank BLAST results. When using a viral database, the majority of the Octopus
metagenome is predicted to belong to the archaeal virus families Globuloviridae and Fuselloviridae, while
none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when
microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic
contigs is predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with
the apparent viral origin of both metagenomes.
Conclusion: BLAST searches identify no significant homologs for most metagenome contigs,
whereas GSPC suggests their origin as archaeal viruses or bacteriophages, indicating that GSPC
provides a complementary approach in viral metagenomic analysis.