2018-01-29
Version 15.1 Release Notes
HOMD 16S rRNA RefSeq Version 15.1 is a major update that expands the microbial taxa to include species identified in human sinonasal cavities.
The expanded version of the HOMD (eHOMD) contains a total of 772 human oral/nasal taxa, of which 693 are oral and 89 are nasal (10 are both oral and nasal).
This version of the 16S rRNA RefSeq contains a total of 998 full-length 16S rDNA sequences representing 769 taxa (sequences for 3 taxa are not yet available).
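The taxon counts above can be cross-checked with a few lines of arithmetic. This is only an illustrative sketch: the function name `combined_taxa` is hypothetical, and the numbers are the ones quoted in these notes.

```python
def combined_taxa(oral: int, nasal: int, both: int) -> int:
    """Count distinct taxa when some taxa are both oral and nasal."""
    # Taxa found at both sites are counted once in each group,
    # so subtract them once to avoid double counting.
    return oral + nasal - both


total_taxa = combined_taxa(oral=693, nasal=89, both=10)
taxa_without_sequence = 3
taxa_with_sequence = total_taxa - taxa_without_sequence

print(total_taxa, taxa_with_sequence)
```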
The reference sequences are available for search with the "Identity 16S rRNA Sequence" BLAST tool, and are also available for download in the following formats:
HOMD_16S_rRNA_RefSeq_V15.1.fasta - unaligned sequences starting from position 28
HOMD_16S_rRNA_RefSeq_V15.1.p9.fasta - unaligned sequences starting from position 9
HOMD_16S_rRNA_RefSeq_V15.1.aligned.fasta - aligned sequences starting from position 9
The taxonomy information is provided in the following two formats:
HOMD_16S_rRNA_RefSeq_V15.1.qiime.taxonomy - QIIME taxonomy format to be included in the QIIME pipeline
HOMD_16S_rRNA_RefSeq_V15.1.mothur.taxonomy - MOTHUR format for use with the MOTHUR package
A phylogenetic tree of all the sequences is also available for viewing and download in both Newick and SVG (Scalable Vector Graphics) formats:
HOMD_16S_rRNA_RefSeq_V15.1.tre
HOMD_16S_rRNA_RefSeq_V15.1.svg
All the data can be downloaded from the HOMD FTP site.
2017-01-03
Version 14.51 Release Notes
Version 14.51 is a minor update of version 14.5 that changes only the names of two taxa.
1. HOT-279 has been formally named Porphyromonas pasteri. This taxon was previously the unnamed Porphyromonas sp. oral taxon 279. A total of 6 reference sequences representing this taxon were affected. Their IDs are 279CW034, 279DP023, 279F450a, 279F450b, 279F450c, and 279F450d.
2. HOT-659 was renamed from Rhizobium loti to Mesorhizobium loti. Only one sequence (ID: 659_0166) was affected.
The sequences themselves were not modified.
2016-03-29
Version 14.5 Release Notes
Version 14.5 is a major update (versions 14.1 to 14.4 were used internally and not publicly released). The revisions are not just to the 16S rRNA Reference Sequences, but also to HOMD and its provisional taxonomic structure. The accompanying Excel file (click to download) details the many added and deleted taxa and changes in status (Named, Unnamed, Phylotype). Twenty-seven previously overlooked or newly named oral taxa were added. Thirty taxa were deleted because their sequences were chimeric or damaged. In most cases, the deleted chimeras were derived from crossover of sequences within a genus. Fourteen non-oral taxa were added so that there would be reference genomes in phyla with no or few oral representatives (Chlorobi, Chloroflexi, GN02, TM7, SR1, WPPS-2). Genomes are critical for metagenomic, transcriptomic, and proteomic studies, where only sequences that can be mapped back to reference genomes are seen.
In this update, reference sequences were compared to all those in GenBank for the same taxa. Reference sequences that had several mismatches to other sequences in a taxon were replaced by a higher-quality sequence that was representative of the taxon. When sequences within a taxon fell into multiple subgroups, RefSeqs representing the divergent subgroups were added. Some changes to HOMD taxonomy are not included in the Excel file, such as changes in several class names, which now end in “ia” (previously phylum and class names were the same for many taxa, but now we have Spirochaetes and Spirochaetia). A number of taxa that were previously uncultured Phylotypes have been cultured and are now designated as Unnamed. A number of Unnamed taxa have been named, and their status has been changed to Named.
The headers for the RefSeqs contain the following information: File identifier; Name; HOT-ID; Clone or Strain #; GenBank accession #; Status (Named/Unnamed/Phylotype/Lost); Genome status (G=yes, X= no); log +1 of sequences seen in a study of 27 subjects at 9 oral sites (14 million sequences).
Previous versions:
HOMD provides two different sets of 16S rRNA Gene Reference Sequences (RefSeq) for download and BLAST search:
1. HOMD 16S rRNA RefSeq: This set contains sequences representing all currently named and unnamed oral taxa.
2. HOMD 16S rRNA Extended RefSeq: This set contains additional 16S rRNA gene reference sequences that are distinctly different from existing taxa but have not yet been assigned a taxon ID.
These sequences are corrected consensus sequences. Many have been corrected and extended based on alignment with other sequences for that taxon, with Ns and indels removed. Therefore, for many sequences, there will be differences between the Reference Sequence and the GenBank sequence listed in the header information. We have not yet updated our own GenBank sequences, and cannot update those from other depositors. We believe these are currently the best reference sequences available and, for the purposes of BLAST analysis, they have the advantage of being of a uniform length.
HOMD 16S rRNA RefSeq version history:
2013-05-08:
Version 13.2 Release Notes
Version 13.2 is a minor correction of 13.0 that removes two duplicated reference sequences. Detailed changes are shown in the Excel file HOMD version 13.2 [Download the Excel File Here].
We have also added an option to search against reference sequences that include the forward (5') primer sequences:
1. HOMD 16S rRNA RefSeq Version 13.2: This is the default version selected for search. The sequences in this set start at 16S rRNA position 28 (thus without the forward primer sequences).
2. HOMD 16S rRNA RefSeq Version 13.2 (Pos 9): The sequences in this set start at position 9, and thus include the forward primer sequences.
In addition, we are providing the following files:
1. Aligned sequences arranged in phylogenetic order. [Download].
2. FASTA sequences in mothur format. [Download].
3. Taxonomy file required by mothur. [Download].
All of these files can also be downloaded from the web download page or HOMD FTP site.
2013-03-25:
Version 13.0 Release Notes
Cumulative additions and corrections to the HOMD Taxon Table and Taxon Description pages have been made since the last update. These changes are shown in the Excel file HOMD version 13.0 [Download the Excel File Here].
The default 16S rRNA reference set has been updated to version 13.0. The Extended set, which includes the provisional taxa A00-H067, has not yet been updated and therefore should be used with caution.
2013-01-16:
Version 12.0 Release Notes
Cumulative additions and corrections to the HOMD Taxon Table and Taxon Description pages have been made since the last update. These changes are shown in the Excel file HOMD version 12.0 [Download the Excel File Here].
The default 16S rRNA reference set has been updated to version 12.0. The Extended set, which includes the provisional taxa A00-H067, has not yet been updated and therefore should be used with caution.
2011-02-16:
Version 11.0 Release Notes
Over the past year, several additions and corrections to the HOMD Taxon Table and Taxon Description pages have been made. These changes are shown in the Excel file HOMD version 11.0 [Download the Excel File Here].
In addition, the default 16S rRNA reference set has been updated to version 11.0. The Extended set, which includes the provisional taxa A00-H067, has not yet been updated and therefore should be used with caution.
2010-02-17:
Version 10.1
Corrected the following two sequence headers from
>357_8615| Synergistetes [G-2] sp. | Oral Taxon 357 | Clone C2ALM009 | AY278615 | 21 | N
>357W5455| Synergistetes [G-2] sp. | Oral Taxon 357 | Strain W5455 | EU309492 | 21 | N
to:
>357_8615| Pyramidobacter piscolens | Oral Taxon 357 | Clone C2ALM009 | AY278615 | 21 | N
>357W5455| Pyramidobacter piscolens | Oral Taxon 357 | Strain W5455 | EU309492 | 21 | N
2010-02-08:
Version 10.1: Minor modification of HOMD 16S rRNA RefSeq version 10, containing 755 reference sequences.
1. Change of sequence header format (first line of the FASTA sequence) to become:
>Sequence ID| Species Name | Oral Taxon Number | Clone Name | GenBank Accession Number | Number of Clones Identified | N/P/U
where N: named species; P: phylotype; U: Un-named species
For example:
>524_3631| Veillonella atypica | Oral Taxon 524 | Clone MB5_P17 | DQ003631 | 208 | N
2. Two additional reference sequences were added:
>357W5455| Synergistetes [G-2] sp. | Oral Taxon 357 | Strain W5455 | EU309492 | 21 | N
>678_4915| Solobacterium moorei | Oral Taxon 678 | Strain AHP 13983 | AY044915 | 20 | N
3. Changes of two sequence IDs:
660_2378 -> 660_5312
655_8120 -> 655_9120
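The version 10.1 header format described above can be split into its seven fields with a short parser. The following is a minimal sketch, not an official HOMD tool: the names `RefSeqHeader` and `parse_refseq_header` are hypothetical, and the parser assumes the fields are always separated by "|" and that the taxon field always ends in the taxon number, as in the example header.

```python
from typing import NamedTuple


class RefSeqHeader(NamedTuple):
    sequence_id: str
    species_name: str
    oral_taxon: int         # numeric part of "Oral Taxon 524"
    clone_name: str         # kept verbatim, e.g. "Clone MB5_P17" or "Strain W5455"
    genbank_accession: str
    clone_count: int        # number of clones identified
    status: str             # "N", "P", or "U"


# Status codes as listed in the release notes.
STATUS_LABELS = {"N": "named species", "P": "phylotype", "U": "un-named species"}


def parse_refseq_header(line: str) -> RefSeqHeader:
    """Parse one version 10.1 FASTA header line into its fields."""
    fields = [field.strip() for field in line.strip().lstrip(">").split("|")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, found {len(fields)}: {line!r}")
    seq_id, species, taxon, clone, accession, count, status = fields
    if status not in STATUS_LABELS:
        raise ValueError(f"unknown status code {status!r}")
    # Assumption: the taxon field ends with the number, e.g. "Oral Taxon 524".
    taxon_number = int(taxon.rsplit(" ", 1)[-1])
    return RefSeqHeader(seq_id, species, taxon_number, clone,
                        accession, int(count), status)


example = ">524_3631| Veillonella atypica | Oral Taxon 524 | Clone MB5_P17 | DQ003631 | 208 | N"
header = parse_refseq_header(example)
print(header.species_name, header.oral_taxon, STATUS_LABELS[header.status])
```

Headers from later releases list different fields, so this sketch should not be applied to them without adjustment.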
2009-02-03:
Version 10: First public release of HOMD 16S rRNA RefSeq, containing 753 reference sequences.
HOMD 16S rRNA Extended RefSeq version history:
2010-02-19:
Version 1.1 - A total of 1647 reference sequences, including all 755 sequences from the non-extended version 10.1 and an additional 892 new sequences that have yet to be assigned a formal oral taxon. The header lines have been changed to a format consistent with that of the non-extended version 10.1.
2009-02-25:
Version 1: First public release of HOMD 16S rRNA Extended RefSeq, containing 1726 reference sequences, including all 753 in the non-extended version above.
HOMD 16S rRNA gene sequence clonal collection version history:
2010-02-07:
Version 1.1 - A total of 34,879 cloned rDNA sequences collected over the years by the HOMD research group. These sequences have recently been deposited in NCBI GenBank. Here we provide the complete collection retrieved from GenBank in FASTA format. In addition to the GI and GenBank accession number, an internal HOMD sequence ID has also been added to the header line for each sequence (the field following the accession number, AW149W in the example header line below):
>gi|285159138|gb|GU397556.1|AW149W| Caulobacter sp. oral taxon 002 clone AW149 16S ribosomal RNA gene, partial sequence
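The internal HOMD sequence ID can be pulled out of such a header line with a small helper. This is a sketch under stated assumptions: the function name `parse_clone_header` is hypothetical, and the field order (gi, GI number, gb, accession, HOMD ID, description) is inferred from the single example header above rather than from a published specification.

```python
def parse_clone_header(line: str) -> dict:
    """Split a clonal-collection FASTA header into its labelled parts."""
    # Split on at most five "|" so the free-text description stays in one piece.
    parts = line.strip().lstrip(">").split("|", 5)
    if len(parts) != 6 or parts[0] != "gi" or parts[2] != "gb":
        raise ValueError(f"unexpected header layout: {line!r}")
    _, gi_number, _, accession, homd_id, description = parts
    return {
        "gi": gi_number,
        "accession": accession,
        "homd_id": homd_id.strip(),
        "description": description.strip(),
    }


example = (">gi|285159138|gb|GU397556.1|AW149W| Caulobacter sp. oral taxon 002 "
           "clone AW149 16S ribosomal RNA gene, partial sequence")
record = parse_clone_header(example)
print(record["homd_id"], record["accession"])
```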