NIH Human Microbiome Project 2


Following a July 2010 16S data freeze, data were downloaded from two NCBI SRA projects: SRP002395 (Human Microbiome Project 16S rRNA Clinical Production Phase I) and SRP002012 (Human Microbiome Project 454 Clinical Production Pilot). Together these comprise over 5,700 samples and over 10,000 sequence preps. The 16S variable regions 3-5 (V35) were sequenced for the entire set of samples, and variable regions 1-3 (V13) and 6-9 (V69) for a subset of samples.

A 16S data processing pipeline was implemented with the mothur software package, following both a high-stringency and a low-stringency approach. The high-stringency approach reduces sequencing error more aggressively and is tailored towards Operational Taxonomic Unit (OTU) construction, while the low-stringency approach retains longer reads and is tailored towards taxonomic classification. The mothur output from both approaches is available here for all three 16S variable regions analyzed. The file types are described in the file format readme provided for each of the two approaches. We also provide the reference alignments and training sets required to replicate these processes. See the mothur SOP below and Schloss, Gevers and Westcott (2011) for more information.

If you're interested in joint analysis of 16S and shotgun metagenomic datasets from the HMP, pairing up data from the same microbiome samples can initially seem tricky. The HMP Sample Flow Schematic indicates how the sample IDs of the two datasets are related experimentally, and provides tables joining 16S dataset "SN" and "PSN" identifiers with metagenomic dataset "SRS" identifiers.
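Once you have one of the mapping tables, pairing the two datasets is essentially a table lookup. The sketch below assumes a CSV export with columns named SN, PSN and SRS, and assumes each PSN appears on exactly one row; the column layout, the helper names (load_mapping, pair_16s_with_wgs) and all ID values are hypothetical placeholders, not real HMP identifiers or the actual format of the published tables. It groups 16S prep IDs by the metagenomic SRS sample they map to, so unmatched preps are easy to spot.

```python
import csv
import io
from collections import defaultdict

# HYPOTHETICAL mapping table: the column names SN, PSN, SRS and every
# ID value below are placeholders, not real HMP identifiers.
MAPPING_CSV = """SN,PSN,SRS
SN_A,PSN_1,SRS_X
SN_A,PSN_2,SRS_X
SN_B,PSN_3,SRS_Y
"""


def load_mapping(text):
    """Parse the mapping table into a PSN -> (SN, SRS) lookup.

    Assumes each PSN appears on at most one row; a later duplicate
    row would overwrite an earlier one.
    """
    lookup = {}
    for row in csv.DictReader(io.StringIO(text)):
        lookup[row["PSN"]] = (row["SN"], row["SRS"])
    return lookup


def pair_16s_with_wgs(psn_ids, lookup):
    """Group 16S prep IDs (PSN) by the metagenomic SRS ID they map to.

    PSNs with no row in the mapping table are collected under None.
    """
    paired = defaultdict(list)
    for psn in psn_ids:
        match = lookup.get(psn)
        srs = match[1] if match else None
        paired[srs].append(psn)
    return dict(paired)


lookup = load_mapping(MAPPING_CSV)
print(pair_16s_with_wgs(["PSN_1", "PSN_2", "PSN_3", "PSN_9"], lookup))
# {'SRS_X': ['PSN_1', 'PSN_2'], 'SRS_Y': ['PSN_3'], None: ['PSN_9']}
```

Keying the lookup on PSN rather than SN is a design choice under the stated assumption; if the real tables relate the identifiers differently, the same pattern applies with a different key column.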