towards a richer set of information to describe our complete genome collection

Habitat-Lite

From Genomic Standards Consortium



NEWS: The RDP is running a user survey to find out which habitat terms are most important to users. Visit the RDP website and the quick survey will launch automatically in your browser: http://rdp.cme.msu.edu


NEWS: The Habitat-Lite paper is available in PubMed.


Towards a consensus-driven Habitat-Lite: a short list of high-level terms for describing habitat



Introduction

The GSC is interested in how samples are described, including their habitat. We are therefore exploring the creation of a limited list of terms for describing habitat.


Increasingly, short lists of habitat terms are being used to annotate databases and to carry out a variety of analyses. These lists are continually being developed anew because there is not yet a central place to store and compare them.


We are:


1. collecting habitat lists in use within the GSC community


2. determining the overlaps in these lists


3. assessing whether it is feasible to 'merge' (unify) these lists into a single short list that might be widely adopted for its broad scope and suitable coverage of a wide range of samples (genomes, metagenomes, 16S, etc.).
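Steps 2 and 3 can be illustrated with a small sketch, assuming each habitat list is just a set of term strings. The two example lists are hypothetical, not lists collected from the GSC community, and Jaccard similarity is one possible overlap measure among several. Terms are normalized (lower-cased, whitespace collapsed) so that trivial spelling differences do not hide an overlap.

```python
"""Sketch: measure overlap between habitat term lists and merge them.

The example lists are hypothetical; they are not lists collected by the GSC.
"""


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so 'Hot  Spring' matches 'hot spring'."""
    return " ".join(term.lower().split())


def overlap(list_a: list[str], list_b: list[str]) -> tuple[set[str], float]:
    """Return the shared normalized terms and the Jaccard similarity of two lists."""
    a = {normalize(t) for t in list_a}
    b = {normalize(t) for t in list_b}
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


def merge(*lists: list[str]) -> list[str]:
    """Unify several lists into one sorted list of distinct normalized terms."""
    merged: set[str] = set()
    for terms in lists:
        merged.update(normalize(t) for t in terms)
    return sorted(merged)


if __name__ == "__main__":
    list_a = ["Soil", "Marine", "Hot Spring", "Host-associated"]
    list_b = ["soil", "marine", "freshwater", "hot  spring"]
    shared, jaccard = overlap(list_a, list_b)
    print(sorted(shared))     # ['hot spring', 'marine', 'soil']
    print(round(jaccard, 2))  # 0.6
    print(merge(list_a, list_b))
```

String normalization only catches spelling variants; synonyms such as "host-associated" and "organism-associated" would still need a curated mapping, which is where a shared ontology such as EnvO helps.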


List of terms describing habitat

A list of illustrative sources of habitat information is included in the OMICS paper and in this document: http://gensc.org/gc_wiki/index.php/Image:Table_of_habitat_terms.doc

The actual terms will be added to the wiki in the future.

Habitat-Lite Version 0.1

A first-pass version of Habitat-Lite is below. All terms were selected from the Environment Ontology (EnvO), in which the GSC participates.


Biome

1	freshwater	ENVO:00000873
2	marine	ENVO:00000447
3	terrestrial	ENVO:00000446

Environment of sample (basic descriptors, tailored to current genome and metagenome data sets)

4	soil	ENVO:00001998
5	water	ENVO:00002006
6	air	ENVO:00002005
7	sediment	ENVO:00002007
8	sludge	ENVO:00002044
9	waste water	ENVO:00002001
10	hot spring	ENVO:00000051
11	hydrothermal vent	ENVO:00000215
12	organism-associated	ENVO:00002032
13	extreme environment 	ENVO:00002020
14	food	ENVO:00002002
15	biofilm	ENVO:00002034
16	microbial mat	ENVO:01000008
17	fossil	ENVO:00002164
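One way to apply this list is to map free-text habitat descriptions (for example, the isolation fields of sequence records) onto Habitat-Lite terms. The sketch below assumes simple whole-word keyword matching; the keyword lists are hypothetical and not part of Habitat-Lite or EnvO, while the term names and EnvO IDs are a subset copied from the table above.

```python
import re

# Subset of Habitat-Lite 0.1 terms and their EnvO IDs, as listed in the table.
HABITAT_LITE = {
    "marine": "ENVO:00000447",
    "soil": "ENVO:00001998",
    "sediment": "ENVO:00002007",
    "hot spring": "ENVO:00000051",
    "hydrothermal vent": "ENVO:00000215",
    "organism-associated": "ENVO:00002032",
}

# Hypothetical keyword lists: the trigger words are illustrative assumptions,
# not part of Habitat-Lite or EnvO.
KEYWORDS = {
    "marine": ["sea", "seawater", "ocean", "marine"],
    "soil": ["soil", "rhizosphere"],
    "sediment": ["sediment", "mud"],
    "hot spring": ["hot spring", "geyser"],
    "hydrothermal vent": ["hydrothermal vent", "black smoker"],
    "organism-associated": ["host", "gut", "feces", "symbiont"],
}


def map_description(text: str) -> list[tuple[str, str]]:
    """Return (term, EnvO ID) pairs whose keywords occur as whole words in text."""
    lowered = text.lower()
    hits = []
    for term, words in KEYWORDS.items():
        # \b word boundaries stop 'sea' from matching inside 'season'.
        if any(re.search(r"\b" + re.escape(w) + r"\b", lowered) for w in words):
            hits.append((term, HABITAT_LITE[term]))
    return hits


if __name__ == "__main__":
    print(map_description("Deep-sea hydrothermal vent chimney"))
    print(map_description("Isolated from the gut of a termite"))
```

A description such as "hot spring sediment" returns two terms, so a mapper like this naturally produces the multi-term annotations discussed in the notes further down this page.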



Habitat-Lite version 0.1 is also available as an OBO file (saved with a .txt extension): Habitat-Lite version 0.1 in .obo format
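OBO files are plain text made of stanzas such as `[Term]`, each followed by `tag: value` lines (`id:`, `name:`, `is_a:` and so on), so the list can be loaded with very little code. The sketch below is a minimal reader, not a full OBO parser: it keeps only `[Term]` stanzas, strips trailing `! comment` text, and ignores quoting and escapes. The `SAMPLE` text is a hypothetical excerpt, not the contents of the actual Habitat-Lite file.

```python
def parse_obo_terms(text: str) -> dict[str, dict[str, list[str]]]:
    """Collect [Term] stanzas from OBO text as {term id: {tag: [values]}}.

    Minimal sketch: skips the header and non-Term stanzas, drops trailing
    '! comment' text, and does not handle quoting or escape sequences.
    """
    terms: dict[str, dict[str, list[str]]] = {}
    current = None  # tag/value dict for the [Term] stanza being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header line, non-Term stanza, or malformed line
        tag, _, value = line.partition(":")
        tag = tag.strip()
        value = value.split("!", 1)[0].strip()
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


# Hypothetical excerpt in OBO syntax; not the actual Habitat-Lite file.
SAMPLE = """format-version: 1.2

[Term]
id: ENVO:00001998
name: soil

[Term]
id: ENVO:00000051
name: hot spring

[Typedef]
id: part_of
name: part of
"""

if __name__ == "__main__":
    for term_id, tags in parse_obo_terms(SAMPLE).items():
        print(term_id, tags["name"][0])
```

For anything beyond a quick look, a maintained OBO library is a safer choice than a hand-rolled reader like this one.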

Suggested Improvements and Issues

Notes following Lynette's first-pass search of the "isolation_field" of GenBank records:


1. We probably need to add "Aquatic" (lumps marine and freshwater but commonly used)

2. "Organism-associated" will appear as 'host' or 'host-associated' in current records. EnvO chose the broader term because 'host-associated' suggests a pathogen/host relationship and would seem to exclude symbionts.

3. Conclusions related to this version of Habitat-Lite are as follows (based on Habitat-Lite OMICS paper):

• The set of terms should support certain inferences useful for search; for example, that a sample labeled soil is also terrestrial, or that a sample from a hydrothermal vent is also extreme.

• Consistent annotation requires guidelines for general terms such as terrestrial and aquatic (currently not present in Habitat-Lite), to instruct annotators to annotate to the most specific term possible.

• The notion of extreme environment is problematic in that it should be annotated in addition to a more specific term, such as hot spring – thus requiring that certain entries be associated with two Habitat-Lite terms.

• Organism-associated needs to be sub-divided by linking out to other ontologies or controlled vocabularies (specifically, a taxon hierarchy and perhaps a high level anatomy ontology).

• Fossil is an example of a currently infrequently used term, but a candidate “exceptional importance” term that could be useful in the future for searching.
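The first conclusion, that search should infer broader terms (soil implies terrestrial, hydrothermal vent implies extreme environment), can be sketched as a transitive closure over parent links. Habitat-Lite 0.1 is a flat list, so the `PARENTS` map below is a hypothetical assumption built only from the examples in these notes, including the proposed "aquatic" grouping of marine and freshwater. With links like these, an entry annotated only as hot spring would also be found by a search for extreme environment.

```python
from collections import deque

# Hypothetical parent links: Habitat-Lite 0.1 is a flat list, so these
# relations are assumptions based on the examples in the notes above.
PARENTS = {
    "soil": ["terrestrial"],
    "hydrothermal vent": ["extreme environment"],
    "hot spring": ["extreme environment"],
    "marine": ["aquatic"],
    "freshwater": ["aquatic"],
}


def infer_terms(annotated: set[str]) -> set[str]:
    """Return the annotated terms plus every term reachable through PARENTS."""
    result = set(annotated)
    queue = deque(annotated)
    while queue:  # breadth-first walk up the parent links
        term = queue.popleft()
        for parent in PARENTS.get(term, []):
            if parent not in result:
                result.add(parent)
                queue.append(parent)
    return result


def search(samples: dict[str, set[str]], query: str) -> list[str]:
    """Return sorted IDs of samples whose inferred terms include the query."""
    return sorted(sid for sid, terms in samples.items() if query in infer_terms(terms))


if __name__ == "__main__":
    samples = {
        "s1": {"soil"},
        "s2": {"hydrothermal vent"},
        "s3": {"marine"},
        "s4": {"hot spring"},
    }
    print(search(samples, "terrestrial"))          # ['s1']
    print(search(samples, "extreme environment"))  # ['s2', 's4']
    print(search(samples, "aquatic"))              # ['s3']
```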

Open Call for Participation

We are making an open call for evaluation of this list of habitat terms so that we can produce a consensus-driven version that best suits community needs. This term list would then be implemented in GCDML [ref] and used in the first instance to fill the “Habitat” field of the MIGS-compliant Genome Catalogue database (http://gensc.org).

Please post comments on the wiki or write to the lead author of this project: Lynette Hirschman, lynette@mitre.org
