towards a richer set of information to describe our complete genome collection

GSC EnvO Case Study

From Genomic Standards Consortium




The GSC EnvO Case Study

The beginning of the GSC EnvO Case Study was presented at the inaugural EnvO Workshop meeting in Oxford. In brief, terms from six resources were fed into an early version of EnvO. Each is a database or resource run by members of the GSC, and all of them either already use descriptions of habitat or plan to do so in the near future.


Initial Presentation: EnvO Workshop


The largest dataset considered at the workshop and used to populate GAZ and EnvO was the Genomes Online Database (GOLD). It was decided to work towards annotating a selection of complete, published bacterial genomes. This specific case study is described in more detail below.


Annotating the GOLD Database

  • Dawn Field and Nikos Kyrpides (post workshop) offered to co-ordinate an evaluation of EnvO for use in annotating the GOLD database. The "isolation" column is of primary interest, although EnvO could also be applied to the "habitat" column, and the gaz.obo ontology could be applied to the "Country" column. Chris Mungall offered to configure Phenote, including access to the relevant ontologies, for the annotation work.


Everyone should use this snapshot file of the first 532 published complete bacterial genomes, as of Sept 17th:


As tab delimited spreadsheet: http://darwin.nerc-oxford.ac.uk/gc_wiki/images/9/91/GSC_snapshot_of_gold_complete_bacterial_genomes_Sept17.txt

Taken from GOLD Table: http://www.genomesonline.org/gold.cgi?want=Published+Complete+Genomes


GOLD in Phenote

Chris Mungall has configured Phenote to work with a trimmed down version of the GOLD snapshot above.


From Chris: draft annotation file prepared before the second EnvO workshop: http://gensc.org/gc_wiki/index.php/Image:Test_annotation_of_GOLD_version_1.txt



http://www.phenote.org/ 
choose "launch phenote via webstart"

select "envo_gold" as your config from the phenote menu

wait a wee while (the first time is slow)

load the attached annotation file.

The fields HABITAT_DESC and ISOLATION_DESC came from the GOLD
spreadsheet. The fields Habitat and Isolation are from EnvO and/or
other ontologies. I have annotated a few of them; most are blank.

COUNTRY is from the source file; Location is from GAZ. Again, I only
annotated a few.

I dropped a bunch of fields from the envo file that seemed redundant.

I used post-composition with FMA for isolation environments like  
tubercular lung. These aren't ontologically correct as I am saying  
"lung that derives_from tuberculosis"; this is just intended as a  
sample until the correct relation is in place.

I also used a few FMA terms like Alimentary system for Bovine rumen -  
again obviously incorrect.

Chris
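The field layout described in the message above suggests a simple way to track annotation progress: count how many rows still have blank Habitat, Isolation, and Location values. The following is a minimal sketch only. It assumes the annotation file is tab-delimited with a header row using the field names mentioned in the message; the ORGANISM column, the sample rows, and the placeholder term IDs are hypothetical and are not taken from the actual file.

```python
import csv
import io

# Hypothetical sample data. The real annotation file's header and values may
# differ; "ENVO:placeholder" and "GAZ:placeholder" are not real term IDs.
SAMPLE = (
    "ORGANISM\tHABITAT_DESC\tHabitat\tISOLATION_DESC\tIsolation\tCOUNTRY\tLocation\n"
    "Organism A\tHost\t\tBovine rumen\t\tUSA\t\n"
    "Organism B\tAquatic\tENVO:placeholder\tSea water\t\tJapan\tGAZ:placeholder\n"
)

# Fields that hold ontology annotations, as opposed to the free-text
# source fields (HABITAT_DESC, ISOLATION_DESC, COUNTRY).
ANNOTATION_FIELDS = ("Habitat", "Isolation", "Location")


def count_blank_annotations(handle, fields=ANNOTATION_FIELDS):
    """Return {field: number of rows where that field is empty or missing}."""
    reader = csv.DictReader(handle, delimiter="\t")
    blanks = {field: 0 for field in fields}
    for row in reader:
        for field in fields:
            # Treat missing columns and whitespace-only cells as blank.
            if not (row.get(field) or "").strip():
                blanks[field] += 1
    return blanks


if __name__ == "__main__":
    print(count_blank_annotations(io.StringIO(SAMPLE)))
```

To run this against a downloaded copy of the annotation file, pass an open file handle instead of the in-memory sample, and adjust the field names if the real header differs.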


Towards an EnvO-Lite for genomes and metagenomes

Dawn Field and Lynette Hirschman offered to explore the creation of an "EnvO-Lite" view of the ontology suitable for broad annotation of genomes and metagenomes. This list of terms is to be evaluated in the first instance by NCBI, in the development of their metagenomic database, and by Renzo Kottmann, for use in GCDML. The aim is to select no more than 20 descriptors, drawn from different levels of the ontology, that are most relevant to currently completed genomes and metagenomes.

Draft list of Terms in EnvO-Lite


Towards the description of microbial isolates in culture collections

Peter Dawyndt will examine how habitat descriptors are used across a range of culture collections unified through the StrainInfo.net portal.
