GSC EnvO Case Study
From Genomic Standards Consortium
The GSC EnvO Case Study
The beginning of the GSC EnvO Case Study was presented at the inaugural EnvO Workshop in Oxford. In brief, terms from six resources were fed into an early version of EnvO. Each is a database or resource run by members of the GSC, and all of them already use descriptions of habitat or plan to in the near future.
Initial Presentation: EnvO Workshop
The largest dataset considered at the workshop and used to populate Gaz and EnvO was the Genomes Online Database (GOLD). It was decided to work towards annotating a selection of complete, published bacterial genomes. This specific case study is described in more detail below.
Annotating the GOLD Database
After the workshop, Dawn Field and Nikos Kyrpides offered to co-ordinate an evaluation of EnvO for use in annotating the GOLD database. The "isolation" column is of primary interest, although EnvO could also be applied to the "habitat" column and the Gaz.obo ontology could be applied to the "Country" column. Chris Mungall offered to configure Phenote, including access to the relevant ontologies, for the annotation work.
Everyone should use this snapshot file of the first 532 published complete bacterial genomes, as of Sept 17th:
As a tab-delimited spreadsheet: http://darwin.nerc-oxford.ac.uk/gc_wiki/images/9/91/GSC_snapshot_of_gold_complete_bacterial_genomes_Sept17.txt
Taken from GOLD Table: http://www.genomesonline.org/gold.cgi?want=Published+Complete+Genomes
GOLD in Phenote
Chris Mungall has configured Phenote to work with a trimmed-down version of the GOLD snapshot above.
Draft annotation file from before the second EnvO workshop: http://gensc.org/gc_wiki/index.php/Image:Test_annotation_of_GOLD_version_1.txt
From Chris:
Go to http://www.phenote.org/ and choose "launch phenote via webstart". Select "envo_gold" as your config from the Phenote menu, wait a wee while (the first time is slow), then load the attached annotation file.
The fields HABITAT_DESC and ISOLATION_DESC came from the GOLD spreadsheet. The fields Habitat and Isolation are from EnvO and/or other ontologies. I have annotated a few of them; most are blank. COUNTRY is from the source file, Location is from Gaz. Again, I only annotated a few. I dropped a bunch of fields from the envo file that seemed redundant.
I used post-composition with FMA for isolation environments like tubercular lung. These aren't ontologically correct, as I am saying "lung that derives_from tuberculosis"; this is just intended as a sample until the correct relation is in place. I also used a few FMA terms like Alimentary system for Bovine rumen, which again is obviously incorrect.
Chris
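Since most of the ontology fields in the draft are still blank, a quick way to see how much annotation remains is to count rows where the free-text source field is filled in but its ontology counterpart is empty. The following is only a minimal sketch: the function name `annotation_gaps`, the inline sample rows, and the assumption that the file's header names match the field names mentioned above exactly are all illustrative and not taken from the actual annotation file.

```python
import csv
import io

# Hypothetical sample rows; the real annotation file may use different
# header names or column order than those assumed here.
SAMPLE = (
    "ISOLATION_DESC\tIsolation\tCOUNTRY\tLocation\n"
    "Bovine rumen\tAlimentary system\tUSA\t\n"
    "Hot spring\t\tJapan\tJapan\n"
    "Soil\t\t\t\n"
)

# (free-text source field, ontology-term field) pairs named in the text above.
PAIRS = [("ISOLATION_DESC", "Isolation"), ("COUNTRY", "Location")]


def annotation_gaps(handle, pairs):
    """Count rows whose source field has text but whose ontology field is blank."""
    gaps = {target: 0 for _, target in pairs}
    for row in csv.DictReader(handle, delimiter="\t"):
        for source, target in pairs:
            has_source = (row.get(source) or "").strip()
            has_term = (row.get(target) or "").strip()
            if has_source and not has_term:
                gaps[target] += 1
    return gaps


result = annotation_gaps(io.StringIO(SAMPLE), PAIRS)
print(result)
```

To run this against a downloaded copy of the annotation file, pass an open file handle in place of the `io.StringIO` sample and adjust `PAIRS` to the headers the file actually contains.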
Towards an EnvO-Lite for genomes and metagenomes
Dawn Field and Lynette Hirschman offered to explore the creation of an "EnvO-Lite" view of the ontology suitable for broad annotation of genomes and metagenomes. This list of terms is to be evaluated in the first instance by NCBI in the development of their metagenomic database and by Renzo Kottmann for use in GCDML. The aim is to select no more than 20 descriptors, drawn from different levels of the ontology, that are most relevant to currently completed genomes and metagenomes.
Draft list of Terms in EnvO-Lite
Towards the description of microbial isolates in culture collections
Peter Dawyndt will examine the use of descriptors of habitat from a range of culture collections unified through the StrainInfo.net portal.