Telecon: 2008 03 25

From Genomic Standards Consortium

This telecon is dedicated to discussing the development of GCDML.

Time & Call-in Number

http://www.timeanddate.com/worldclock/meetingdetails.html?year=2008&month=3&day=25&hour=16&min=0&sec=0&p1=37&p2=224&p3=179

Please use the numbers below to call in:

In the UK: Dial in local number 0870 240 7821

In Germany dial in 00 44 808 100 5145

In the US: dial in 011 44 808 100 5145

Participant Code 45707 340 then #



Topics

Unordered list of discussion points:

  • Announcement: GCDML new manuscript version available
  • Question: CAMERA example?
  • Outline of all possible ways to extend GCDML for maximum contextual data from other databases.
    • What are the requirements for including Genome Reviews etc.?
    • First list of new descriptors
  • GCDML and FUGE
  • Set of milestones
  • Time plan for this year
    • MINIMESS meeting in Bremen
    • 6th workshop with two GCDML developer days???
  • Renaming of nasReports to MIGSReports
  • Integration of EnvO → Term Board → Updating
    • To SAWSDL or not to SAWSDL?

Outline of options for maximum reporting from different sources

Current simplified MIGS/MIMS compliant report structure:

<nasReports>
   <_Report>
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
              <extension />
           </_SampleLocation>
           <_Habitat >
               <extension />
           </_Habitat >
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
</nasReports>


Approach 1

Each source of reports (e.g. CAMERA, Genome Reviews, GOLD) indicates the origin with a source attribute. Two reports with the same GCAT_ID but different sources would then be comparable. In other words, only the combination of source and GCAT_ID uniquely identifies a report.

<nasReports>
   <_Report source="GOLD">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
              <extension />
           </_SampleLocation>
           <_Habitat >
               <extension />
           </_Habitat >
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
   <_Report source="GENOME Reviews">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
              <extension />
           </_SampleLocation>
           <_Habitat >
               <extension />
           </_Habitat >
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
</nasReports>

Variation

It could be indicated that the listed reports refer to the same (meta)genome entity but come from different sources:

<nasReports>
  <comparison>
   <_Report source="GOLD">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
              <extension />
           </_SampleLocation>
           <_Habitat >
               <extension />
           </_Habitat >
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
   <_Report source="GENOME Reviews">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
              <extension />
           </_SampleLocation>
           <_Habitat >
               <extension />
           </_Habitat >
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
  </comparison>
</nasReports>


Approach 2

Each descriptor for which comparability is wanted can be repeated with different source attribute values.


<nasReports>
   <_Report>
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
              <extension />
           </_SampleLocation>
           <_Habitat >
               <extension />
           </_Habitat >
           <extension/>
       </originalSample>
       
       <isolate source="straininfo">
           <extension />
       </isolate>
       <isolate source="DSMZ" >
           <extension />
       </isolate>
       
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
</nasReports>


Approach 3

Combination of Approach 1 and Approach 2
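
A possible sketch of the combination, abbreviated to the elements that matter here. It assumes the same source attribute may appear both on a report (Approach 1) and on a repeated descriptor inside it (Approach 2); the source values are reused from the examples above, and whether the other descriptors may be left out is not settled.

```xml
<nasReports>
   <!-- Approach 1: the report as a whole names its source -->
   <_Report source="GOLD">
       <gcatID />
       <originalSample>
           <samplingTime />
           <extension />
       </originalSample>
       <!-- Approach 2: a repeated descriptor names its own source (assumed to be allowed inside a sourced report) -->
       <isolate source="GOLD">
           <extension />
       </isolate>
       <isolate source="DSMZ">
           <extension />
       </isolate>
       <extension />
   </_Report>
   <!-- A second report on the same entity from another source -->
   <_Report source="GENOME Reviews">
       <gcatID />
       <isolate>
           <extension />
       </isolate>
       <extension />
   </_Report>
</nasReports>
```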


Participants

  • Sean Murphy
  • Tanya Gray
  • Renzo Kottmann



Notes

Discussion of source attribute

Agreed that a URI is needed to resolve the source, and that a source may be a person or an organisation as well as a genomic database.


Version

The following together uniquely identify a genome report:

  • GCAT ID (identifier)
  • source - URI
  • version of the report

A solution for identifying unique reports also needs to handle the situation where individual reports from a given source are extracted.
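
As an illustration only, a report extracted on its own might carry all three identifying parts itself. The version attribute, the gcatID value and the example.org URI below are assumptions made for this sketch, not agreed schema.

```xml
<!-- Hypothetical: an extracted report still carries GCAT ID, source URI and version -->
<_Report source="http://example.org/sources/gold" version="3">
    <!-- placeholder value, not a real GCAT identifier -->
    <gcatID>EXAMPLE_ID</gcatID>
    <extension />
</_Report>
```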


GCDML / FUGE discussion

Criticism:

FUGE makes it less flexible.

With FUGE, you need to use a whole stack of technologies.


In general, thoughts on using a UML model as the source for Java objects, SQL etc.:

  • You can never generate SQL from UML exactly as you want it; there are always custom needs.


The GCDML standard itself: there is a need for it. In practice it is going to be challenging. Most of the implementation effort is related to software development. The standard is unlikely to be adopted unless a standard set of tools is available to transform data for databases etc. Would FUGE help?

Implementation of GCDML

Who are the customers? How are they going to generate GCDML documents? Are they just going to use repositories to access data?


Report Versioning


A GCDML file will have a version number as a whole.

In addition, each data provider/source will manage version numbers for sub-reports from individual sources.
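
The two levels of versioning could be sketched as follows. The version attributes and the numbers are hypothetical; the notes do not fix where the version is recorded or how it is numbered.

```xml
<!-- Hypothetical: version on the root element is the version of the GCDML file as a whole -->
<nasReports version="2">
   <!-- Each sub-report carries a version managed by its own source -->
   <_Report source="GOLD" version="5">
       <gcatID />
       <extension />
   </_Report>
   <_Report source="GENOME Reviews" version="1">
       <gcatID />
       <extension />
   </_Report>
</nasReports>
```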


Timeplan

Make everyone aware of what is happening.

MINIMESS meeting with Jeroen: discuss how to integrate MINIMESS into GCDML, and how to start the discussion on making it an extension of MIGS/MIMS.

Idea to have a workshop in August in Michigan hosted by Garrity and Cole.

Have two days dedicated just to GCDML development.


Release schedule


Suggestion to have fixed, regular release times. This benefits end users and allows time for input.

Good item for discussion.


SAWSDL

Ontology integration
