Telecon: 2008 03 25
From Genomic Standards Consortium
This telecon is dedicated to discussing the development of GCDML.
Time & Call-in Number
Please use the numbers below to call in:
In the UK: dial the local number 0870 240 7821
In Germany: dial 00 44 808 100 5145
In the US: dial 011 44 808 100 5145
Participant code: 45707 340, then #
Topics
Unordered list of discussion points:
- Announcement: GCDML new manuscript version available
- Question: CAMERA example?
- Outline of all possible ways to extend GCDML for maximum contextual data from other databases.
- What are the requirements for including Genome Reviews etc.?
- First list of new descriptors
- GCDML and FUGE
- Set of milestones
- Time plan for this year
- MINIMESS meeting in Bremen
- 6th workshop with two GCDML developer days?
- Renaming of nasReports to MIGSReports
- Integration of EnvO → Term Board → Updating
- To SAWSDL or not to SAWSDL?
Outline of options for maximum reporting from different sources
Current simplified MIGS/MIMS-compliant report structure:
<nasReports>
  <_Report>
    <gcatID />
    <studyData>
      <extension />
    </studyData>
    <originalSample>
      <samplingTime />
      <_SampleLocation>
        <extension />
      </_SampleLocation>
      <_Habitat>
        <extension />
      </_Habitat>
      <extension />
    </originalSample>
    <isolate>
      <extension />
    </isolate>
    <dnaExtract>
      <extension />
    </dnaExtract>
    <dnaLibrary>
      <extension />
    </dnaLibrary>
    <sequencing>
      <extension />
    </sequencing>
    <extension />
  </_Report>
</nasReports>
Approach 1
Each source of reports (e.g. CAMERA, Genome Reviews, GOLD) indicates its origin with a source attribute on the report. Two reports with the same GCAT_ID but different sources are then comparable. In other words, only the combination of source and GCAT_ID uniquely identifies a report.
<nasReports>
  <_Report source="GOLD">
    <gcatID />
    <studyData>
      <extension />
    </studyData>
    <originalSample>
      <samplingTime />
      <_SampleLocation>
        <extension />
      </_SampleLocation>
      <_Habitat>
        <extension />
      </_Habitat>
      <extension />
    </originalSample>
    <isolate>
      <extension />
    </isolate>
    <dnaExtract>
      <extension />
    </dnaExtract>
    <dnaLibrary>
      <extension />
    </dnaLibrary>
    <sequencing>
      <extension />
    </sequencing>
    <extension />
  </_Report>
  <_Report source="GENOME Reviews">
    <gcatID />
    <studyData>
      <extension />
    </studyData>
    <originalSample>
      <samplingTime />
      <_SampleLocation>
        <extension />
      </_SampleLocation>
      <_Habitat>
        <extension />
      </_Habitat>
      <extension />
    </originalSample>
    <isolate>
      <extension />
    </isolate>
    <dnaExtract>
      <extension />
    </dnaExtract>
    <dnaLibrary>
      <extension />
    </dnaLibrary>
    <sequencing>
      <extension />
    </sequencing>
    <extension />
  </_Report>
</nasReports>
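A minimal Python sketch of how a consumer might rely on the Approach 1 uniqueness rule, using the standard-library ElementTree parser. The sample document, its gcatID values and the function names index_reports and reports_by_gcat_id are hypothetical; the only part taken from the proposal is that a report is keyed by the pair of its source attribute and its gcatID, so a repeated pair is treated as an error.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, trimmed-down instance of the Approach 1 layout;
# the gcatID values are placeholders, not real GCAT identifiers.
DOC = """<nasReports>
  <_Report source="GOLD"><gcatID>000123</gcatID></_Report>
  <_Report source="GENOME Reviews"><gcatID>000123</gcatID></_Report>
  <_Report source="GOLD"><gcatID>000456</gcatID></_Report>
</nasReports>"""


def index_reports(xml_text):
    """Index reports by (source, gcatID), the pair Approach 1 treats as unique."""
    root = ET.fromstring(xml_text)
    index = {}
    for report in root.iter("_Report"):
        key = (report.get("source"), report.findtext("gcatID"))
        if key in index:
            # Same source and same gcatID twice violates the uniqueness rule.
            raise ValueError(f"duplicate report for {key}")
        index[key] = report
    return index


def reports_by_gcat_id(index):
    """Group sources by gcatID: each group lists the comparable reports."""
    groups = defaultdict(list)
    for source, gcat_id in index:
        groups[gcat_id].append(source)
    return dict(groups)


print(reports_by_gcat_id(index_reports(DOC)))
# prints {'000123': ['GOLD', 'GENOME Reviews'], '000456': ['GOLD']}
```

The grouping step makes the "comparable" relation explicit: GOLD and GENOME Reviews end up in the same group because they share a gcatID, while the report for 000456 has nothing to be compared against.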
Variation
A comparison element could indicate that the enclosed reports refer to the same (meta)genome entity but come from different sources:
<nasReports>
  <comparison>
    <_Report source="GOLD">
      <gcatID />
      <studyData>
        <extension />
      </studyData>
      <originalSample>
        <samplingTime />
        <_SampleLocation>
          <extension />
        </_SampleLocation>
        <_Habitat>
          <extension />
        </_Habitat>
        <extension />
      </originalSample>
      <isolate>
        <extension />
      </isolate>
      <dnaExtract>
        <extension />
      </dnaExtract>
      <dnaLibrary>
        <extension />
      </dnaLibrary>
      <sequencing>
        <extension />
      </sequencing>
      <extension />
    </_Report>
    <_Report source="GENOME Reviews">
      <gcatID />
      <studyData>
        <extension />
      </studyData>
      <originalSample>
        <samplingTime />
        <_SampleLocation>
          <extension />
        </_SampleLocation>
        <_Habitat>
          <extension />
        </_Habitat>
        <extension />
      </originalSample>
      <isolate>
        <extension />
      </isolate>
      <dnaExtract>
        <extension />
      </dnaExtract>
      <dnaLibrary>
        <extension />
      </dnaLibrary>
      <sequencing>
        <extension />
      </sequencing>
      <extension />
    </_Report>
  </comparison>
</nasReports>
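The comparison wrapper implies a consistency check that a consumer could run. A minimal Python sketch follows, assuming (this is not stated in the notes) that "same (meta)genome entity" can be approximated by "same gcatID" and that each source contributes at most one report per comparison. The sample data and the check_comparison name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first group is consistent, the second mixes two entities.
DOC = """<nasReports>
  <comparison>
    <_Report source="GOLD"><gcatID>000123</gcatID></_Report>
    <_Report source="GENOME Reviews"><gcatID>000123</gcatID></_Report>
  </comparison>
  <comparison>
    <_Report source="GOLD"><gcatID>000123</gcatID></_Report>
    <_Report source="CAMERA"><gcatID>000999</gcatID></_Report>
  </comparison>
</nasReports>"""


def check_comparison(comparison):
    """Return the problems found in one <comparison> group; an empty list means consistent."""
    problems = []
    reports = comparison.findall("_Report")
    # Assumption: "same (meta)genome entity" is approximated by "same gcatID".
    gcat_ids = {r.findtext("gcatID") for r in reports}
    if len(gcat_ids) > 1:
        problems.append(f"reports refer to different gcatIDs: {sorted(gcat_ids, key=str)}")
    # Assumption: each source contributes at most one report to a comparison.
    sources = [r.get("source") for r in reports]
    if len(sources) != len(set(sources)):
        problems.append("the same source appears more than once")
    return problems


root = ET.fromstring(DOC)
results = [check_comparison(c) for c in root.findall("comparison")]
print(results)
```

An explicit wrapper lets such a check fail early on a mismatched group, whereas with plain Approach 1 the grouping has to be inferred from the gcatIDs.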
Approach 2
Each descriptor for which comparability is wanted can be repeated with different source attribute values.
<nasReports>
  <_Report>
    <gcatID />
    <studyData>
      <extension />
    </studyData>
    <originalSample>
      <samplingTime />
      <_SampleLocation>
        <extension />
      </_SampleLocation>
      <_Habitat>
        <extension />
      </_Habitat>
      <extension />
    </originalSample>
    <isolate source="StrainInfo">
      <extension />
    </isolate>
    <isolate source="DSMZ">
      <extension />
    </isolate>
    <dnaExtract>
      <extension />
    </dnaExtract>
    <dnaLibrary>
      <extension />
    </dnaLibrary>
    <sequencing>
      <extension />
    </sequencing>
    <extension />
  </_Report>
</nasReports>
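Under Approach 2 a consumer has to look up a descriptor per source rather than per report. A minimal Python sketch, using ElementTree and a hypothetical sample document; the function name descriptors_by_source is also hypothetical. Descriptors without a source attribute (dnaExtract below) are simply skipped, and a second descriptor with the same name and the same source would overwrite the first.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample following Approach 2: the isolate descriptor is repeated per source.
DOC = """<nasReports>
  <_Report>
    <gcatID>000123</gcatID>
    <isolate source="StrainInfo"><extension /></isolate>
    <isolate source="DSMZ"><extension /></isolate>
    <dnaExtract><extension /></dnaExtract>
  </_Report>
</nasReports>"""


def descriptors_by_source(report):
    """Map descriptor name -> {source: element} for descriptors that carry a source attribute."""
    result = defaultdict(dict)
    for child in report:  # direct children of <_Report> only
        source = child.get("source")
        if source is not None:
            result[child.tag][source] = child
    return dict(result)


report = ET.fromstring(DOC).find("_Report")
by_source = descriptors_by_source(report)
print({tag: sorted(values) for tag, values in by_source.items()})
# prints {'isolate': ['DSMZ', 'StrainInfo']}
```

Compared with Approach 1, only the repeated descriptor carries provenance, so the shared parts of the report (gcatID, studyData, and so on) are written once.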
Approach 3
Combination of Approach 1 and Approach 2
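One possible reading of the combination, sketched in Python: the report carries a default source (Approach 1) and individual descriptors may name their own (Approach 2). The precedence rule used here, in which a descriptor's own source overrides the report-level one, is an assumption for illustration; the telecon did not decide it. The sample and the effective_sources name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample combining both approaches: a report-level source plus
# one descriptor that names its own source.
DOC = """<nasReports>
  <_Report source="GOLD">
    <gcatID>000123</gcatID>
    <isolate source="DSMZ"><extension /></isolate>
    <dnaExtract><extension /></dnaExtract>
  </_Report>
</nasReports>"""


def effective_sources(report):
    """Return (descriptor, source) pairs for the direct children of a report.

    Assumed precedence rule: a descriptor's own source attribute wins;
    otherwise the descriptor inherits the report-level source.
    """
    default = report.get("source")
    return [(child.tag, child.get("source", default)) for child in report]


report = ET.fromstring(DOC).find("_Report")
print(effective_sources(report))
# prints [('gcatID', 'GOLD'), ('isolate', 'DSMZ'), ('dnaExtract', 'GOLD')]
```

A list of pairs is returned rather than a dict so that repeated descriptors (as in Approach 2) are not collapsed.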
Participants
- Sean Murphy
- Tanya Gray
- Renzo Kottmann
Notes
Discussion of source attribute
Agreed that the source needs to be a resolvable URI, and that a source may be a person or an organisation as well as a genomic database.
Version
A genome report is uniquely identified by:
- GCAT identifier
- source (a URI)
- version of the report
The solution for identifying unique reports also needs to handle the situation where individual reports from a given source are extracted.
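The three-part identifier can be sketched as an immutable, hashable Python value. This is a sketch under assumptions: the source is read from a source attribute and the version from a version attribute on the report (that attribute name is hypothetical, as are the example.org URIs and the names ReportKey and key_of), and the URI check only requires a scheme, which accepts e.g. http: for a database or mailto: for a person.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ReportKey:
    """GCAT identifier + source URI + report version, as listed in the notes."""
    gcat_id: str
    source: str  # URI identifying a person, an organisation or a database
    version: str

    def __post_init__(self):
        # Minimal check only: the source must at least carry a URI scheme.
        if not isinstance(self.source, str) or not urlparse(self.source).scheme:
            raise ValueError(f"source is not an absolute URI: {self.source!r}")


def key_of(report):
    """Build the key from a <_Report> element (the 'version' attribute name is hypothetical)."""
    return ReportKey(report.findtext("gcatID"), report.get("source"), report.get("version"))


# An individual report extracted on its own still yields the same key.
report = ET.fromstring(
    '<_Report source="http://example.org/sources/gold" version="2">'
    "<gcatID>000123</gcatID></_Report>"
)
key = key_of(report)
print(key)
```

Because the key is carried by the report element itself, an extracted single report can still be identified without the surrounding file, which is the situation raised in the notes.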
GCDML / FUGE discussion
Criticism:
FUGE makes GCDML less flexible.
With FUGE, you need to use a whole stack of technologies.
In general: thoughts on using a UML model as the source for Java objects, SQL, etc.
- you can never generate SQL from UML exactly as you want it; there are always custom needs
The GCDML standard itself: there is a need for it, but in practice it is going to be challenging. Most of the implementation effort lies in software development. The standard is unlikely to be adopted unless a standard set of tools is available to transform data for databases etc. Would FUGE help?
Implementation of GCDML
Who are the customers? How are they going to generate GCDML documents? Are they just going to use repositories to access data?
Report Versioning
A GCDML file as a whole will have a version number.
In addition, each data provider/source will manage version numbers for sub-reports from individual sources.
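The two versioning levels can be sketched in Python: one version for the file as a whole and one version per sub-report, managed by its source. The attribute names (version on nasReports and on _Report), the example.org source URIs and the function names are hypothetical. The sketch only reads both levels and reports which sub-reports changed between two files; it does not enforce any rule linking a sub-report update to the file version, since the notes do not define one.

```python
import xml.etree.ElementTree as ET

# Hypothetical files: the GOLD sub-report was updated between them.
OLD = """<nasReports version="3">
  <_Report source="http://example.org/db/gold" version="1"><gcatID>000123</gcatID></_Report>
  <_Report source="http://example.org/db/camera" version="4"><gcatID>000123</gcatID></_Report>
</nasReports>"""

NEW = """<nasReports version="4">
  <_Report source="http://example.org/db/gold" version="2"><gcatID>000123</gcatID></_Report>
  <_Report source="http://example.org/db/camera" version="4"><gcatID>000123</gcatID></_Report>
</nasReports>"""


def versions(xml_text):
    """Return (file version, {(source, gcatID): sub-report version})."""
    root = ET.fromstring(xml_text)
    subs = {
        (r.get("source"), r.findtext("gcatID")): r.get("version")
        for r in root.iter("_Report")
    }
    return root.get("version"), subs


def changed_sub_reports(old_text, new_text):
    """List (source, gcatID) keys whose source-managed version differs between two files."""
    _, old = versions(old_text)
    _, new = versions(new_text)
    # Keys present in only one file also count as changed (their other version is None).
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


print(versions(NEW)[0], changed_sub_reports(OLD, NEW))
# prints 4 [('http://example.org/db/gold', '000123')]
```

Keeping the per-source versions separate from the file version lets each data provider update its own sub-report without coordinating version numbers with the other sources.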
Time plan
Make everyone aware of what is happening.
MINIMESS meeting with Jeroen: discuss how to integrate MINIMESS into GCDML, and how to start a discussion on making it an extension of MIGS/MIMS.
Idea to have a workshop in August in Michigan hosted by Garrity and Cole.
Have two days dedicated just to GCDML development.
Release schedule
Suggestion to have fixed, regular release times. This benefits end users and allows time for input.
Good item for discussion.
SAWSDL
Ontology integration