GCDML
From Genomic Standards Consortium
The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the “Minimum Information about a Genome Sequence” (MIGS) specification and its extension, the “Minimum Information about a Metagenome Sequence” (MIMS).
Contextual Data
GCDML focuses on the set of metadata describing aspects of genomes and metagenomes, such as the geographic location and habitat type from which the sample was taken, as well as the details of sample processing, from the time of sampling up to sequencing and the subsequent analysis.
This suite of metadata is collectively referred to here as contextual data.
Aims of GCDML
In overview, MIGS/MIMS will be central to GCDML, and GCDML will provide the GSC’s official implementation of the checklist. Beyond the minimum descriptors of MIGS/MIMS, GCDML will be open and extensible so that it can evolve with the needs of the community.
From Minimum Reporting...
The first step of this international community has been to define the “Minimum Information about a Genome Sequence” (MIGS) and “Minimum Information about a Metagenome Sequence” (MIMS) specifications. Use of MIGS/MIMS will provide a mechanism for capturing a consensus-driven minimum set of metadata describing aspects of genomes and metagenomes, such as the geographic location and habitat type from which the sample was taken, as well as the details of the sequencing method used.
... To Maximum Reporting
It is the aim of the GSC to support the richer capture of contextual data describing genomes and metagenomes by developing the Genomic Contextual Data Markup Language (GCDML). Supporting maximum reporting for such projects, though, will require a much richer set of descriptors. Such descriptors must cover both the origin and the processing of a sample, from the time of sampling up to sequencing and the subsequent analysis.
GCDML seeks to specifically support ‘maximal’ reporting of contextual data and the desire of groups in the GSC to include more descriptors beyond the minimal MIGS/MIMS.
Using XML for Modeling Contextual Data
GCDML is implemented using XML Schema and aims to take full advantage of the benefits of an XML representation of genomic contextual data. XML provides a machine-readable representation of metadata that facilitates the capture, exchange and comparison of large amounts of data, and it is widely used to build data capture and exchange formats.
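To illustrate what capture, exchange and comparison can look like with an XML representation, here is a minimal Python sketch using only the standard library. The element names (report, organism, location, habitat) and all values are made up for illustration; they are not taken from the GCDML schema.

```python
import xml.etree.ElementTree as ET


def build_report(organism, latitude, longitude, habitat):
    """Capture a few contextual data items as an XML element tree.

    Element names are hypothetical and NOT part of the GCDML schema.
    """
    report = ET.Element("report")
    ET.SubElement(report, "organism").text = organism
    location = ET.SubElement(report, "location")
    ET.SubElement(location, "latitude").text = str(latitude)
    ET.SubElement(location, "longitude").text = str(longitude)
    ET.SubElement(report, "habitat").text = habitat
    return report


def same_habitat(a, b):
    """Compare two reports by the text of their habitat element."""
    return a.findtext("habitat") == b.findtext("habitat")


# Capture: two reports with invented example values.
first = build_report("Example organism A", 54.18, 7.9, "marine")
second = build_report("Example organism B", -33.9, 18.4, "marine")

# Exchange: serialize to text, then parse it back on the receiving side.
serialized = ET.tostring(first, encoding="unicode")
restored = ET.fromstring(serialized)

# Comparison: both reports can be queried in the same way.
print(serialized)
print(same_habitat(restored, second))
```

Because every report shares one structure, the same query works on any of them, which is the main practical benefit of a common markup language.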
GCDML Manuscript
A draft for the OMICS special issue has been submitted; it gives further details on the scope and general design decisions of GCDML.
You can download the draft here:
- PROOFS: GCDML
Any feedback is welcome; feedback collected now can still be incorporated during the proof stages.
Contact
- For specific GCDML topics, send mail to gensc-gcdml at lists.sourceforge.net.
- To follow the discussions, subscribe at https://lists.sourceforge.net/lists/listinfo/gensc-gcdml
- Mail archives of gensc-gcdml mailing list
- Participation in telecons is open to everybody interested
Documentation
Documentation is available as:
HTML: http://gensc.sourceforge.net/gcdml/1.6.0/doc
PDF: http://gensc.sourceforge.net/gcdml/1.6.0/doc/gcdml_single_doc.pdf
Docbook5: http://gensc.sourceforge.net/gcdml/1.6.0/doc/gcdml_single_doc.xml
The documentation is a work in progress and is not yet up to date.
General Feedback
There is a particular need for assistance from those who are familiar with descriptions of organelles, plasmids, and viruses. If you are interested in helping, please contact Renzo, tgra or Dawn.
At this time, feedback is sought on the following aspects of the schema:
In general:
- documentation for the elements
- naming of elements
- misspellings
- are any elements missing?
GML (Geography Markup Language):
- are there things missing?
- could it be streamlined?
Genome reports:
- There is now an opportunity to use many different types of descriptors. Given this, are there descriptors that you use for your data that are missing?
Several telecons on the validation of GCDML, which included experts, have already been held.
Development
Everyone interested in GCDML is invited to join the development efforts! Just drop us a mail.
Another way to participate is to join the telecons, which are announced on the GSC mailing lists.
Proposals
GCDML development also proceeds through proposals and votes. See the proposal pages and other GCDML pages for more details.
Subversion repository
Current repository location: https://gensc.svn.sourceforge.net/svnroot/gensc/schema/gcdml/tags/gcdml-1.6.0
Only commit changed schema documents which validate with at least one XML Schema parser.
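The commit rule above calls for validation with a real XML Schema parser. As a cheaper first filter before running one, the following Python sketch (an assumption of this page's editor, not part of the GCDML tooling) only checks that a document is well-formed XML and that its root element is xs:schema. It does not replace schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C XML Schema language.
XSD_NS = "http://www.w3.org/2001/XMLSchema"


def looks_like_schema(xsd_text):
    """Return True if the text is well-formed XML with an xs:schema root.

    This is only a pre-check. A document that passes may still be an
    invalid schema, so a real XML Schema parser must be run afterwards.
    """
    try:
        root = ET.fromstring(xsd_text)
    except ET.ParseError:
        return False
    return root.tag == f"{{{XSD_NS}}}schema"


# Made-up inputs for illustration.
good = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="report" type="xs:string"/>'
    "</xs:schema>"
)
broken = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:element name="report">'
)

print(looks_like_schema(good))
print(looks_like_schema(broken))
```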
Bug reports and Feature requests
Bugs can be reported at: http://sourceforge.net/tracker/?group_id=153365&atid=787475
Feature requests can be submitted at: http://sourceforge.net/tracker/?group_id=153365&atid=787478
Releases
Release 1.6.0
This release marks several major improvements since the last workshop. New features include:
- Two kinds of reports are now available:
  - MIGSReports: these reports implement only the MIGS/MIMS checklist as published in Nature Biotechnology
  - GCDReports: these reports allow additional information to be supplied that is not covered by the MIGS/MIMS checklist but is MIGS/MIMS compliant
- Implementation of Habitat-Lite
- Detailed use of controlled vocabulary for units of measurement
- Improved documentation within schema and in docbook
- Bugfixes
It is recommended to upgrade to this schema version.
Schema versioning
This section summarizes the discussions held at the EBI on 10th December 2007 with Peter Sterk, Renzo Kottmann and Tanya Gray.
Report ownership
schema versioning requires permission from report authors to update instances
Should reports all be compliant with most recent schema?
- advantage of keeping reports compliant with the most recent schema version: the update of an instance to a new version is never skipped
major and minor revision
- decide on what is major and minor
- minor revision would be addition of enumeration – do not change schema version number if just terms change
- each change to the schema that does not change old instances (re-factoring) should not be versioned
announce version change resulting from addition of enumerations via email, but would not change the schema version attribute value
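One possible reading of the revision notes above, sketched in Python. The category names are hypothetical, and treating every other kind of change as one that changes the version number is an inference from the notes, not a recorded decision.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical categories of schema change, for illustration only."""
    ENUMERATION_ADDED = "enumeration added"  # only terms change
    REFACTORING = "refactoring"              # old instances are unaffected
    OTHER = "other"                          # anything else (assumption)


def bumps_version(change):
    """Return True if the change should alter the schema version number.

    Per the notes, added enumerations and re-factoring leave the version
    number unchanged. Bumping for everything else is an assumption.
    """
    return change is Change.OTHER


def announced_by_email_only(change):
    """Added enumerations are announced via email without a version change."""
    return change is Change.ENUMERATION_ADDED


for change in Change:
    print(change.value, bumps_version(change), announced_by_email_only(change))
```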
schema version attribute
use the schema version attribute – one way to announce schema version change
schema namespace url
- agreement to use http://gensc.org for the GCDML namespace URL
gencat schema versioning
gencat implements schema versioning. The version numbers are incremental and do not relate to the schema file version number. Schemas can be retrieved using the URL plus the version number, e.g. http://gensc.org/gsc/xsd/1. The gencat schema archive/versioning needs to be reviewed to see whether it meets future requirements.
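The "URL plus version number" pattern can be sketched in Python as follows. The base URL is taken from the example above. The function name is hypothetical, and the assumption that versions are positive integers starting at 1 is inferred from "incremental", not stated in the notes.

```python
# Base taken from the example URL http://gensc.org/gsc/xsd/1 in the notes.
GENCAT_XSD_BASE = "http://gensc.org/gsc/xsd"


def gencat_schema_url(version):
    """Build the retrieval URL for a gencat schema version.

    Assumes versions are positive integers (an assumption, see above).
    """
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        raise ValueError("version must be a positive integer")
    return f"{GENCAT_XSD_BASE}/{version}"


print(gencat_schema_url(1))
```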
flow diagram
flow diagram to describe schema versioning, also how it relates to term submission