Towards a richer set of information to describe our complete genome collection

GCDML

From Genomic Standards Consortium


The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the “Minimum Information about a Genome Sequence” (MIGS) specification and its extension, the “Minimum Information about a Metagenome Sequence” (MIMS).


Contextual Data

GCDML focuses on the metadata that describe genomes and metagenomes: for example, the geographic location and habitat type from which a sample was taken, the details of how the sample was processed from the time of sampling up to sequencing, and the subsequent analysis.

This suite of metadata is collectively referred to here as contextual data.

Aims of GCDML

In overview, MIGS/MIMS will be central to GCDML, and GCDML will provide the GSC’s official implementation of the checklists. Beyond the minimum descriptors of MIGS/MIMS, GCDML will be open and extensible so that it can evolve with the needs of the community.

From Minimum Reporting...

The first step taken by this international community has been to define the “Minimum Information about a Genome Sequence” (MIGS) and “Minimum Information about a Metagenome Sequence” (MIMS) specifications. Use of MIGS/MIMS will provide a mechanism for capturing a consensus-driven minimum set of metadata describing genomes and metagenomes, such as the geographic location and habitat type from which the sample was taken, as well as details of the sequencing method used.


... To Maximum Reporting

It is the aim of the GSC to support richer capture of contextual data describing genomes and metagenomes by developing the Genomic Contextual Data Markup Language (GCDML). Supporting maximal reporting of such projects, however, will require a much richer set of descriptors, covering both the origin and processing of a sample, from the time of sampling up to sequencing, and the subsequent analysis.

GCDML specifically seeks to support ‘maximal’ reporting of contextual data, meeting the wish of groups within the GSC to include more descriptors than the minimal MIGS/MIMS checklists require.

Using XML for Modeling Contextual Data

GCDML is implemented using XML Schema and aims to take full advantage of an XML representation of genomic contextual data. XML provides a machine-readable representation of metadata that facilitates the capture, exchange and comparison of large amounts of data, and it is widely used to build data capture and exchange formats.
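To make the machine-readable point concrete, here is a minimal sketch using only Python's standard `xml.etree.ElementTree` module. It writes a few contextual-data fields (sample identifier, location, habitat) as namespaced XML and parses them back into a dictionary that can be compared across reports. The namespace URI and all element and attribute names (`report`, `location`, `latitude`, `longitude`, `habitat`, `sampleId`) are hypothetical and are not taken from the GCDML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and element names, for illustration only;
# they are not taken from the GCDML schema.
NS = "http://gensc.org/example-contextual-data"
ET.register_namespace("gcd", NS)


def q(tag: str) -> str:
    """Return the namespace-qualified ("Clark notation") name of a tag."""
    return f"{{{NS}}}{tag}"


def build_report(sample_id: str, latitude: float, longitude: float, habitat: str) -> str:
    """Serialize a few contextual-data fields as namespaced XML text."""
    root = ET.Element(q("report"), {"sampleId": sample_id})
    location = ET.SubElement(root, q("location"))
    ET.SubElement(location, q("latitude")).text = str(latitude)
    ET.SubElement(location, q("longitude")).text = str(longitude)
    ET.SubElement(root, q("habitat")).text = habitat
    return ET.tostring(root, encoding="unicode")


def read_report(xml_text: str) -> dict:
    """Parse the XML back into a flat dictionary that is easy to compare."""
    root = ET.fromstring(xml_text)
    return {
        "sampleId": root.get("sampleId"),
        "latitude": float(root.findtext(f"{q('location')}/{q('latitude')}")),
        "longitude": float(root.findtext(f"{q('location')}/{q('longitude')}")),
        "habitat": root.findtext(q("habitat")),
    }


if __name__ == "__main__":
    # Illustrative values only.
    xml_text = build_report("sample-1", 54.18, 7.9, "marine")
    print(xml_text)
    print(read_report(xml_text))
```

Because writing and reading go through the same namespaced element names, reports produced by different groups can be compared field by field once parsed.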

GCDML Manuscript

A draft manuscript for the OMICS special issue has been submitted; it gives further details on the scope and general design decisions of GCDML.

You can download the draft here.

Any feedback is welcome; comments collected now can still be incorporated during the proof stages.

Contact

Documentation

Documentation is available as:

HTML: http://gensc.sourceforge.net/gcdml/1.6.0/doc

PDF: http://gensc.sourceforge.net/gcdml/1.6.0/doc/gcdml_single_doc.pdf

Docbook5: http://gensc.sourceforge.net/gcdml/1.6.0/doc/gcdml_single_doc.xml

The documentation is a work in progress and not fully up to date.

General Feedback

There is a particular need for assistance from those familiar with descriptions of organelles, plasmids, and viruses. If you are interested in helping, please contact Renzo, tgra or Dawn.

At this time, feedback is sought on the following aspects of the schema:

In general:

  • documentation for the elements
  • naming of elements
  • misspellings
  • are any elements missing?

GML (Geography Markup Language):

  • is anything missing?
  • could it be streamlined?

Genome reports:

  • There is now an opportunity to use many different types of descriptors. Given this, are there descriptors that you use for your data that are missing?


Several telecons with experts on GCDML validation have already been held.

Development

Everyone interested in GCDML is invited to join the development efforts! Just send us an email.

Another way to participate is to join the telecons, which are announced on the GSC mailing lists.

Proposals

GCDML is also developed through proposals and votes; see the proposal pages and other GCDML pages for details.

Subversion repository

Current repository location: https://gensc.svn.sourceforge.net/svnroot/gensc/schema/gcdml/tags/gcdml-1.6.0

Only commit changed schema documents that validate with at least one XML Schema parser.
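A small pre-commit check can catch the most obvious problems before a real validation run. The sketch below uses only the Python standard library, so it can only confirm that a file is well-formed XML whose root element is `xs:schema`; it does not perform XML Schema validation, which still requires a full XML Schema processor. The function names `precheck_schema` and `main` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def precheck_schema(data: bytes) -> list[str]:
    """Return a list of problems; an empty list means the pre-check passed.

    Checks only well-formedness and that the root element is xs:schema.
    This is NOT XML Schema validation; run a real XML Schema processor too.
    """
    try:
        root = ET.fromstring(data)
    except ET.ParseError as err:
        return [f"not well-formed XML ({err})"]
    if root.tag != f"{{{XSD_NS}}}schema":
        return [f"root element is {root.tag!r}, expected xs:schema"]
    return []


def main(paths: list[str]) -> int:
    """Pre-check each file and return a shell-style exit status (0 = all passed)."""
    status = 0
    for path in paths:
        with open(path, "rb") as fh:
            for problem in precheck_schema(fh.read()):
                print(f"{path}: {problem}", file=sys.stderr)
                status = 1
    return status
```

A wrapper script could call `sys.exit(main(sys.argv[1:]))` on the changed `.xsd` files and only then hand them to a full XML Schema parser; the pre-check merely avoids wasting a validation run on files that cannot even be parsed.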

Bug reports and Feature requests

Bugs can be reported at: http://sourceforge.net/tracker/?group_id=153365&atid=787475

Feature requests can be submitted at: http://sourceforge.net/tracker/?group_id=153365&atid=787478

Releases

Release 1.6.0

This release brings several major improvements since the last workshop. New features include:

  • Two kinds of reports are available now
    • MIGSReports: these reports implement only the MIGS/MIMS checklist, as published in Nature Biotechnology
    • GCDReports: these reports allow additional information that is not covered by the MIGS/MIMS checklist to be supplied while remaining MIGS/MIMS compliant
  • Implementation of Habitat-Lite
  • Detailed use of controlled vocabulary for units of measurement
  • Improved documentation within schema and in docbook
  • Bugfixes
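The relationship between the two report kinds can be illustrated with a small Python sketch: a report is checklist compliant when it supplies every checklist field, and supplying further descriptors on top of a complete checklist is what makes it GCDReport-like rather than MIGSReport-like. The four field names in `CHECKLIST_FIELDS` are a hypothetical stand-in, not the published MIGS/MIMS checklist, and the classification is only an analogy to the schema's actual report types.

```python
# Hypothetical, heavily abbreviated stand-in for the MIGS/MIMS checklist;
# the real checklist is longer and its field names may differ.
CHECKLIST_FIELDS = frozenset(
    {"collection_date", "habitat", "sample_location", "sequencing_method"}
)


def missing_checklist_fields(report: dict[str, str]) -> set[str]:
    """Checklist fields that the report does not supply."""
    return set(CHECKLIST_FIELDS - set(report))


def classify_report(report: dict[str, str]) -> str:
    """Label a report by analogy with the two report kinds in release 1.6.0."""
    if missing_checklist_fields(report):
        return "not checklist compliant"
    extra_fields = set(report) - CHECKLIST_FIELDS
    # A complete checklist plus further descriptors resembles a GCDReport;
    # a complete checklist and nothing else resembles a MIGSReport.
    return "GCDReport-like" if extra_fields else "MIGSReport-like"


if __name__ == "__main__":
    # Illustrative values only.
    migs_like = {
        "collection_date": "2007-12-10",
        "habitat": "marine",
        "sample_location": "North Sea",
        "sequencing_method": "Sanger",
    }
    print(classify_report(migs_like))                             # MIGSReport-like
    print(classify_report({**migs_like, "salinity": "35 psu"}))  # GCDReport-like
```

The design point is that extra descriptors never replace checklist fields: a report only counts as an extended report once the minimum set is complete.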

Upgrading to this schema version is recommended.

Schema versioning

The following section summarizes the outcome of discussions held at the EBI on 10 December 2007 with Peter Sterk, Renzo Kottmann and Tanya Gray.


report ownership

Updating report instances to a new schema version requires permission from the report authors.

Should all reports be compliant with the most recent schema?

  • advantage of keeping reports compliant with the most recent schema version: no instance ever skips a needed update to a new version


major and minor revision

  • decide what counts as a major and what as a minor revision
  • a minor revision would be the addition of an enumeration; do not change the schema version number if only terms change
  • a schema change that does not affect old instances (refactoring) should not be versioned

announce version changes resulting from the addition of enumerations via email, without changing the value of the schema version attribute
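The versioning rules discussed above can be written down as a small decision function. In this Python sketch, each change in a release is tagged with a hypothetical `Change` category: a release containing any other kind of change gets a new version attribute value, one that only adds enumeration values is announced by email without changing the attribute, and pure refactorings need no action. The notes left open how other changes split into major and minor revisions, so `OTHER` is deliberately a single catch-all; treating it as "change the version" is an assumption.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical categories for a single schema change."""
    ENUMERATION_ADDED = "enumeration added"  # only new controlled-vocabulary terms
    REFACTORING = "refactoring"              # old instances stay valid unchanged
    OTHER = "other"                          # major/minor split still undecided


def release_action(changes: list[Change]) -> str:
    """Decide what a release needs, following the rules from the discussion."""
    if any(change is Change.OTHER for change in changes):
        # Assumption: anything not covered by the two exemptions is versioned.
        return "change the schema version attribute"
    if any(change is Change.ENUMERATION_ADDED for change in changes):
        return "announce by email; keep the schema version attribute"
    # Only refactorings (or nothing at all): no versioning action needed.
    return "no version change"


if __name__ == "__main__":
    print(release_action([Change.REFACTORING, Change.ENUMERATION_ADDED]))
    print(release_action([Change.OTHER, Change.REFACTORING]))
```

Evaluating a whole release rather than single changes means the strongest required action wins, so an enumeration addition bundled with a structural change cannot slip through without a version change.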


schema version attribute

use the schema version attribute as one way to announce a schema version change
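As a sketch of how tools could read this attribute, the Python example below assumes that "schema version attribute" means the standard optional `version` attribute on the root `xs:schema` element of an XML Schema document; the notes do not say which attribute is meant. It parses a dotted value such as `1.6.0` into a tuple so that an instance's target version can be compared with the latest release. The helpers `schema_version` and `needs_update` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def schema_version(xsd: bytes) -> Optional[tuple[int, ...]]:
    """Read xs:schema/@version as a tuple of integers, or None if it is absent."""
    root = ET.fromstring(xsd)
    if root.tag != f"{{{XSD_NS}}}schema":
        raise ValueError("not an XML Schema document")
    raw = root.get("version")
    if raw is None:
        return None
    # Assumes a purely numeric dotted value such as "1.6.0".
    return tuple(int(part) for part in raw.split("."))


def needs_update(instance_version: tuple[int, ...], latest: tuple[int, ...]) -> bool:
    """True if an instance targets an older schema version than the latest one."""
    return instance_version < latest
```

Under the "most recent schema" policy discussed above, `needs_update` would flag the reports whose authors must be asked for permission before their instances are updated.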


schema namespace url

  • agreement to use http://gensc.org for the GCDML namespace URL

gencat schema versioning

gencat implements schema versioning: the version numbers are incremental and unrelated to the schema file version number. Schemas can be retrieved by appending the version number to the URL, e.g. http://gensc.org/gsc/xsd/1. The gencat schema archive and versioning need to be reviewed to see whether they meet future requirements.
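The gencat retrieval scheme can be sketched as a simple URL builder. The pattern (base URL, a slash, then the integer version) is inferred from the single example above, as is the assumption that numbering starts at 1; `gencat_schema_url` and `archive_urls` are hypothetical helper names. Listing every archived URL up to the latest version is one possible starting point for the planned archive review.

```python
# Base URL taken from the example in the text; the "/<version>" pattern and
# the assumption that numbering starts at 1 are inferred from that one example.
GENCAT_XSD_BASE = "http://gensc.org/gsc/xsd"


def gencat_schema_url(version: int) -> str:
    """Build the retrieval URL for one gencat schema version."""
    if version < 1:
        raise ValueError("gencat schema versions are assumed to start at 1")
    return f"{GENCAT_XSD_BASE}/{version}"


def archive_urls(latest: int) -> list[str]:
    """List the URLs of every version from 1 up to and including `latest`."""
    return [gencat_schema_url(v) for v in range(1, latest + 1)]
```

Because the integer is independent of the schema file's own version number, a mapping between the two would have to be kept separately if both are needed.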


flow diagram

a flow diagram to describe schema versioning and how it relates to term submission
