towards a richer set of information to describe our complete genome collection

GCDML

The Genomic Contextual Data Markup Language (GCDML) is a core project of the Genomic Standards Consortium (GSC) that implements the “Minimum Information about a Genome Sequence” (MIGS) specification and its extension, the “Minimum Information about a Metagenome Sequence” (MIMS).

Aims of GCDML

MIGS/MIMS is central to GCDML, and GCDML will provide the GSC’s official implementation of the checklist. Beyond the minimum descriptors of MIGS/MIMS, GCDML will be open and extensible so that it can evolve with the needs of the community.

From Minimum Reporting...

The first step of this international community has been to define the “Minimum Information about a Genome Sequence” (MIGS) and “Minimum Information about a Metagenome Sequence” (MIMS) specifications. Use of MIGS/MIMS provides a mechanism for capturing a consensus-driven minimum set of metadata describing genomes and metagenomes, such as the geographic location and habitat type from which the sample was taken and the details of the sequencing method used.


... To Maximum Reporting

It is the aim of the GSC to support the richer capture of contextual data describing genomes and metagenomes by developing the Genomic Contextual Data Markup Language (GCDML). Supporting maximum reporting for such projects requires a much richer set of descriptors than the minimum checklists. These descriptors must cover the origin and processing of a sample, from the time of sampling up to sequencing, as well as the subsequent analysis.

GCDML seeks to specifically support ‘maximal’ reporting of contextual data and the desire of groups in the GSC to include more descriptors beyond the minimal MIGS/MIMS.

What is Contextual Data?

GCDML focuses on the set of metadata that describes genomes and metagenomes: the geographic location and habitat type from which the sample was taken, the details of sample processing from the time of sampling up to sequencing, and the subsequent analysis.

This suite of metadata is collectively referred to here as contextual data.


Using XML for Modeling Contextual Data

GCDML is implemented using XML Schema and aims to take full advantage of an XML representation of genomic contextual data. XML provides a machine-readable representation of metadata that facilitates the capture, exchange and comparison of large amounts of data, and it is widely used to build data capture and exchange formats.
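As a rough illustration of what "capture, exchange and comparison" can look like in practice, the following Python sketch builds two small XML records, serializes one to text and back, and compares them field by field. The element names (report, geographicLocation, habitat, sequencingMethod) and the sample values are assumptions made up for this example; they are not taken from the GCDML schema.

```python
import xml.etree.ElementTree as ET


# NOTE: all element names below are hypothetical and only illustrate the
# idea of an XML-encoded contextual-data record; they are not GCDML names.
def build_report(sample_id, latitude, longitude, habitat, seq_method):
    """Capture: build a small XML record describing one sample."""
    report = ET.Element("report", attrib={"id": sample_id})
    location = ET.SubElement(report, "geographicLocation")
    ET.SubElement(location, "latitude").text = str(latitude)
    ET.SubElement(location, "longitude").text = str(longitude)
    ET.SubElement(report, "habitat").text = habitat
    ET.SubElement(report, "sequencingMethod").text = seq_method
    return report


def to_dict(report):
    """Read the record back into plain Python values."""
    return {
        "id": report.get("id"),
        "latitude": float(report.findtext("geographicLocation/latitude")),
        "longitude": float(report.findtext("geographicLocation/longitude")),
        "habitat": report.findtext("habitat"),
        "sequencing_method": report.findtext("sequencingMethod"),
    }


def differing_fields(report_a, report_b):
    """Comparison: names of the fields (other than id) whose values differ."""
    a, b = to_dict(report_a), to_dict(report_b)
    return sorted(key for key in a if key != "id" and a[key] != b[key])


if __name__ == "__main__":
    first = build_report("sample-1", 54.18, 7.9, "marine", "Sanger")
    second = build_report("sample-2", 54.18, 7.9, "soil", "Sanger")

    # Exchange: serialize to text, then parse it again on the receiving side.
    xml_text = ET.tostring(first, encoding="unicode")
    received = ET.fromstring(xml_text)

    print(xml_text)
    print(differing_fields(received, second))
```

Because the record is plain XML, any XML-aware tool can read it; the comparison step here relies only on the element structure, not on the program that produced the file.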

GCDML Satellite Meeting 2008

A technical meeting on GCDML was held on October 13 and 14, 2008, prior to the main GSC 6 meeting. The co-organizers were Renzo Kottmann, Peter Sterk and Dawn Field.

A lively discussion led to a list of changes to GCDML 1.6.0, all of which are incorporated in GCDML version 1.7.0.

See GCDML Satellite Meeting 2008 for details.

Download the agenda (PDF version) and the presentation slides.

GCDML Publication

A paper published in the OMICS special issue gives further details on the scope and general design decisions of GCDML.

To cite this paper:

Renzo Kottmann, Tanya Gray, Sean Murphy, Leonid Kagan, Saul Kravitz, Thierry Lombardot, Dawn Field, Frank Oliver Glockner.
OMICS: A Journal of Integrative Biology. June 1, 2008, 12(2): 115-121. doi:10.1089/omi.2008.0A10.


Any feedback is welcome.

Contact

Documentation

The most up-to-date information on GCDML is on the SourceForge web pages:

http://gensc.sourceforge.net/gcdml/

Link for importing the XSD into an XML editor such as Oxygen XML: http://gensc.sf.net/ns/gcdml/1.7.0/base/gcdml.xsd
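Besides loading the XSD into an editor, a schema file can also be inspected programmatically, since an XSD is itself an XML document. The Python sketch below lists the target namespace, the top-level element declarations and the named complex types of a schema. It runs on a small toy schema defined inline; that toy schema, its namespace and its names are assumptions for illustration and do not reflect the contents of gcdml.xsd. Only the XML Schema namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard namespace URI of the W3C XML Schema language.
XS = "http://www.w3.org/2001/XMLSchema"

# Toy schema for illustration only; it is NOT an excerpt of gcdml.xsd.
TOY_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/toy"
           elementFormDefault="qualified">
  <xs:element name="report" type="xs:string"/>
  <xs:element name="habitat" type="xs:string"/>
  <xs:complexType name="LocationType">
    <xs:sequence>
      <xs:element name="latitude" type="xs:decimal"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""


def summarize_schema(xsd_text):
    """Summarize the top-level declarations of an XSD given as text."""
    root = ET.fromstring(xsd_text)
    if root.tag != f"{{{XS}}}schema":
        raise ValueError("not an XML Schema document")
    return {
        "target_namespace": root.get("targetNamespace"),
        # Only direct children of xs:schema, i.e. global declarations.
        "elements": [e.get("name") for e in root.findall(f"{{{XS}}}element")],
        "complex_types": [
            t.get("name") for t in root.findall(f"{{{XS}}}complexType")
        ],
    }


if __name__ == "__main__":
    summary = summarize_schema(TOY_XSD)
    print(summary["target_namespace"])
    print(summary["elements"])
    print(summary["complex_types"])
```

The same function could be given the text of a locally saved copy of a real schema. If that schema splits its declarations across included or imported files, this sketch would only report what is declared in the one file it is given.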

GCDML Development

GCDML is actively developed by members of the GSC. Further information can be found here.

Releases

Several releases have been made; 1.7.0 is the most recent.

Release 1.7.0

This release includes:

  • all changes raised and discussed during the 6th GSC Meeting
  • an XSLT update file to transform 1.6.0 report files to 1.7.0
  • JAXB binding files and Ant targets to auto-generate a Java API for use in other software projects

Release 1.6.0

This release marks several major improvements since the last workshop. New features include:

  • Two kinds of reports are available now
    • MIGSReports: these reports implement the MIGS/MIMS checklist only as published in Nature Biotechnology
    • GCDReports: these reports allow additional information to be supplied that is not covered by the MIGS/MIMS checklist but is MIGS/MIMS compliant
  • Implementation of Habitat-Lite
  • Detailed use of controlled vocabulary for units of measurement
  • Improved documentation within schema and in docbook
  • Bugfixes

Documentation is being written; feedback and additions are welcome.

Documentation is available as:

HTML: http://gensc.sourceforge.net/gcdml/1.6.0/doc

PDF: http://gensc.sourceforge.net/gcdml/1.6.0/doc/gcdml_single_doc.pdf

Docbook5: http://gensc.sourceforge.net/gcdml/1.6.0/doc/gcdml_single_doc.xml

The documentation is a work in progress and not yet up to date.


It is recommended to upgrade to this schema version.

GCDML Examples

Phage genomes in GCDML can be found at the MegX database: http://www.megx.net/gcdml/gcdml.html
