Biocode Commons GOs Network GSC Project

Biocode Commons – GOs Network GSC Project Proposal


Project Title: The Biocode Commons (under the Genomic Observatories Network)

Project Leads: John Deck (UC Berkeley), Neil Davies (UC Berkeley)

Team members: open membership; new participants welcome

  • Brett Ammundsen (Biomatters)*
  • Reed Beaman (UF)*
  • Neil Davies (UC Berkeley)*
  • John Deck (UC Berkeley)*
  • Alexei Drummond (University of Auckland)
  • Dawn Field (CEH, University of Oxford)*
  • Rob Guralnick (University of Colorado)*
  • Chris Meyer (Smithsonian)
  • Norman Morrison (CEH, Manchester)
  • Renzo Kottmann (MPI-Bremen)
  • Steve Stones-Havas (Biomatters)*
  • Philippe Rocca-Serra (University of Oxford)
  • Susanna Sansone (University of Oxford)
  • Dave Vieglas (DataONE)*
  • Patricia Wecker (CRIOBE)*

* Participants in the Moorea Genomic Observatory meeting (June 1–5); others have also engaged in discussions related to the Biocode Commons at GSC, TDWG, and other meetings and hackathons.

Elevator pitch: The Biocode Commons will build a community to develop the software resources required to support genomic observations from collection through analysis and publication.

Project Summary: The Biocode Commons will build an interoperable, standards-based “informatics stack” for the global network of Genomic Observatories (GOs). The GOs Network will provide the overarching strategic direction for the Biocode Commons and will serve as its major user community, although the scope of the Biocode Commons could expand in the future.

The Biocode Commons grew out of the Moorea Biocode Project, specifically the collaboration between UC Berkeley and Biomatters. The need for a sustainable home for the project's open source outputs led to the Biocode Commons being conceived as part of the emerging Genomic Observatories Network (see the Nature communication and the longer article in GigaScience: http://genomicobservatories.blogspot.com/p/publications.html).

The Biocode Commons will be a community of software developers and users supporting the use of open source software to build a genomic observations platform. This platform will ensure that genomic data are captured in conformance with global standards (e.g., Darwin Core and the Genomic Standards Consortium standards) at each stage of the value chain: field collection, lab processing, analysis, publication, and deposition in databases and archives (e.g., BOLD, INSDC, museums).
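To make “conformance with global standards” concrete, here is a minimal Python sketch of the kind of check a field-data tool might run before passing a record downstream. It assumes records arrive as dictionaries keyed by Darwin Core term names. The set of “required” terms, the function name check_record, and the sample values are hypothetical illustrations; Darwin Core itself does not declare any term mandatory, and this is not existing Biocode Commons software.

```python
import re
from datetime import date

# Hypothetical minimum set chosen for illustration; Darwin Core itself
# does not declare any term mandatory.
REQUIRED_TERMS = (
    "occurrenceID", "basisOfRecord", "scientificName",
    "eventDate", "decimalLatitude", "decimalLongitude",
)
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_record(record: dict) -> list:
    """Return a list of problems found in one field record (empty if none)."""
    problems = []
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {term}")

    # This sketch accepts only a single calendar date written as YYYY-MM-DD,
    # a narrower rule than the ISO 8601 forms Darwin Core allows for eventDate.
    event_date = record.get("eventDate")
    if event_date:
        valid = ISO_DATE.fullmatch(str(event_date)) is not None
        if valid:
            try:
                date.fromisoformat(str(event_date))  # rejects e.g. 2012-02-30
            except ValueError:
                valid = False
        if not valid:
            problems.append("eventDate is not YYYY-MM-DD")

    # Decimal coordinates must parse as numbers within latitude/longitude bounds.
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        value = record.get(term)
        if value is None or str(value).strip() == "":
            continue  # already reported as missing above
        try:
            if abs(float(value)) > limit:
                problems.append(f"{term} out of range")
        except ValueError:
            problems.append(f"{term} is not a number")
    return problems
```

Because the function collects every problem instead of stopping at the first, a field application could show the collector all issues with a record at once, while the specimen is still at hand.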

The Biocode Commons will also address stewardship of data from GOs and beyond by developing or adopting tools for identifiers and identifier resolution. A first major focus is supporting access to “linked data” across the genomic observations network, ensuring that workflow outputs, data from distributed genomic databases, and associated metadata are all appropriately accessible. The Biocode Commons will also work with the wider community to promote the use of appropriate licenses and data use policies.

Which existing projects, if any, does this one replace/complement/subsume? The Biocode Commons complements a range of activities already under way in the GSC and would create a ‘home’ for a range of tool and resource development projects. “Biocode” refers to the genetic code underlying biodiversity, as captured from material and data collected in the field. The space between standards and software, between standards and other standards, and between software and other software is what we call the “commons”. Hence the Biocode Commons specifically addresses the gap in the biodiversity genomics community between collections-based data (commonly museum specimens and tissues) and sequence-based datasets.

How does this project fit into GSC’s mission statement? The Biocode Commons will work toward implementing GSC standards and harmonizing these standards and tools with the efforts of related groups, in particular in coordination with the Genomic Biodiversity Working Group.

The Biocode Commons would be a one-stop shop where projects carrying out genomic observations can see what software is already available (free or commercial) and identify gaps that they might then fill through a development component of their grant. This would reduce duplication of effort, target investments more efficiently, and provide a sustainability model: a project could pick up software developed under an expired grant and improve or extend it with its own funding. Commercial companies could use the same overview to identify opportunities.

The Biocode Commons would aim to host two hackathons per year, either at GSC workshops or at related meetings. Their specific goal would be to encourage development of tools that address problems at the boundary between data collection and data analysis, including the adoption of standards. Because collections and collections databases on one side, and sequencing and sequence databases on the other, are served by distinct communities, the hackathons will target topics that neither community would normally solve alone. Examples of such topics are semantic integration, field data plugins for analytical software, adoption of metadata coverage indexes for relevant domains, and adoption of globally unique identifiers (GUIDs) for biological objects (to enable tracking).
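As an illustration of why GUIDs for biological objects enable tracking, the following Python sketch gives each object (specimen, tissue, DNA extract, sequence) a UUID-based URN plus a link to the object it was derived from, so that any sequence can be traced back to its field specimen. The BioObject class, the lineage function, the use of urn:uuid: identifiers, and the sample labels are assumptions made for this example only; they are not an existing Biocode Commons or GSC API.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BioObject:
    kind: str    # e.g. "specimen", "tissue", "extract", "sequence"
    label: str   # local, human-readable label; not globally unique
    derived_from: Optional["BioObject"] = None
    # Globally unique identifier minted when the object is created.
    guid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")


def lineage(obj: BioObject) -> List[str]:
    """Follow derived_from links from obj back to the original field object."""
    chain = []
    current = obj
    while current is not None:
        chain.append(f"{current.kind}:{current.guid}")
        current = current.derived_from
    return chain


# Hypothetical chain: field specimen -> tissue -> DNA extract -> sequence.
specimen = BioObject("specimen", "FIELD-0001")
tissue = BioObject("tissue", "T-0001", derived_from=specimen)
extract = BioObject("extract", "E-0001", derived_from=tissue)
sequence = BioObject("sequence", "COI-0001", derived_from=extract)
for step in lineage(sequence):
    print(step)
```

Local labels such as T-0001 can easily collide between labs; the GUID is what keeps the chain unambiguous when records move between collections databases and sequence repositories.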

Have you spoken about the project already within GSC? The project was discussed at GSC 12 and GSC 13 but not formally presented. It will be launched at GSC 14 as part of the launch of the GOs Network, and the Biocode Commons will help organize the GSC hackathons.

Will you start a GSC working group? The Biocode Commons would be a project of the Genomic Observatories Network, under the GSC umbrella.

How do you wish to further engage the GSC? The Biocode Commons will work closely with a range of projects already in the GSC, bringing together tool developers. The GSC meetings are essential to developing the Biocode Commons further. We also intend to interact with, for example, TDWG (Biodiversity Information Standards), iEvoBio, and SPNHC (Society for the Preservation of Natural History Collections), thereby expanding the reach, scope, and expertise of the groups familiar with and working within the wider GSC community.

Do you already have a website or do you wish to create a home page for the project in the GSC website? We already have a website (www.biocodecommons.org) and a blog, and would add a link from the GSC website to this external site.

What other resources might you like from what the GSC can offer (mailing lists, etc)? Any that are available; most likely a mailing list to start with.

What kind of timeline are you working to for building consensus, releasing a first version etc? We are seeking help and funding to get this project off the ground. GSC 14 will be the formal launch of the Biocode Commons under the GOs Network.

What resources will be required for completion (funding, manpower, etc.)? (This question is just to give an idea about the size of the project).  We are currently seeking funding.

What are your current plans for publishing/promoting the project? We will present the project at GSC 14 and work with the GSC to build the community. See http://genomicobservatories.blogspot.com/p/publications.html

A call for an international network of genomic observatories (GOs)

Neil Davies, Chris Meyer, Jack A Gilbert, Linda Amaral-Zettler, John Deck, Mesude Bicak, Philippe Rocca-Serra, Susanna-Assunta Sansone, Kathy Willis, Dawn Field. GigaScience 2012, 1:5 (12 July 2012)

This paper is part of a Genomic Standards Consortium series page that is continuing to take submissions:

http://www.gigasciencejournal.com/series/GSC_and_beyond

References or relevant websites:

Biocode Commons: http://biocodecommons.org/

BiSciCol: http://biscicol.blogspot.com/

SPNHC: http://www.spnhc.org/

TDWG: http://www.tdwg.org/

iEvoBio: http://ievobio.org/ (Biocode Commons-sponsored challenge)