GGBN

1. Project Title “Global Genome Biodiversity Network (GGBN) Data Standard specification”

2. Project Leads

– Gabi Droege, g.droege@bgbm.org, Botanic Garden and Botanical Museum Berlin-Dahlem, Freie Universität Berlin, Berlin, Germany
– Katie Barker, barkerK@si.edu, National Museum of Natural History, Washington DC, United States
– Jonathan Coddington, coddington@si.edu, National Museum of Natural History, Washington DC, United States
– Ole Seberg, oles@snm.ku.dk, Natural History Museum Denmark, Copenhagen, Denmark

3. Team members The authors of the GGBN Data Standard publications

4. Elevator pitch (20 – 50 words) The GGBN Data Standard serves to exchange and share information (data) related to the creation, maintenance and legal provisions connected to physical DNA, RNA and/or tissue samples in biodiversity repositories as well as molecular sequences. It complements other existing biodiversity data standards such as MIxS, Darwin Core and ABCD (Access to Biological Collection Data).

5. Project Summary (two or three paragraphs of background, purpose and plans) The Global Genome Biodiversity Network (GGBN) was formed in 2011 with the principal aim of making high-quality, well-documented and vouchered collections that store DNA or tissue samples of biodiversity discoverable for research through a networked community of biodiversity repositories. This is achieved through the GGBN Data Portal (http://data.ggbn.org), which links globally distributed databases and bridges the gap between biodiversity repositories, sequence databases and research results. GGBN is open to any biodiversity biobank and aims to bring together different communities so that they can exchange their experiences and work towards a standard for data associated with physical samples.

The GGBN Data Standard (http://terms.tdwg.org/wiki/GGBN_Data_Standard) is a set of terms and controlled vocabularies designed to represent facts about tissue, DNA or RNA samples. It does not cover facts such as scientific name, geography or physiology; within GGBN it is therefore used together with Darwin Core or ABCD. It covers all molecular terms of MIxS, MIMARKS and MIGS and can also handle SPREC (Standard PREanalytical Code) and large parts of BRISQ (Biospecimen Reporting for Improved Study Quality).
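The division of labour described above can be illustrated with a minimal Python sketch. All class names, field names and vocabulary values here are hypothetical stand-ins chosen for illustration; they are not taken from the GGBN, Darwin Core or ABCD term lists, which should be consulted for the real identifiers.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a controlled vocabulary of sample types.
# The real GGBN vocabularies are defined in the standard itself.
ALLOWED_SAMPLE_TYPES = {"tissue", "DNA", "RNA"}


@dataclass
class SpecimenRecord:
    """Facts left to Darwin Core or ABCD (hypothetical field names)."""
    scientific_name: str
    country: str


@dataclass
class SampleRecord:
    """Facts about the physical sample, the part a GGBN-style record covers."""
    sample_type: str           # expected to come from the controlled vocabulary
    specimen: SpecimenRecord   # link to the vouchered specimen record


def is_valid_sample(sample: SampleRecord) -> bool:
    """Return True if the sample type is in the controlled vocabulary."""
    return sample.sample_type in ALLOWED_SAMPLE_TYPES


# Example: a DNA sample linked to its specimen-level record.
example = SampleRecord(
    sample_type="DNA",
    specimen=SpecimenRecord(scientific_name="Quercus robur", country="Germany"),
)
```

In this sketch the sample record never repeats the scientific name or locality; it only points to the specimen record, which mirrors how the standard is meant to complement rather than duplicate Darwin Core or ABCD.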

The alignment between GGBN terms and MIxS terms, as well as between Darwin Core and MIxS, has already been completed. A full alignment between MIxS and ABCD is planned for the near future. A first stable version of the GGBN Data Standard is planned for December 2015. In addition, it will be submitted to the TDWG (Taxonomic Databases Working Group) committee for ratification as an official standard within the community of natural history collections.

6. What will this project aim to contribute to the GSC? The GGBN Data Standard will enable biodiversity (non-human) biorepositories, biobanks, cryorepositories, cryobanks, DNA Banks and similar institutions to exchange information on the physical source of genomic data, thus extending and enriching the value and provenance of genomic data.

7. Have you spoken about the project already within GSC? (on a call, at a formal GSC meeting, would like to request time to present at a future meeting). Yes, as part of collaboration within the German Federation for Biological Data project (GFBio). A presentation about the GGBN Data Standard at a future GSC meeting would be appreciated. Presentations about GGBN itself have been given in the past. In addition, GGBN has participated in GBIF (Global Biodiversity Information Facility)/GSC hackathons.

8. Which existing projects, if any, does this one replace/complement/subsume/expand? Explain briefly why an extra project is needed/justified (what gap does it fill?) No standardized, universal or official vocabulary exists for describing physical biodiversity genomic samples (usually genomic DNA, RNA, tissues or other organismal parts). To make physical genomic resources discoverable to the user community and amenable to machine reasoning, a community-generated and community-sanctioned data standard is required.

9. How does this project fit into GSC’s mission statement (might also expand it)? The GGBN Data Standard serves to exchange and share information (data) related to the creation, maintenance and legal provisions connected to physical DNA, RNA and/or tissue samples in biodiversity repositories as well as molecular sequences. It complements and extends the GSC mission: the implementation of new genomic standards, methods of capturing and exchanging metadata, and harmonization of metadata collection and analysis efforts across the wider genomics community.

10. Will you start a GSC working group (how far along are you?)? If not, why not (i.e. subgroup within developers group, existing external community, etc) No. A working group already exists within GGBN.

11. How do you wish to further engage the GSC (recruit members to project, get consultation, link to other GSC projects, etc)? Attend GSC conferences, invite GSC members to GGBN conferences, joint collaborations, e.g. Ocean Sampling Day, German Federation for Biological Data.

12. Do you already have a website or do you wish to create a home page for the project in the GSC website (GSC maintains an open wiki at present, all working groups have a page)? The GGBN Data Standard is described at http://terms.tdwg.org/wiki/GGBN_Data_Standard

13. What other resources might you like from what the GSC can offer (mailing lists, etc)? Interaction with GSC members and their institutions, specifically collaboration to implement the GGBN Data Standard and to provide biobank and biorepository data to the GGBN Data Portal, thus making global genomic resources discoverable to stakeholders.

14. What kind of timeline are you working to for building consensus, releasing a first version etc? A stable version of the GGBN Data Standard will be available in December 2015.

15. How is this work currently funded (list grants, funders, in kind contributions, etc)? DFG (German Research Foundation) project, in kind contributions of GGBN partners.

16. What resources will be required for completion (funding, manpower, etc.)? (This question is just to give an idea about the size of the project) The work will be completed within the ongoing DFG project.

17. What are your current plans for publishing/promoting the project? A white paper is published in the journal Database.

18. References or relevant websites (for further reading) See 12.