Proposal.002: Maximum Reporting from Different Sources

From Genomic Standards Consortium

Deadline for contributions to proposal

NOT SET

Please edit this document up to the stated deadline. After that date, a vote will be taken on the options presented.

Background

GCDML aims not only to implement the MIGS/MIMS checklist specification but also to allow maximum reporting, that is, to include descriptors that are not necessarily specified by MIGS/MIMS but are used by scientists and databases. This proposal discusses possible ways to manage different reports from different sources about the same genomic project.

Relation to other Proposals

Proposal

Current simplified MIGS/MIMS-compliant report structure:

<nasReports>
   <_Report>
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
               <extension />
           </_SampleLocation>
           <_Habitat>
               <extension />
           </_Habitat>
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
</nasReports>


Approach 1

Each source of reports (e.g. CAMERA, Genome Reviews, GOLD) indicates its origin with a source attribute. Two reports with the same GCAT_ID but different sources then become comparable. In other words, only the combination of source and GCAT_ID uniquely identifies a report. In addition, each report has a version attribute for versioning changes to its content.

<nasReports>
   <_Report source="GOLD" version="1">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
               <extension />
           </_SampleLocation>
           <_Habitat>
               <extension />
           </_Habitat>
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
   <_Report source="Genome Reviews" version="1.2.3">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
               <extension />
           </_SampleLocation>
           <_Habitat>
               <extension />
           </_Habitat>
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
</nasReports>
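
The identification rule of Approach 1 can be illustrated from the point of view of a consumer of such a document. The following is only a minimal sketch in Python, not part of GCDML: the function name index_reports and the gcatID value GCAT-0001 are invented for illustration, and the descriptors other than gcatID are left out for brevity. It indexes reports by the pair (source, gcatID) and rejects a document in which that pair occurs twice.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the gcatID value is invented for illustration.
DOC = """
<nasReports>
   <_Report source="GOLD" version="1">
       <gcatID>GCAT-0001</gcatID>
   </_Report>
   <_Report source="Genome Reviews" version="1.2.3">
       <gcatID>GCAT-0001</gcatID>
   </_Report>
</nasReports>
"""

def index_reports(xml_text):
    """Map (source, gcatID) -> _Report element, rejecting duplicate keys."""
    index = {}
    for report in ET.fromstring(xml_text.strip()).findall("_Report"):
        key = (report.get("source"), report.findtext("gcatID"))
        if key in index:
            raise ValueError(f"duplicate report key: {key}")
        index[key] = report
    return index

reports = index_reports(DOC)
for (source, gcat_id), report in sorted(reports.items()):
    print(source, gcat_id, report.get("version"))
```

Both reports share one GCAT_ID and are told apart only by their source, which is what makes them comparable under this approach.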

Variation

A wrapping element could indicate that the enclosed reports refer to the same (meta)genome entity but come from different sources:

<nasReports>
  <comparison>
   <_Report source="GOLD" version="1">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
               <extension />
           </_SampleLocation>
           <_Habitat>
               <extension />
           </_Habitat>
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
   <_Report source="Genome Reviews" version="1">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
               <extension />
           </_SampleLocation>
           <_Habitat>
               <extension />
           </_Habitat>
           <extension/>
       </originalSample>
       <isolate>
           <extension />
       </isolate>
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
  </comparison>
</nasReports>
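
One way a consumer might make use of the comparison wrapper is sketched below in Python. This is an assumption-laden illustration rather than a defined GCDML behaviour: the function name comparison_groups and the gcatID value are hypothetical, and the check that all reports inside one comparison share the same gcatID is one possible reading of "refer to the same entity".

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the gcatID value is invented for illustration.
DOC = """
<nasReports>
  <comparison>
    <_Report source="GOLD" version="1"><gcatID>GCAT-0001</gcatID></_Report>
    <_Report source="Genome Reviews" version="1"><gcatID>GCAT-0001</gcatID></_Report>
  </comparison>
</nasReports>
"""

def comparison_groups(xml_text):
    """Return one (gcatID, [sources]) pair per <comparison> element."""
    groups = []
    for comparison in ET.fromstring(xml_text.strip()).findall("comparison"):
        reports = comparison.findall("_Report")
        ids = {r.findtext("gcatID") for r in reports}
        # Assumed rule: all reports in one comparison describe the same entity.
        if len(ids) != 1:
            raise ValueError(f"comparison mixes entities: {sorted(map(str, ids))}")
        groups.append((ids.pop(), [r.get("source") for r in reports]))
    return groups

groups = comparison_groups(DOC)
print(groups)
```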

Approach 2

Each descriptor for which comparability is wanted can be repeated with different source attribute values.


<nasReports>
   <_Report version="1">
       <gcatID />
       <studyData>
           <extension />
       </studyData>
       <originalSample>
           <samplingTime />
           <_SampleLocation>
               <extension />
           </_SampleLocation>
           <_Habitat>
               <extension />
           </_Habitat>
           <extension/>
       </originalSample>
       
       <isolate source="straininfo">
           <extension />
       </isolate>
       <isolate source="DSMZ">
           <extension />
       </isolate>
       
       <dnaExtract>
           <extension />
       </dnaExtract>
       <dnaLibrary>
           <extension />
       </dnaLibrary>
       <sequencing>
           <extension />
       </sequencing>
       <extension />
   </_Report>
</nasReports>
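
With Approach 2, a consumer has to look inside a single report to find the competing versions of a descriptor. The sketch below, in Python, shows one way this could be done; the helper name descriptors_by_source and the gcatID value are hypothetical, and the report is shortened to the descriptors needed for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the gcatID value is invented for illustration.
DOC = """
<nasReports>
   <_Report version="1">
       <gcatID>GCAT-0001</gcatID>
       <isolate source="straininfo"><extension /></isolate>
       <isolate source="DSMZ"><extension /></isolate>
       <dnaExtract><extension /></dnaExtract>
   </_Report>
</nasReports>
"""

def descriptors_by_source(report, tag):
    """Map source attribute -> element for every child of `report` named `tag`.

    A descriptor without a source attribute is stored under the key None.
    """
    return {child.get("source"): child for child in report.findall(tag)}

report = ET.fromstring(DOC.strip()).find("_Report")
isolates = descriptors_by_source(report, "isolate")
print(sorted(isolates))
```

Descriptors that are not repeated, such as dnaExtract here, carry no source attribute, so under this approach their origin is not recorded at all.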


Approach 3

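A combination of Approach 1 and Approach 2: each report carries source and version attributes, and individual descriptors within a report can additionally be repeated with their own source attribute.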
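
No example is given for the combined approach, so the following Python sketch shows only one possible reading of it. It assumes that a descriptor's own source attribute, where present, takes precedence over the source of the enclosing report; this precedence rule, the function name effective_sources and the gcatID value are assumptions made for illustration and are not stated in the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document combining a report-level source (Approach 1)
# with descriptor-level sources (Approach 2). The gcatID value is invented.
DOC = """
<nasReports>
   <_Report source="GOLD" version="1">
       <gcatID>GCAT-0001</gcatID>
       <isolate source="straininfo"><extension /></isolate>
       <isolate source="DSMZ"><extension /></isolate>
       <sequencing><extension /></sequencing>
   </_Report>
</nasReports>
"""

def effective_sources(xml_text):
    """Yield (gcatID, descriptor tag, effective source) for each descriptor."""
    for report in ET.fromstring(xml_text.strip()).findall("_Report"):
        gcat_id = report.findtext("gcatID")
        for child in report:
            if child.tag == "gcatID":
                continue
            # Assumed rule: a descriptor's own source wins over the report's.
            yield gcat_id, child.tag, child.get("source", report.get("source"))

rows = list(effective_sources(DOC))
for row in rows:
    print(row)
```

Under this reading, the sequencing descriptor is attributed to GOLD through the report, while the two isolate descriptors keep their own sources.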

Discussion
