ISA-Tab notes
From Genomic Standards Consortium
Working Notes
Actions completed, information about case study
- completed creation of main wiki pages
- GEO accessions made (public); NCBI Short Read Archive accessions to follow when available. Add all links to the main page. Jack is doing the whole submission, so he is the key contact for both submissions (including, for example, the process and details of the metadata they currently require). GEO/ArrayExpress are currently working towards standardizing UHTS transcriptomics etc.; see entry MGED_workshop.
- GEO has the biogeochem data in a spreadsheet - nice feature that they did take it
- MIGS/MIMS submission to Genome Catalogue underway; awaiting stable MIGS/GCDML
- Phillipe successfully coded the experimental design into ISA-Tab on the basis of reading the paper.
- It was very useful to review ISA-Tab and the MIGS checklist. This raised some issues with the checklist that we need to review, namely those involving the split between Study and Assay.
Finalize linking out to all identifiers
This example shows the problem of cross-linking between GEO and NCBI SRA.
Need to find SRA IDs for the metagenomes; Jack to follow up. Not yet available on the public site.
NCBI SRA: http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?
Putting MIGS/MIMS into ISA-Tab
It could go in
- biomaterial
- protocol (some as protocol parameters)
- additional structure
Lat/long is identical for each bag, but could be different - this is a characteristic of a sample, so we then need to document its usage.
Looking at the GEO data, it is very rich - some biochem parameters taken at time of sampling - could be MIMS compliant. The rest is 'data', not 'metadata'.
Which measurements are taken with the sample? pH, temperature; confirm with Jack.
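A minimal sketch of how the biomaterial/protocol split above might look as a tab-delimited sample table, with lat/long carried as a per-sample characteristic so that bags can differ later. The choice of which fields go where, the protocol name and all values are placeholders assumed for illustration, not the agreed mapping and not data from the experiment.

```python
import csv
import io

# Hypothetical assignment of checklist fields to ISA-Tab style columns.
BIOMATERIAL_FIELDS = ["latitude", "longitude"]  # recorded per sample
PROTOCOL_FIELDS = ["volume of sample"]          # recorded as protocol parameters


def sample_table(samples, protocol_ref):
    """Return tab-delimited text with a header row and one row per sample."""
    header = (
        ["Source Name"]
        + [f"Characteristics[{name}]" for name in BIOMATERIAL_FIELDS]
        + ["Protocol REF"]
        + [f"Parameter Value[{name}]" for name in PROTOCOL_FIELDS]
        + ["Sample Name"]
    )
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for sample in samples:
        writer.writerow(
            [sample["source"]]
            + [sample[name] for name in BIOMATERIAL_FIELDS]
            + [protocol_ref]
            + [sample[name] for name in PROTOCOL_FIELDS]
            + [sample["name"]]
        )
    return out.getvalue()


# Placeholder values only.
bags = [
    {"source": "site", "name": "bag1", "latitude": "0.0",
     "longitude": "0.0", "volume of sample": "1"},
    {"source": "site", "name": "bag2", "latitude": "0.0",
     "longitude": "0.0", "volume of sample": "1"},
]
print(sample_table(bags, "sampling"))
```

Repeating the coordinates on every row is redundant while all bags share a location, but it keeps the table valid if a later study samples bags at different positions.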
Now, the 16S submission study is also relevant: http://gensc.org/gc_wiki/index.php/Bergen_Experiment_16S
Thinking about putting 16S into ISA-Tab in the future.
Nucleic Acid Sequence is actually referring to "NAS Source"
All data can really be put in a priori.
We need to get the definitions of the checklist elements linked into the checklist documentation.
For example, the definitions of elements are in the XML schema
http://gensc.org/gsc/gcat/reports/create
Note from Tanya:
A schema documentation function existed previously in GCat, but with the introduction of substitution groups and choice elements in the GCDML schema, it needed to be updated.
I have just now added a function to the GCat to display documentation contained in XML schema files uploaded to the schema repository in GCat, including GCDML, that is based on the existing schema transformation to input form function. This is a quick implementation, and can be modified further to requirements.
Here are two example links: latest version of GCDML available in GCat http://gensc.org/gsc/gcat/schema-info?file=schema-version5.xsd&version=5&targetnamespace=http://gensc.org/gcdml
earlier version of MIGS/MIMS schema http://gensc.org/gsc/gcat/schema-info?file=schema-version2.xsd&version=2&targetnamespace=
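As a rough illustration of what pulling definitions out of an XML schema involves, the sketch below collects the xs:documentation text for each named element. It is not the GCat implementation; the schema fragment and element names are made up, and it does not resolve substitution groups or choice elements.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Made-up schema fragment for illustration; not the real GCDML schema.
EXAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="exampleElement" type="xs:string">
    <xs:annotation>
      <xs:documentation>Example definition text.</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="undocumentedElement" type="xs:string"/>
</xs:schema>"""


def element_docs(xsd_text):
    """Map each named xs:element to its documentation text ('' if none)."""
    root = ET.fromstring(xsd_text)
    docs = {}
    for element in root.iter(XS + "element"):
        name = element.get("name")
        if name is None:
            continue  # elements declared with ref= carry no name of their own
        doc = element.find(f"{XS}annotation/{XS}documentation")
        docs[name] = (doc.text or "").strip() if doc is not None else ""
    return docs


for name, text in element_docs(EXAMPLE_XSD).items():
    print(f"{name}: {text or '(no documentation)'}")
```

Elements that come back with an empty string are the ones whose definitions would still need to be written into the schema.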
New boundary rules between Study and Assay issued:
These clarify how to re-split MIGS - and we will move these from under "Sequencing" to "Study".
Sequencing
- Isolation and growth conditions
- Biomaterial treatment
- Volume of sample
- Sampling strategy
- need to formalize and move into the right place in the wiki MIGS Change Log
Putting MIGS/MIMS into ISA-Tab
Some key details of ISA-Tab of relevance
- Characteristics [Biosource] = much of MIGS could go here
- Learning ISA-Tab design:
- Any descriptor with a bracket is variable (decide what to put in)
- takes protocols (ref)/standard operating procedures (declare the protocol in the reference section and add parameters) - use if you want to provide more information
- Takes contact info: MIGS doesn't yet capture contact details because it is an extension of what Trace/INSDC collect; contact details are collected at the implementation level by the Genome Catalogue. We need to review this situation.
- Assay tab - nothing yet for high-throughput sequencing, so we clone the one for transcriptomics and make a specific assay: flat structure, one sample, one library, get a set of files. Discussed whether we need separate ones for genome sequencing, metagenomics and metatranscriptomes, but we will try to make it one.
- Investigation, to trace archive - could be expanded - short read for pyro, traditional trace archive, could add GEO for mRNA - need to add nucleic acid type?
- Experimental design - asked if pooling was important - yes, see Rohwer data. We added polygon to the geographic description and a descriptor to capture pooled. Pooled is linked to the ongoing problem of one ID per project, which could have multiple sequences, and now we have experimental designs to deal with. Could we capture this in Investigation? What do we do?
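The flat assay structure described in the list above (a sample, one library, a set of files) can be sketched as one record that expands to one row per file. The class, column headers and values are hypothetical and only show the shape; the nucleic acid type column reflects the open question in the notes, not a decision.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical column headers in ISA-Tab style; not an agreed assay definition.
HEADER = [
    "Sample Name",
    "Parameter Value[library]",
    "Parameter Value[nucleic acid type]",
    "Raw Data File",
]


@dataclass
class SequencingAssay:
    """One sample, one library, and the set of files produced from it."""
    sample_name: str
    library_name: str
    nucleic_acid_type: str
    data_files: List[str] = field(default_factory=list)

    def rows(self):
        """Return one flat row per data file."""
        return [
            [self.sample_name, self.library_name, self.nucleic_acid_type, path]
            for path in self.data_files
        ]


# Placeholder values only.
assay = SequencingAssay("bag1", "library1", "DNA", ["reads_1.sff", "reads_2.sff"])
print("\t".join(HEADER))
for row in assay.rows():
    print("\t".join(row))
```

Keeping nucleic acid type as a value rather than a separate assay type is one way a single definition could cover genome sequencing, metagenomics and metatranscriptomes.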
Other thoughts
- ISA-Tab could be used as a generic data capture format for any type of experimental data - review in the NEBC context
- try for a 16S submission?
- else?