Draft Policy on GCat identifiers
From Genomic Standards Consortium
Scope
A GCAT identifier is unique to a specific genome report.
An identifier will only be issued once for a given genome report and cannot be re-used.
There exists a one-to-one mapping between a GCAT identifier and an NCBI genome project [1] number.
If an identifier is requested for a genome report for a given NCBI genome project and that genome report is subsequently deleted, a new GCAT identifier will be required for any new genome report for the same NCBI genome project.
Syntax
A GCAT identifier will take the form of a serial number followed by _GCAT, e.g. 000001_GCAT. This format was specifically selected to make GCat identifiers "look" different from INSDC accession numbers.
An identifier will be written with GCAT in uppercase; however, the identifier is case-insensitive, i.e. 000001_GCAT is the same as 000001_gcat.
The leading zeros in the identifier are not essential, e.g. 1_GCAT, 000001_GCAT, and 001_GCAT are the same.
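The syntax rules above (serial number plus _GCAT, case-insensitive, leading zeros optional) could be checked and normalised along the following lines. This is only a minimal sketch: the function names parse_gcat and canonical_gcat are hypothetical, and the six-digit padding is an assumption taken from the 000001_GCAT example rather than a stated rule.

```python
import re

# Hypothetical pattern: ASCII digits, an underscore, then "GCAT" in any case.
_GCAT_RE = re.compile(r"([0-9]+)_GCAT", re.IGNORECASE)


def parse_gcat(identifier: str) -> int:
    """Return the serial number encoded in a GCAT identifier."""
    match = _GCAT_RE.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a GCAT identifier: {identifier!r}")
    # int() discards leading zeros, so 1_GCAT and 000001_GCAT agree.
    return int(match.group(1))


def canonical_gcat(identifier: str, width: int = 6) -> str:
    """Rewrite an identifier in zero-padded, uppercase form.

    The default width of 6 is an assumption based on the policy's example.
    """
    return f"{parse_gcat(identifier):0{width}d}_GCAT"
```

Comparing two identifiers then reduces to comparing their parsed serial numbers, or equivalently their canonical forms.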
Issue of GCAT identifiers
It will not be possible to request specific GCAT identifiers; identifiers will be issued in sequence starting with 000001_GCAT.
Identifiers are now being issued.
The GCAT identifier will be recorded in the MIGS genome record (e.g. entered by the user).
If a report is deleted, the GSC will still retain information about the report, and that information will remain associated with the GCAT identifier for 'housekeeping' purposes.
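The issuing rules can be illustrated with a small in-memory registry. This is a sketch under stated assumptions, not a description of how the GSC actually issues identifiers: the class GcatRegistry, its method names, and the ReportRecord fields are all hypothetical, and the one-to-one mapping is interpreted here as "at most one live report per NCBI genome project".

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReportRecord:
    serial: int
    ncbi_project_id: int
    deleted: bool = False


class GcatRegistry:
    """Hypothetical registry: sequential issue, no re-use, records retained."""

    def __init__(self) -> None:
        self._records: Dict[int, ReportRecord] = {}   # serial -> record, never removed
        self._active_by_project: Dict[int, int] = {}  # project -> serial of live report
        self._next_serial = 1                         # issue starts at 000001_GCAT

    def issue(self, ncbi_project_id: int) -> str:
        """Issue the next identifier in sequence for a project's genome report."""
        if ncbi_project_id in self._active_by_project:
            raise ValueError("project already has a live genome report")
        serial = self._next_serial
        self._next_serial += 1  # a serial is handed out once and never again
        self._records[serial] = ReportRecord(serial, ncbi_project_id)
        self._active_by_project[ncbi_project_id] = serial
        return f"{serial:06d}_GCAT"

    def delete_report(self, serial: int) -> None:
        """Mark a report deleted but keep its record for housekeeping."""
        record = self._records[serial]
        if record.deleted:
            return
        record.deleted = True
        del self._active_by_project[record.ncbi_project_id]

    def lookup(self, serial: int) -> ReportRecord:
        return self._records[serial]
```

In this sketch, deleting report 1 and then requesting a new report for the same project yields a fresh serial, while the record for serial 1 stays available through lookup.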
Genome Report Filenames
Genome reports may optionally have filenames that include the GCAT identifier; such a filename will consist of the identifier followed by a file type suffix, e.g.
000001_GCAT.xml
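A filename of this shape can be built from, and mapped back to, a serial number as sketched below. The helper names report_filename and serial_from_filename are hypothetical, and treating filenames as case-insensitive with optional leading zeros is an assumption carried over from the identifier syntax; the policy does not state it for filenames.

```python
import re
from typing import Optional

# Hypothetical pattern: identifier, a dot, then an alphanumeric file type suffix.
_FILENAME_RE = re.compile(r"([0-9]+)_GCAT\.([A-Za-z0-9]+)", re.IGNORECASE)


def report_filename(serial: int, suffix: str = "xml", width: int = 6) -> str:
    """Build a filename such as 000001_GCAT.xml from a serial number."""
    return f"{serial:0{width}d}_GCAT.{suffix}"


def serial_from_filename(filename: str) -> Optional[int]:
    """Return the serial number in a report filename, or None if absent.

    Filenames are optional under the policy, so a non-matching name is
    not treated as an error.
    """
    match = _FILENAME_RE.fullmatch(filename)
    return int(match.group(1)) if match else None
```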
Revisions
Following the NCBI policy on accession numbers, a GCAT identifier will remain the same even if the content in the genome report changes.
Additional users, including the GSC, will be able to edit any report, but the original contents of the report as submitted by the original provider will always be retrievable, and the source of all information in the genome report will be transparent to the reader.