towards a richer set of information to describe our complete genome collection

Genomic Rosetta Stone and LinkOut

From Genomic Standards Consortium

This page records the email discussion with the LinkOut team about the identifier mapping, most recent first.

19 March 2008 - LinkOut

We should be able to handle these 32K links. Please go ahead and send us sample LinkOut files. Please use <UrlName> for the ID, and <Rule> for the URL of the resource. Details of LinkOut files can be found at:

http://www.ncbi.nlm.nih.gov/projects/linkout/doc/nonbiblinkout.html

Please let us know if you have any questions in preparing the file.

Kathy


18 March 2008 - GSC

A rough estimate of the number of IDs is 32,400:

  • there are 9 databases at the moment (this may increase)
  • for each database, there will be one internal identifier for each genome project identifier
  • number of IDs = number of databases x number of genome project identifiers:

9 x 3,600 = 32,400

It is proposed that each database provider will register with LinkOut and supply/maintain the links.

(Correction: 10 databases are involved, so the number would be approximately 36,000.)
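
The estimate above can be reproduced with a short calculation. This is only a minimal sketch: the function name `estimate_link_count` is illustrative, and the figure of 3,600 genome project identifiers is taken from the estimate itself.

```python
def estimate_link_count(num_databases: int, num_project_ids: int) -> int:
    """One LinkOut link per (database, genome project identifier) pair."""
    return num_databases * num_project_ids


# Approximate number of genome project identifiers, taken from the estimate above.
PROJECT_IDS = 3_600

print(estimate_link_count(9, PROJECT_IDS))   # original estimate, 9 databases
print(estimate_link_count(10, PROJECT_IDS))  # corrected estimate, 10 databases
```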

18 March 2008 - LinkOut

Could you estimate the number of IDs that will be supplied to LinkOut?

In our thinking, each ID from a database will form a LinkOut link, with the URL in the <Rule> element and the ID in the <UrlName> element. Will you provide those links on behalf of each database, or will each database contact us to supply its own links?
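
As an illustration of the arrangement described above, the following sketch builds a single LinkOut link with the URL in <Rule> and the identifier in <UrlName>. It is a rough outline under stated assumptions, not a validated LinkOut file: the element layout (LinkSet, Link, ObjectSelector, ObjectUrl) reflects our reading of the LinkOut documentation and should be checked against the DTD, and the provider ID, Entrez database name, project ID, base URL and identifier are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_link(link_id, provider_id, entrez_db, project_id, base_url, rule, identifier):
    """Build one <Link> element: URL parts in <Base>/<Rule>, identifier in <UrlName>."""
    link = ET.Element("Link")
    ET.SubElement(link, "LinkId").text = str(link_id)
    ET.SubElement(link, "ProviderId").text = str(provider_id)

    # Which NCBI record the link is attached to.
    selector = ET.SubElement(link, "ObjectSelector")
    ET.SubElement(selector, "Database").text = entrez_db
    object_list = ET.SubElement(selector, "ObjectList")
    ET.SubElement(object_list, "ObjId").text = str(project_id)

    # Where the link points, plus the provider's own identifier.
    object_url = ET.SubElement(link, "ObjectUrl")
    ET.SubElement(object_url, "Base").text = base_url
    ET.SubElement(object_url, "Rule").text = rule
    ET.SubElement(object_url, "UrlName").text = identifier
    return link


link_set = ET.Element("LinkSet")
link_set.append(
    build_link(
        link_id=1,
        provider_id=9999,                 # hypothetical provider ID
        entrez_db="genomeprj",            # assumed Entrez name for genome projects
        project_id=12345,                 # hypothetical genome project identifier
        base_url="http://example.org/",   # hypothetical provider site
        rule="genome?id=ABC123",          # hypothetical resolution path
        identifier="ABC123",              # hypothetical provider identifier
    )
)

xml_text = ET.tostring(link_set, encoding="unicode")
print(xml_text)
```

A real submission would also need the DOCTYPE declaration and any other elements the LinkOut documentation requires; those are omitted here.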


17 March 2008 - GSC

Each of the databases involved in the identifier mapping will have a public web site, where an identifier can be resolved to a resource record via a URL. It is predicted, though, that the genomic identifier mapping will be queried primarily via Web services using a SOAP or REST API. To meet our needs, the identifier could be parsed from a resolution URL, but this is a messy compromise, and should not really be considered. As such, we think it is necessary for the identifier to be recorded separately.

Further to your suggestion, the UrlName element could meet the requirements of the identifier mapping by allowing the database providers to record an identifier. I think the only alternative would be to amend the DTD to include a new element that records an identifier for the resource.

The current list of genomic database providers that will be included in the identifier mapping is as follows:

  * Genome Catalogue
  * Genomes Online Database (GOLD)
  * Straininfo.net
  * RDP
  * Genome Reviews
  * SEED
  * Genome Atlas
  * SILVA
  * IMG
  * CMR 

17 March 2008 - LinkOut

Thank you for your interest in LinkOut. In future, please send email to linkout@ncbi.nlm.nih.gov (not lib-linkout@ncbi).

We need to discuss your proposal internally. I would like to clarify a couple of things at this point:

1. LinkOut basically records a URL and points users to it. Will the identifiers you mentioned be URLs, or can they easily be turned into URLs that point users to a related resource?
2. Which genomic database providers are you thinking of including?

To answer your question, a LinkOut "Attribute" should come from our pre-defined list: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinkout.section.files.Special_Elements_Att

However, a provider can use the "UrlName" element to send a short string describing each link:

http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=helplinkout.section.files.Special_Elements_Url


12 March 2008 - GSC

The GSC is working to make available a mapping of identifiers from genomic databases, including NCBI genome project, GOLD, Genome Catalogue, etc.

The proposed implementation makes use of the LinkOut service, as follows:

  • genomic database providers register with LinkOut as an external resource provider
  • genomic identifiers are recorded in the LinkOut external resource record, associated with an NCBI genome project identifier
  • we download the LinkOut records via the eLink web service
  • we host a local version of the genomic identifier mapping and provide a genomic identifier resolver web service
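
The last two steps above can be sketched as follows. This is a rough outline, not the GSC implementation. It assumes the eLink `cmd=llinks` call is used to list LinkOut records, that the provider's <UrlName> comes back in the `LinkName` element of each `ObjUrl`, and that the Entrez database is named `genomeprj`. The sample response is hand-written for illustration, and `EXDB`, `ABC123` and project ID `12345` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_llinks_url(dbfrom, ids):
    """Build an eLink request URL asking for the LinkOut links of the given record IDs."""
    params = {"dbfrom": dbfrom, "id": ",".join(str(i) for i in ids), "cmd": "llinks"}
    return ELINK_URL + "?" + urlencode(params)


def parse_llinks(xml_text):
    """Return {project_id: {provider_abbreviation: identifier}} from an eLink response.

    The element names used here (IdUrlSet, ObjUrl, LinkName, Provider/NameAbbr)
    are assumptions about the response layout and should be verified.
    """
    mapping = {}
    root = ET.fromstring(xml_text)
    for id_url_set in root.iter("IdUrlSet"):
        project_id = id_url_set.findtext("Id")
        providers = mapping.setdefault(project_id, {})
        for obj_url in id_url_set.findall("ObjUrl"):
            provider = obj_url.findtext("Provider/NameAbbr")
            identifier = obj_url.findtext("LinkName")
            if provider and identifier:
                providers[provider] = identifier
    return mapping


def resolve(mapping, project_id, provider):
    """Look up a provider's identifier for a genome project in the local mapping."""
    return mapping.get(str(project_id), {}).get(provider)


# Hand-written sample response; real responses would be fetched from the URL above.
SAMPLE_RESPONSE = """
<eLinkResult><LinkSet><DbFrom>genomeprj</DbFrom><IdUrlList><IdUrlSet>
  <Id>12345</Id>
  <ObjUrl>
    <Url>http://example.org/genome?id=ABC123</Url>
    <LinkName>ABC123</LinkName>
    <Provider><Name>Example Genome Database</Name><NameAbbr>EXDB</NameAbbr></Provider>
  </ObjUrl>
</IdUrlSet></IdUrlList></LinkSet></eLinkResult>
"""

request_url = elink_llinks_url("genomeprj", [12345])
local_mapping = parse_llinks(SAMPLE_RESPONSE)
print(request_url)
print(resolve(local_mapping, 12345, "EXDB"))
```

Keeping the parsed records in a plain dictionary keyed by genome project identifier lets the resolver answer lookups without calling NCBI for each query; a persistent store would be needed for a hosted service.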

To determine whether this proposal is feasible, I would like to ask:

  • In the existing LinkOut DTD, an attribute element is defined. Can the attribute element be used by an external resource provider to record arbitrary data, for instance a database identifier?
  • If not, is it possible to modify the DTD to meet the requirements of our proposed use of the LinkOut service?