Proposal.004:SAWSDL for Semantic Integration
From Genomic Standards Consortium
Deadline for contributions to proposal
NOT SET
Please edit this document until the stated deadline. After that date, a vote will be taken on the options stated.
Background
The GSC is strongly committed to the use and development of ontologies. For example, the GSC is a member community of the Ontology for Biomedical Investigations (Whetzel et al., 2006), a founding member of the Environment Ontology project (http://environmentontology.org), and is driving the development of a minimum controlled vocabulary of habitat terms. The GSC has agreed that, just as for MIGS/MIMS, GCDML needs semantic transparency, to be provided through the integration of ontologies.
An outstanding issue is how best to achieve this. Discussions within the GSC produced the consensus that the solution should:
- not add complexity to NAS reports,
- not depend on the availability of ontologies, and
- not force users of GCDML and/or NAS reports to become acquainted with ontologies.
Relation to other Proposals
Proposal
It is proposed to use SAWSDL, which stands for "Semantic Annotations for WSDL and XML Schema". It is a recommendation of the World Wide Web Consortium (W3C; August 2007) that allows XML Schema to be annotated with references to ontological concepts, independent of the type and format of the ontology. SAWSDL separates the syntactic modeling of data from the semantic modeling of a knowledge domain: first, XML Schema is used to model the data; next, SAWSDL annotations state within the GCDML schema which XML Schema construct has a meaningful relationship to which ontological entity. Thus, enumerations of categorical terms can be used to ensure the syntactic consistency of MIGS/MIMS-compliant reports, while semantic applications can still relate NAS reports to ontological concepts.
SAWSDL Example
<simpleType name="geographicFeature" xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <restriction base="string">
    <enumeration value="soil"
        sawsdl:modelReference="http://purl.org/obo/owl/ENVO#ENVO_00001998"/>
    <enumeration value="water"
        sawsdl:modelReference="http://purl.org/obo/owl/ENVO#ENVO_00002006"/>
  </restriction>
</simpleType>
This example shows an XML Schema snippet with SAWSDL annotations (the "modelReference" attributes). The annotated schema still validates exactly the same XML documents as the non-annotated schema, so SAWSDL can be introduced at any time without affecting NAS reports in any way.
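As a rough illustration of how one annotated schema serves both purposes, the following sketch uses only Python's standard `xml.etree.ElementTree` to read the enumeration values (what a validator checks) and the SAWSDL `modelReference` URIs (what a semantic application reads) from the example type. It assumes the type sits in a complete schema document using the XML Schema namespace, with the `sawsdl` prefix bound to `http://www.w3.org/ns/sawsdl`; the function names are hypothetical, and the membership check is a simplified stand-in for real XML Schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for XML Schema and SAWSDL annotations.
XS = "http://www.w3.org/2001/XMLSchema"
SAWSDL = "http://www.w3.org/ns/sawsdl"

# The example type, wrapped in a complete schema document (assumed layout).
SCHEMA = f"""<schema xmlns="{XS}" xmlns:sawsdl="{SAWSDL}">
  <simpleType name="geographicFeature">
    <restriction base="string">
      <enumeration value="soil"
          sawsdl:modelReference="http://purl.org/obo/owl/ENVO#ENVO_00001998"/>
      <enumeration value="water"
          sawsdl:modelReference="http://purl.org/obo/owl/ENVO#ENVO_00002006"/>
    </restriction>
  </simpleType>
</schema>"""


def enumeration_annotations(schema_xml, type_name):
    """Map each enumerated value of a simpleType to its modelReference (or None)."""
    root = ET.fromstring(schema_xml)
    result = {}
    for simple_type in root.iter(f"{{{XS}}}simpleType"):
        if simple_type.get("name") != type_name:
            continue
        for enum in simple_type.iter(f"{{{XS}}}enumeration"):
            # Namespaced attributes are keyed as "{namespace-uri}localName".
            result[enum.get("value")] = enum.get(f"{{{SAWSDL}}}modelReference")
    return result


def is_valid_value(schema_xml, type_name, value):
    """Simplified enumeration check: the annotations play no role in the result."""
    return value in enumeration_annotations(schema_xml, type_name)


terms = enumeration_annotations(SCHEMA, "geographicFeature")
print(sorted(terms))  # → ['soil', 'water']
print(terms["soil"])  # → http://purl.org/obo/owl/ENVO#ENVO_00001998
print(is_valid_value(SCHEMA, "geographicFeature", "sand"))  # → False
```

Deleting the `sawsdl:modelReference` attributes would leave the set of allowed values unchanged, which is the sense in which the annotation does not affect existing reports.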
Several tools for working with SAWSDL-annotated XML Schema documents are already available; see Semantic Annotations for WSDL Working Group.
Further information
- Semantic Annotations for WSDL Working Group
- Report of SAWSDL implementations
- email exchange with SAWSDL Working Group
Discussion
SAWSDL is agnostic to semantic representation languages.
Excerpt from an email discussion written by User:Renzo
a) All my thinking is from an XML Schema perspective. From the first time I saw MIGS.xsd, I completely fell in love with the idea of enumerations of terms to control input. The power of this concept is that it avoids the "string" and "uri" types, both of which have the disadvantage of allowing basically any input. In the case of a URI, W3C XML Schema does not require parsers to validate the correctness of URIs, and even if it did, there would be no way to check that a URI actually exists (which would require treating it as a URL).
b) Assuming enumerations are the method of choice, the questions are (1) how to link to ontologies and (2) at which level to link: the XML Schema level and/or the document level.
c) Several solutions have been discussed in this GSC wiki: Using_RDFa_in_GCat_-_a_demonstration and Ontology_integration_into_the_Genome_Catalogue. So far (I hope I got them all), the following solutions have been discussed: RDF, RDFa, OLS, Swoogle, Google, a link in <appinfo>, a self-defined XML element, and FuGE.
d) Here are my concerns with these approaches: OLS, Swoogle and Google are not linking solutions at all, because they are tools for finding terms and links.
RDFa is only for XHTML and is therefore not usable for XML Schema documents and instances.
RDF, a self-defined XML element and FuGE would all work with URIs at the XML instance level. For my concerns about URIs, see a). The problem with the document level is that if a URI changes, all instances have to be updated, which puts an extra burden on XML instance versioning.
e) So there was, and still is, agreement to put the linking (URI) at the schema level. This allows the use of enumerations (so any XML Schema validator can check whether a correct term was chosen), and any semantic application can unambiguously find the correct link (URI) in the XML Schema based on a term in an XML instance. Of the solutions from the previous discussions, therefore, only the use of <appinfo> remains. SAWSDL provides the same features as the <appinfo> solution, but, first, it is a W3C recommendation and, second, semantic tools for reliably retrieving the links between terms and URIs have already been developed (see my previous mail and the most recently discussed GCDML manuscript).
f) The SAWSDL solution does not prevent GenCat (and other applications) from using OLS and other ontology term search engines, and it does not prevent the use of RDFa on the rendering side of GenCat.
g) The enumeration approach also has the valuable advantage that terms not yet defined in an ontology can be used right away and linked once an ontology term becomes available. Equally important, linking at the schema level only has to be done once and is valid for all instances, whereas linking at the document level has to be done for each instance and by each application that generates GCDML.
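Points e) and g) can be illustrated with a small sketch: the term-to-URI links live only in the schema, so a semantic application resolves the terms found in any number of instance documents with a single lookup table, and a term not yet linked to an ontology still passes the enumeration check. This is a minimal Python sketch using only `xml.etree.ElementTree`; the instance element name `geographicFeature`, the root element `report`, the extra enumeration value "sediment" without a `modelReference`, and the function names are all hypothetical, and real GCDML documents are more complex.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
SAWSDL = "http://www.w3.org/ns/sawsdl"

# Hypothetical schema: "sediment" is allowed but not (yet) linked to an ontology term.
SCHEMA = f"""<schema xmlns="{XS}" xmlns:sawsdl="{SAWSDL}">
  <simpleType name="geographicFeature">
    <restriction base="string">
      <enumeration value="soil"
          sawsdl:modelReference="http://purl.org/obo/owl/ENVO#ENVO_00001998"/>
      <enumeration value="water"
          sawsdl:modelReference="http://purl.org/obo/owl/ENVO#ENVO_00002006"/>
      <enumeration value="sediment"/>
    </restriction>
  </simpleType>
</schema>"""

# Hypothetical instance documents: they carry plain terms only, no URIs.
INSTANCES = [
    "<report><geographicFeature>soil</geographicFeature></report>",
    "<report><geographicFeature>water</geographicFeature></report>",
    "<report><geographicFeature>sediment</geographicFeature></report>",
]


def load_links(schema_xml):
    """Build the term -> URI table once, from the schema alone.

    Collects every enumeration in the schema, which is fine here because
    this sketch contains a single simpleType.
    """
    root = ET.fromstring(schema_xml)
    return {
        enum.get("value"): enum.get(f"{{{SAWSDL}}}modelReference")
        for enum in root.iter(f"{{{XS}}}enumeration")
    }


def resolve(instance_xml, links):
    """Return (term, is_valid, uri) for each geographicFeature in an instance."""
    root = ET.fromstring(instance_xml)
    results = []
    for elem in root.iter("geographicFeature"):
        term = elem.text
        # Valid if the term is enumerated; uri is None if not yet linked.
        results.append((term, term in links, links.get(term)))
    return results


links = load_links(SCHEMA)  # linking done once, at the schema level
for doc in INSTANCES:
    print(resolve(doc, links))
```

If the ontology URI for a term changes, only the `modelReference` in the schema needs updating; the instance documents themselves stay untouched, which avoids the versioning burden described in d).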