Introduction
The Semantic Web facilitates discovery of resources in distributed environments. It is based on RDF and OWL, which are in turn based on URIs. This topic discusses the best way to provide semantics (assign URIs) to OPeNDAP resources.
-- LuisBermudez - 19 Jan 2007
Here are some options for enabling semantics in OPeNDAP that I mentioned in my talk:
OPeNDAP and semantics
DAP4
- DAP2 - DAS now in DDX
- DAP4 - adds to DAP2 with additional datatypes
- XML responses, i.e. allows a Web services interface
- add RDF or OWL along side the DDX?
- Also need - Datatype Ontology (see Benno's diagram) - is also of interest to ESIP
Response type approach (slide 4)
- e.g. like the way .info is used for some general information about a dataset, although it is handler specific
- see the OLFS section, i.e. .rdf or .owl as a response type (and a request type)
AIS (slides 16-18)
- AIS allows the merging of DAS streams
- Could we use AIS to serve the ontology and/or mappings to the attributes?
OLFS (slide 19)
- RDF/OWL request/response type and use OWL documents to talk to the BES
BES (slide 20)
- Respond to RDF/OWL and add commands and a semantic interface, i.e. to read in an ontology, use an inference engine, etc.
Others?
-- PeterFox - 22 Feb 2007
email conversation on Semantic Embedding in DAP4
I believe that semantics in OpenDAP are important not only for discovery but also for addressing some of the data interoperability problems that currently exist between the multiple OpenDAP APIs.
A summary of the e-mail conversation so far:
I think the key to incorporating semantics into OpenDAP is two small changes to the DAP4 data model. Not only is this important for carrying discovery semantics, but I think it is a golden opportunity to address some of the mismatches between the different OpenDAP APIs, mismatches which are due to different use semantics.
The changes come down to
1) adding namespaces, and
2) adding objecttype properties.
In one sense, 1) is possible now if :/# are legal characters in attributes. But I want it to work in a deeper sense.
I think the correct mapping is that the current attributes container becomes the local namespace container, i.e. the full name for an attribute "joe" defined in a normal OpenDAP dataset with no convention is opendap_url.ddx#joe. Then one could also have additional containers that correspond to specified namespaces, e.g. http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl# abbreviated as cfatt: would contain the CF attributes, etc. And most importantly, the information that the OpenDAP code uses would go in the dap4: namespace. This is based in part on my experience with ActiveRDF in Ruby, which maps RDF properties to Ruby methods: I quickly discovered that a namespace mechanism (which already existed in Ruby) gives you usable explicit methods, e.g. dataobject.dap4::shape is easy enough, and the convenience of making dataobject.shape work is just not worth the effort.
So then I think 1) has two parts: an abbreviation mechanism so that a long URI like http://opendap.org/ontologies/dap4.owl#Grid can be manipulated as dap4:Grid, and additional containers for attributes that belong to conventions.
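The abbreviation mechanism in 1) can be sketched in a few lines of Python. This is only an illustration and not part of any OpenDAP API: the function names expand and abbreviate, the PREFIXES table, and the example.org dataset URL are assumptions, while the two namespace URIs are the ones quoted above.

```python
# Hypothetical prefix table. The namespace URIs are the ones quoted in this
# discussion; the table and the function names are illustrative only.
PREFIXES = {
    "dap4": "http://opendap.org/ontologies/dap4.owl#",
    "cfatt": "http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#",
}


def expand(name, dataset_url, prefixes=PREFIXES):
    """Return the full URI for a prefixed or unprefixed attribute name."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    # Unprefixed names (and names with an unknown prefix) fall into the
    # local namespace of the dataset.
    return dataset_url + "#" + name


def abbreviate(uri, prefixes=PREFIXES):
    """Return prefix:local if the URI starts with a known namespace."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri  # no known namespace: leave the URI untouched


DATASET = "http://example.org/data/sst.nc.ddx"  # assumed dataset URL
grid_uri = expand("dap4:Grid", DATASET)
joe_uri = expand("joe", DATASET)
```

Here grid_uri is the full dap4 URI for Grid and joe_uri places the unprefixed attribute "joe" in the dataset's own namespace, which is the mapping proposed above.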
2) is essentially a slight rethinking of the alias that James is tempted to take out of DAP4: instead of alias being an alternate variable, attributes can point to variables. This is to have attributes that can point to objects in the dataset (or to URIs to point to objects outside of the dataset). I think this is quite doable, and could fix two sets of issues: the "same name" semantics of netcdf vs the "container" semantics of Java (and repeated transmission of common independent variables) could be replaced by an explicit object property that connects opendap arrays and connotes that relationship, and generalizing the GRID/Map structure to something that handles curvilinear coordinates (the MAP structure is replaced by a hasMapVector dap4: object attribute that connects arrays, or connects a grid object to its map vectors).
This focuses on the DAP4 data model on purpose -- I think if the DAP4 data model is clearly defined, one can then talk about transmission issues (what happens in DAP2, what are the possible requests in DAP4, what does it look like in JSON, can it be transmitted in RDF/OWL?). Part of what happens in this restructuring is that one could easily load all the opendap data objects into an RDF database and perform operations across all the data objects using any of the available semantics, regardless of server.
James asked
Would this be as simple as having a URI attribute type (just like we have Int32 and String)? We actually already do have a URI attribute type, although in my naivete I called it 'URL.'
Or would the URI be a property of the attribute? That is, there could be an attribute called 'scale_factor' and it would have a value and a URI which references some object in the DDX. Would this object be limited to Variables, or would the object be either Variables or Attributes?
I think the answer is
My point 2) is the former (i.e. the existing URL type) -- the URI type is used to point to outside objects. I also want to point to inside objects, and any OpenDAP API which has data objects would have the method aka 'attribute' automatically dereference the URI and return the local object. I suppose the actual transmission mechanism could use URIs -- that is what RDF/XML does, though it has an abbreviation mechanism/default namespace for local objects. But in the data model, we are pointing to the object, not the URI of the object, and any OpenDAP API method should also return the object.
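A minimal sketch of that dereferencing behaviour, assuming a toy object model: the classes Dataset, Variable, and LocalRef, the method names, and the example.org URL are all hypothetical and do not correspond to any existing OpenDAP API.

```python
class LocalRef:
    """Marks an attribute value that points at a variable in the same dataset."""

    def __init__(self, name):
        self.name = name


class Variable:
    def __init__(self, name, dataset):
        self.name = name
        self.dataset = dataset
        self._attributes = {}

    def set_attribute(self, key, value):
        self._attributes[key] = value

    def attribute(self, key):
        """Return the attribute value, dereferencing local references."""
        value = self._attributes[key]
        if isinstance(value, LocalRef):
            # Inside object: hand back the object itself, not its URI.
            return self.dataset.variables[value.name]
        return value  # outside object: the URI is returned unchanged


class Dataset:
    def __init__(self, url):
        self.url = url
        self.variables = {}

    def add_variable(self, name):
        variable = Variable(name, self)
        self.variables[name] = variable
        return variable


ds = Dataset("http://example.org/data/sst.nc")  # assumed URL
lon = ds.add_variable("lon")
sst = ds.add_variable("sst")
sst.set_attribute("dap4:hasMapVector", LocalRef("lon"))
sst.set_attribute("sos:units", "urn:ogc:unit:degree")
```

With this model, sst.attribute("dap4:hasMapVector") hands back the lon object itself, while the sos:units value, which points outside the dataset, stays a plain URI.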
My point 1) is essentially the latter: your point is an unusual characterization of namespaces. In RDF or XML, the true attribute name is the URI, and then we define a convenient prefix for the long nasty part of the URI. So there is one long name for the attribute, which is usually split into a namespace and an id within that namespace. In Ruby objects, it turns out that the best practice seems to be to map the namespace to a method and the id to a submethod, so we end up with dap4::shape instead of shape. So I would say it is not the URI which is a property of the attribute, but rather the namespace prefix (most of the URI) which becomes a property of the attribute. Alternatively, attributes are URIs, with the current set of unspecified attributes belonging to the local namespace of the dataset.
Since your data model already puts attributes into a container, I really liked the idea of having a local name space container sameAs the attributes container, plus additional containers for other namespaces. I think this is the semantically-correct mapping.
Your point makes it pretty clear how to carry semantics in DAP2, if the funky attribute names don't break anything. Could we change DAP2 to allow those characters if necessary?
Luis asked
in 1 do you mean that all the attributes in DAP4 should be identified with a URI ?
and that a mechanism to prefix namespaces should be available ?
I think the answer is
Yes, all attributes in DAP4 should have a URI, with the traditional unprefixed attribute being placed in the namespace of the dataset URI.
The namespace mechanism is mainly to get the full URI within the legal syntax of attributes or methods in whatever language one is in, e.g. http://opendap.org/ontologies/dap4.owl#shape is probably illegal as an OpenDAP attribute name, or a netcdf name, or a Ruby method, etc., but dap4::shape is a legal Ruby method (probably still illegal as netcdf or OpenDAP, I cannot remember). We could change the rules, or come up with something within the rules.
This, of course, makes it possible to carry OpenDAP in RDF, which would mean one could aggregate datasets in an RDF datastore, getting back data aggregations from RDF queries, etc. Or carry OpenDAP in any other object transport mechanism.
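To make the aggregation idea concrete, here is a toy in-memory triple store in plain Python. It is a sketch under stated assumptions: a real deployment would use an actual RDF datastore and query language, and the server URLs, variable names, and helper functions below are all hypothetical. Only the cf-att namespace URI comes from the discussion.

```python
CFATT = "http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#"

triples = set()  # each triple is (subject URI, predicate URI, value)


def add_variable(dataset_url, variable_name, attributes):
    """Store one variable's attributes as triples and return its URI."""
    subject = dataset_url + "#" + variable_name
    for predicate, value in attributes.items():
        triples.add((subject, predicate, value))
    return subject


def match(subject=None, predicate=None, value=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (value is None or t[2] == value)
    )


# Hypothetical datasets on two different servers.
add_variable("http://server-a.example.org/data/oisst.nc", "sst",
             {CFATT + "standard_name": "sea_surface_temperature"})
add_variable("http://server-b.example.org/data/ersst.nc", "temp",
             {CFATT + "standard_name": "sea_surface_temperature"})
add_variable("http://server-b.example.org/data/ersst.nc", "ice",
             {CFATT + "standard_name": "sea_ice_area_fraction"})

# One query across both servers, keyed on the shared semantics.
sst_variables = [s for s, _, _ in match(
    predicate=CFATT + "standard_name", value="sea_surface_temperature")]
```

The query returns the two sea surface temperature variables regardless of which server they live on or what they are locally called.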
-- BennoBlumenthal - 28 Feb 2007
Semantic Embedding in DAP2
Following up on James' points above, we can embed some semantics in DAP2.
1) Extend usage of the Conventions attribute to allow namespace specification, i.e. use strings of the form ns=URI to specify a namespace, e.g. cfatt=http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#. This is completely consistent with the semantic usage of Conventions.
2) Then use namespace abbreviations in attributes, e.g. cfatt:standard_name. This too is within the DAP2 spec, though it may be outside the netcdf spec (i.e. I think ':' is not a legal character for the netcdf API as implemented by UCAR for files).
3) Use the URL attribute type for URIs. We can also define a convention that allows DAP2 variable names for referring to other entities within the current dataset, since they cannot be misconstrued as global URIs.
4) We should additionally specify what the proper URI for each entity in an OpenDAP dataset should be, i.e. dodsurl#variable_name, so that OpenDAP variables can be consistently referred to by RDF/OWL and/or XML files.
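The four steps above can be sketched on the client side as follows. This is a hedged illustration only: the function names, the example.org DODS URL, the variable names, and the plain "CF-1.0" entry are assumptions, and only the ns=URI form and the dodsurl#variable_name pattern come from the proposal.

```python
def parse_conventions(values):
    """Step 1: split Conventions entries into a prefix table and plain names."""
    prefixes, plain = {}, []
    for entry in values:
        prefix, sep, uri = entry.partition("=")
        if sep and prefix and uri:
            prefixes[prefix] = uri
        else:
            plain.append(entry)  # an ordinary convention name, kept as-is
    return prefixes, plain


def resolve_url_value(value, variable_names, dataset_url):
    """Steps 3 and 4: a bare variable name refers to a local entity."""
    if value in variable_names:
        return dataset_url + "#" + value
    return value  # anything else is taken to be a global URI already


DODS_URL = "http://example.org/dods/sst.nc"  # assumed dataset URL
conventions = [
    "CF-1.0",  # illustrative plain entry
    "cfatt=http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#",
]
prefixes, plain = parse_conventions(conventions)

# Step 2: a prefixed attribute name resolves through the parsed table.
prefix, _, local = "cfatt:standard_name".partition(":")
standard_name_uri = prefixes[prefix] + local

names = {"sst", "lat", "lon"}  # hypothetical variables in the dataset
lon_uri = resolve_url_value("lon", names, DODS_URL)
unit_uri = resolve_url_value("urn:ogc:unit:degree", names, DODS_URL)
```

Because a variable name can never be mistaken for a global URI, the same URL-typed attribute can carry both kinds of reference without an extra flag.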
I have an additional agenda in encoding DAP4 structures as attributes in the DAP4 namespace, but that would not be part of this DAP2 encoding.
-- BennoBlumenthal - 02 Mar 2007
Benefits
So why make the above enhancement to DAP2?
Semantic Transport
Precisely stating which attributes belong to which convention makes understanding the entire set of attributes much easier when multiple conventions are used to add semantics to OpenDAP variables. It also avoids the problem of local name collisions, where two different conventions use the same local name for different (hopefully slightly different) things.
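As a small illustration of the collision point, the sketch below sorts prefixed attributes into one container per namespace. The function name and the unprefixed "comment" attribute are hypothetical; the sos: and cfatt: attribute names and values are taken from the sst example in this topic.

```python
from collections import defaultdict


def group_by_namespace(attributes):
    """Sort attributes into one container per namespace prefix."""
    containers = defaultdict(dict)
    for name, value in attributes.items():
        prefix, sep, local = name.partition(":")
        if not sep:
            prefix, local = "", name  # unprefixed: local namespace
        containers[prefix][local] = value
    return dict(containers)


attributes = {
    "sos:units": "urn:ogc:unit:degree",
    "cfatt:units": "degree_Celsius",
    "cfatt:long_name": "sea surface temperature",
    "comment": "local note",  # hypothetical unprefixed attribute
}
containers = group_by_namespace(attributes)
```

Both conventions define "units", but each value lands in its own container, so neither overwrites the other.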
Explicitly recognizing that the URL type should be used to refer to standard concepts, while not literally a change to OpenDAP, provides a way to transport semantics like what Luis uses for examples in his talk, e.g.
<sos:observedProperty xlink:href="http://marinemetadata.org/cf#sea_water_temperature"/>
<sos:units href="urn:ogc:unit:degree" />
<sos:featureOfInterest xlink:href="urn:mmi.feature#bodyOfWater" />
can be transported (adding some more attributes of my own)
sst {
URL sos:observedProperty "http://marinemetadata.org/cf#sea_surface_temperature";
URL sos:units "urn:ogc:unit:degree";
URL sos:featureOfInterest "urn:mmi.feature#bodyOfWater";
String cfatt:units "degree_Celsius";
String cfatt:long_name "sea surface temperature";
String cfatt:standard_name "sea_surface_temperature";
URL term:isDescribedBy "http://iridl.ldeo.columbia.edu/ontologies/iridl.owl#NOAA",
"http://sweet.jpl.nasa.gov/ontology/data_center.owl#National_Oceanic_and_Atmospheric_Administration",
"http://marinemetadata.org/2005/08/gcmd-keyw#Oceans";
URL iridl:hasDocumentation "http://iridl.ldeo.columbia.edu/ontologies/ReferenceList.owl#Reynolds_Smith1994";
}
NC_GLOBAL {
String Conventions "sos=http://www.opengis.net/sos/0",
"cfatt=http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#",
"iridl=http://iridl.ldeo.columbia.edu/ontologies/iridl.owl#",
"term=http://iridl.ldeo.columbia.edu/ontologies/iriterms.owl#";
}
Other terms mentioned in "isDescribedBy" can be inferred from an ontology repository, which relates narrower, equivalent, and broader terms.
-- LuisBermudez - 07 Mar 2007
In this particular case, all the terms mentioned in "isDescribedBy" were inferred, either by inheritance from parent datasets, equivalence between alternative ontologies, or implications from the standard name "sea_surface_temperature".
-- BennoBlumenthal - 12 Mar 2007
URIs for OpenDAP objects allow use of Semantic Web Technologies
Establishing URIs for the elements within an OpenDAP dataset means that outside documents/software can refer to those elements and make semantic statements about them. This allows semantic translation, where an outside program (AIS Server) can translate the use semantics of a variable into different conventions, e.g. CF or OpenGIS or GRIB or any of the many lesser-known systems for adding some metadata to data.
Starting point for more powerful and explicit DAP use semantics
Using the URL type to code references to local elements allows DAP to be more explicit and flexible about the relationships between the elements of a DAP dataset. Namespaces in particular allow a dap4 namespace where attributes understood by the OpenDAP Core can be placed. A prime candidate would be a hasMapVector attribute, which could connect an OpenDAP array to its Map vectors, allowing what can currently be expressed as an OpenDAP Grid, but also allowing curvilinear coordinates and reuse of common Map vectors among many dependent variables. We could also add a covariesWith attribute, which would explicitly state the connection between columns of a table or netcdf variables with common dimensions. This would make it possible to better translate between the current netcdf, Java, and db APIs, as well as to/from OpenGIS (WCS/WFS) standards, and would enable future software clients to traverse the relationships between networks of variables.
This is all achieved without damaging backwards compatibility -- older clients simply have the same understanding of variable relationships that they currently have.
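As a sketch of how a newer client might traverse such relationships, the Python below follows hasMapVector and covariesWith links breadth-first. The attribute table, variable names, and the function related are hypothetical; the sketch also assumes the relation values have already been resolved to local variable names.

```python
from collections import deque

# Relation attributes proposed for the dap4 namespace in this topic.
RELATIONS = ("dap4:hasMapVector", "dap4:covariesWith")


def related(attributes, start):
    """Return every variable reachable from start via the relation attributes."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for relation in RELATIONS:
            for target in attributes.get(current, {}).get(relation, []):
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return sorted(seen - {start})


# Hypothetical dataset: two dependent variables share the same map vectors.
attributes = {
    "sst": {"dap4:hasMapVector": ["lat", "lon"], "dap4:covariesWith": ["ice"]},
    "ice": {"dap4:hasMapVector": ["lat", "lon"]},
    "depth": {},
}
sst_network = related(attributes, "sst")
```

An older client would simply ignore the two attributes and see four independent arrays, which is the backwards-compatible behaviour described above.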
-- BennoBlumenthal - 05 Mar 2007