Introduction

The Semantic Web facilitates discovery of resources in distributed environments. It is based on RDF and OWL, which are in turn based on URIs. This topic discusses the best way to provide semantics (assign URIs) to OPeNDAP resources.

-- LuisBermudez - 19 Jan 2007

Here are some options for enabling semantics in OPeNDAP that I mentioned in my talk:

OPeNDAP and semantics

DAP4

Response type approach (slide 4)

AIS (slides 16-18)

OLFS (slide 19)

BES (slide 20)

* Respond to RDF/OWL and add commands and semantic interface, i.e. to read in an ontology, use an inference engine, etc.

Others?

-- PeterFox - 22 Feb 2007

email conversation on Semantic Embedding in DAP4

I believe that semantics in OpenDAP are important not only for discovery but also for addressing some of the data interoperability problems that currently exist between the multiple OpenDAP APIs.

A summary of the e-mail conversation so far:

I think the key to incorporating semantics into OpenDAP is two small changes to the DAP4 data model. Not only is this important for carrying discovery semantics, but I think it is a golden opportunity to address some of the mismatch that occurs between the different OpenDAP APIs, mismatches which are due to different use semantics.

The changes come down to

1) adding namespaces, and 2) adding objecttype properties.

In one sense, 1) is possible now if :/# are legal characters in attributes. But I want it to work in a deeper sense.

I think the correct mapping is that the current attributes container becomes the local namespace container, i.e. the full name for an attribute "joe" defined in a normal OpenDAP dataset with no convention is opendap_url.ddx#joe. Then one could also have additional containers that correspond to specified namespaces, e.g. http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl# abbreviated as cfatt: would contain the cf attributes, etc. And most importantly, the information that the opendap code uses would go in the dap4: namespace. This is based in part on my experience with ActiveRDF in Ruby, which maps RDF properties to Ruby methods: I quickly discovered that a namespace mechanism (which already existed in Ruby) gives you usable explicit methods, e.g. dataobject.dap4::shape is easy enough, and the convenience of making dataobject.shape work is just not worth the effort.

So then I think 1) has two parts: an abbreviation mechanism so that a long name like http://opendap.org/ontologies/dap4.owl#Grid can be manipulated as dap4:Grid, and additional containers for attributes that belong to conventions.
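The abbreviation mechanism can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names expand and abbreviate, the PREFIXES table, and the example.org dataset URL are hypothetical, and only the two namespace URIs come from the discussion above.

```python
# Hypothetical prefix table; the two namespace URIs are the ones quoted above.
PREFIXES = {
    "dap4": "http://opendap.org/ontologies/dap4.owl#",
    "cfatt": "http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#",
}


def expand(name, prefixes, local_base):
    """Return the full URI for an attribute name such as 'dap4:Grid' or 'joe'."""
    prefix, sep, local = name.partition(":")
    if not sep:
        # Unprefixed names belong to the local namespace of the dataset.
        return local_base + name
    if prefix in prefixes:
        return prefixes[prefix] + local
    # Unknown prefix: assume the name is already a full URI and leave it alone.
    return name


def abbreviate(uri, prefixes):
    """Return 'prefix:local' for a URI in a known namespace, else the URI itself."""
    for prefix, base in prefixes.items():
        if uri.startswith(base):
            return prefix + ":" + uri[len(base):]
    return uri
```

With a dataset whose DDX lives at the assumed address http://example.org/data/sst.ddx, expand("joe", PREFIXES, "http://example.org/data/sst.ddx#") gives the local-namespace name, and abbreviate reverses expand for any URI in a declared namespace.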

2) is essentially a slight rethinking of the alias that James is tempted to take out of DAP4: instead of alias being an alternate variable, attributes can point to variables. This is to have attributes that can point to objects in the dataset (or to URIs to point to objects outside of the dataset). I think this is quite doable, and could fix two sets of issues: the "same name" semantics of netcdf vs the "container" semantics of Java (and repeated transmission of common independent variables) could be replaced by an explicit object property that connects opendap arrays and connotes that relationship, and generalizing the GRID/Map structure to something that handles curvilinear coordinates (the MAP structure is replaced by a hasMapVector dap4: object attribute that connects arrays, or connects a grid object to its map vectors).

This focuses on the DAP4 data model on purpose -- I think if the DAP4 data model is clearly defined, one can then talk about transmission issues (what happens in DAP2, what are the possible requests in DAP4, what does it look like in JSON, can it be transmitted in RDF/OWL?). Part of what happens in this restructuring is that one could easily load all the opendap data objects into an RDF database and perform operations across all the data objects using any of the available semantics, regardless of server.

James asked

Would this be as simple as having a URI attribute type (just like we have Int32 and String)? We actually already do have a URI attribute type, although in my naivete I called it 'URL.'

Or would the URI be a property of the attribute? That is, there could be an attribute called 'scale_factor' and it would have a value and a URI which references some object in the DDX. Would this object be limited to Variables, or would the object be either Variables or Attributes?

I think the answer is

My point 2) is the former (i.e. the existing URL type) -- the URI type is used to point to outside objects. I also want to point to inside objects, and any OpenDAP API which has data objects would have the method aka 'attribute' automatically dereference the URI and return the local object. I suppose the actual transmission mechanism could use URIs -- that is what RDF/XML does, though it has an abbreviation mechanism/default namespace for local objects. But in the data model, we are pointing to the object, not the URI of the object, and any OpenDAP API method should also return the object.
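As a rough Python sketch of that behaviour: an attribute lookup returns the local object when the stored URI points inside the dataset, and the bare URI when it points outside. The class names Dataset and Variable, the method name attribute, and the example.org URL are all hypothetical; no existing OpenDAP API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    attributes: dict = field(default_factory=dict)


class Dataset:
    """Toy dataset: variables are addressed as <dataset url>#<variable name>."""

    def __init__(self, url):
        self.url = url
        self.variables = {}

    def add(self, variable):
        self.variables[variable.name] = variable
        return variable

    def attribute(self, var_name, attr_name):
        """Return the attribute value, dereferencing URIs that point inside this dataset."""
        value = self.variables[var_name].attributes[attr_name]
        local_prefix = self.url + "#"
        if isinstance(value, str) and value.startswith(local_prefix):
            # Inside reference: hand back the object, not its URI.
            return self.variables[value[len(local_prefix):]]
        # Outside reference (or plain value): returned unchanged.
        return value
```

The transmission form stays a URI in both cases; only the API call decides whether to resolve it.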

My point 1) is essentially the latter: your point is an unusual characterization of namespaces. In RDF or XML, the true attribute name is the URI, and then we define a convenient prefix for the long nasty part of the URI. So there is one long name for the attribute, which is usually split into a namespace and an id within that namespace. In Ruby objects, it turns out that the best practice seems to be to map the namespace to a method and the id to a submethod, so we end up with dap4::shape instead of shape. So I would say it is not the URI which is a property of the attribute, but rather the namespace prefix (most of the URI) which becomes a property of the attribute. Alternatively, attributes are URIs, with the current set of unspecified attributes belonging to the local namespace of the dataset.
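The namespace-as-method idea translates to other object languages as well. The following Python sketch mimics the Ruby dap4::shape pattern with attribute access (obj.dap4.shape); the classes DataObject and Namespace are hypothetical and are not ActiveRDF's or any OpenDAP library's API.

```python
class Namespace:
    """View of one namespace: looks up <base_uri><local_name> in the stored values."""

    def __init__(self, base_uri, values):
        self._base_uri = base_uri
        self._values = values

    def __getattr__(self, local_name):
        try:
            return self._values[self._base_uri + local_name]
        except KeyError:
            raise AttributeError(local_name) from None


class DataObject:
    """Holds attribute values keyed by full URI and exposes each prefix as an accessor."""

    def __init__(self, values, prefixes):
        self._values = values
        self._prefixes = prefixes

    def __getattr__(self, prefix):
        try:
            base_uri = self._prefixes[prefix]
        except KeyError:
            raise AttributeError(prefix) from None
        return Namespace(base_uri, self._values)
```

Here obj.dap4.shape works because "dap4" is a declared prefix, while obj.shape deliberately fails: the sketch makes no attempt to guess which namespace an unprefixed name belongs to.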

Since your data model already puts attributes into a container, I really liked the idea of having a local name space container sameAs the attributes container, plus additional containers for other namespaces. I think this is the semantically-correct mapping.

Your point makes it pretty clear how to carry semantics in DAP2, if the funky attribute names don't break anything. Could we change DAP2 to allow those characters if necessary?

Luis asked

In 1), do you mean that all the attributes in DAP4 should be identified with a URI, and that a mechanism to prefix namespaces should be available?

I think the answer is

Yes, all attributes in DAP4 should have a URI, with the traditional unprefixed attribute being placed in the namespace of the dataset URI.

The namespace mechanism is mainly to get the full URI within the legal syntax of attributes or methods in whatever language one is in, e.g.

http://opendap.org/ontologies/dap4.owl#shape

is probably illegal as an OpenDAP attribute name, or a netcdf name, or a Ruby method, etc.,

but

dap4::shape

is a legal Ruby method (probably still illegal as netcdf or OpenDAP, I cannot remember). We could change the rules, or come up with something within the rules....

This, of course, makes it possible to carry OpenDAP in RDF, which would mean one could aggregate datasets in an RDF datastore, getting back data aggregations from RDF queries, etc. Or carry OpenDAP in any other object transport mechanism.

-- BennoBlumenthal - 28 Feb 2007

Semantic Embedding in DAP2

Following up on James' points above, we can embed some semantics in DAP2.

1) Extend usage of the Conventions attribute to allow namespace specification, i.e. use strings of the form ns=URI to specify a namespace e.g. cfatt=http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#. This is completely consistent with the semantic usage of Conventions.

2) Then use namespace abbreviations in attributes, e.g. cfatt:standard_name. This too is within the DAP2 spec, though it may be outside the netcdf spec (i.e. I think ':' is not a legal character for the netcdf API as implemented by UCAR for files).

3) Use URL attribute type for URIs. We can also define a convention that allows DAP2 variable names for referring to other entities within the current dataset, since they cannot be misconstrued as global URIs.

4) We should additionally specify what the proper URI for each entity in an OpenDAP dataset should be, i.e. dodsurl#variable_name, so that OpenDAP variables can be consistently referred to by RDF/OWL and/or XML files.
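A client-side reading of points 1), 3) and 4) might look like the Python sketch below. It is an assumption-laden illustration: the function names, the example.org dataset URL, and the rule "a value without a colon is a local variable name" are mine, not part of any DAP2 specification.

```python
def parse_conventions(values):
    """Split Conventions strings into ns=URI declarations and ordinary convention names."""
    namespaces, plain = {}, []
    for value in values:
        prefix, sep, uri = value.partition("=")
        if sep and uri:
            namespaces[prefix] = uri
        else:
            plain.append(value)
    return namespaces, plain


def variable_uri(dods_url, variable_name):
    """Point 4): the URI of an entity is dodsurl#variable_name."""
    return f"{dods_url}#{variable_name}"


def resolve_reference(value, dods_url):
    """Point 3): a bare variable name refers to the current dataset (assumed rule:
    anything without a ':' cannot be a global URI); everything else is left as is."""
    if ":" not in value:
        return variable_uri(dods_url, value)
    return value
```

Ordinary convention names carry no "=" and pass through untouched, so existing Conventions values keep their meaning.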

I have an additional agenda in encoding DAP4 structures as attributes in the DAP4 namespace, but that would not be part of this DAP2 encoding.

-- BennoBlumenthal - 02 Mar 2007

Benefits

So why make the above enhancement to DAP2?

Semantic Transport

Precisely stating which attributes belong to which convention makes understanding the entire set of attributes much easier when multiple conventions are used to add semantics to OpenDAP variables. It also avoids the problem of local name collisions, where two different conventions use the same local name for different (hopefully slightly different) things.

Explicitly recognizing that the URL type should be used to refer to standard concepts, while not literally a change to OpenDAP, provides a way to transport semantics like those Luis uses as examples in his talk, e.g.

<sos:observedProperty xlink:href="http://marinemetadata.org/cf#sea_water_temperature"/>
<sos:units href="urn:ogc:unit:degree" />
<sos:featureOfInterest xlink:href="urn:mmi.feature#bodyOfWater" />

can be transported (adding some more attributes of my own)

    sst {
        URL sos:observedProperty "http://marinemetadata.org/cf#sea_surface_temperature";
        URL sos:units "urn:ogc:unit:degree";
        URL sos:featureOfInterest "urn:mmi.feature#bodyOfWater";
        String cfatt:units "degree_Celsius";
        String cfatt:long_name "sea surface temperature";
        String cfatt:standard_name "sea_surface_temperature";
        URL term:isDescribedBy "http://iridl.ldeo.columbia.edu/ontologies/iridl.owl#NOAA",
                               "http://sweet.jpl.nasa.gov/ontology/data_center.owl#National_Oceanic_and_Atmospheric_Administration",
                               "http://marinemetadata.org/2005/08/gcmd-keyw#Oceans";
        URL iridl:hasDocumentation "http://iridl.ldeo.columbia.edu/ontologies/ReferenceList.owl#Reynolds_Smith1994";
    }
    NC_GLOBAL {
        String Conventions "sos=http://www.opengis.net/sos/0",
                           "cfatt=http://iridl.ldeo.columbia.edu/ontologies/cf-att.owl#",
                           "iridl=http://iridl.ldeo.columbia.edu/ontologies/iridl.owl#",
                           "term=http://iridl.ldeo.columbia.edu/ontologies/iriterms.owl#";
    }


Other terms mentioned in "isDescribedBy" can be inferred from an ontology repository, which relates narrower, equivalent (same as), and broader terms. -- LuisBermudez - 07 Mar 2007
In this particular case, all the terms mentioned in "isDescribedBy" were inferred, either by inheritance from parent datasets, equivalence between alternative ontologies, or implications from the standard name "sea_surface_temperature". -- BennoBlumenthal - 12 Mar 2007
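
To make the transport concrete, here is a hedged Python sketch that turns typed, prefixed attributes of one variable into N-Triples lines. The function name to_ntriples, the (type, value) tuple layout, and the example.org dataset URL are assumptions for illustration; the subject follows the dodsurl#variable_name form proposed earlier, and string escaping is simplified to backslashes and double quotes.

```python
def to_ntriples(dods_url, variable, attributes, prefixes):
    """Return one N-Triples line per attribute of a variable.

    attributes maps 'prefix:local' names to (type, value) pairs, where type is
    'URL' or 'String'; every name is assumed to carry a declared prefix.
    """
    subject = f"<{dods_url}#{variable}>"
    lines = []
    for name, (kind, value) in attributes.items():
        prefix, _, local = name.partition(":")
        predicate = f"<{prefixes[prefix]}{local}>"
        if kind == "URL":
            # URL-typed attributes become resources.
            obj = f"<{value}>"
        else:
            # Everything else becomes a plain literal (minimal escaping only).
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"{subject} {predicate} {obj} .")
    return lines
```

Loading such lines from many servers into one RDF store is what would allow queries across datasets regardless of where they are served.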

URIs for OpenDAP objects allow use of Semantic Web Technologies

Establishing URIs for the elements within an OpenDAP dataset means that outside documents/software can refer to those elements and make semantic statements about them. This allows semantic translation, where an outside program (AIS Server) can translate the use semantics of a variable into different conventions, e.g. CF or OpenGIS or GRIB or any of the many lesser-known systems for adding some metadata to data.

Starting point for more powerful and explicit DAP use semantics

Using the URL type to code references to local elements allows DAP to be more explicit and flexible about the relationships between the elements of a DAP dataset. Namespaces in particular allow a dap4 namespace where attributes understood by the OpenDAP Core can be placed. Prime candidates would be a hasMapVector attribute which could connect an OpenDAP array to its Map vectors, allowing what can currently be expressed as an OpenDAP Grid, but also allowing curvilinear coordinates, and allowing reuse of common Map vectors among many dependent variables. We could also add a covariesWith attribute, which would explicitly state the connection between columns of a table or netcdf variables with common dimensions. That would make it possible to better translate between the current netcdf, Java, and db APIs and to/from OpenGIS (WCS/WFS) standards, and would enable future software clients to traverse the relationships between networks of variables.
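
A small Python sketch of how a client might traverse hasMapVector links follows. The dictionary layout, the variable names, and the helper functions are hypothetical; only the attribute name dap4:hasMapVector comes from the proposal, and its exact encoding is an assumption.

```python
# Hypothetical in-memory dataset: two dependent variables share the same map vectors.
dataset = {
    "lat": {"shape": (3,), "attributes": {}},
    "lon": {"shape": (4,), "attributes": {}},
    "sst": {"shape": (3, 4), "attributes": {"dap4:hasMapVector": ["lat", "lon"]}},
    "wind": {"shape": (3, 4), "attributes": {"dap4:hasMapVector": ["lat", "lon"]}},
}


def map_vectors(dataset, name):
    """Return the map-vector objects a variable points to (empty list if none)."""
    refs = dataset[name]["attributes"].get("dap4:hasMapVector", [])
    return [dataset[ref] for ref in refs]


def users_of(dataset, map_name):
    """Return the sorted names of all variables that use the given map vector."""
    return sorted(
        name
        for name, var in dataset.items()
        if map_name in var["attributes"].get("dap4:hasMapVector", [])
    )
```

Because the maps are referenced rather than nested, "lat" and "lon" exist once and are reused by both arrays; a curvilinear case would simply point at two-dimensional map arrays instead.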

This is all achieved without damaging backwards compatibility -- older clients simply have the same understanding of variable relationships that they currently have.

-- BennoBlumenthal - 05 Mar 2007