Difference Topic AISIdeas (r1.9 - 02 May 2006 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 21 to 21

What about devising a URL syntax so that each item can be addressed as part of the URL? That is, instead of using a projection CE, add the variable names to the URL, as in 'http://test/dap/data/nc/fnoc1.nc/u'. This would provide a way for RDF to attach metadata to each variable, since each would appear as a thing on the Web. That said, the existing syntax might be good enough, because "... by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web." (RDF Primer). This doesn't mean that each variable has to be accessible, just nameable.
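To make the idea concrete, here is a minimal Python sketch showing that a path-style name like the one above can be mechanically rewritten as an ordinary projection CE, which suggests the two syntaxes could coexist. The function name and the rule that a path segment ending in '.nc' (or a similar suffix) marks the end of the dataset are assumptions for illustration, not part of any DAP specification.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: a path segment with one of these suffixes names the dataset;
# everything after it names a variable (nested names become dotted CE names).
DATASET_SUFFIXES = (".nc", ".hdf", ".dat")


def path_to_projection(url: str) -> str:
    """Rewrite a path-style variable URL as a URL with a projection CE.

    Hypothetical helper: 'http://host/ds.nc/u' -> 'http://host/ds.nc?u'.
    URLs that already carry a query string are rejected to keep it simple.
    """
    parts = urlsplit(url)
    if parts.query:
        raise ValueError("URL already has a constraint expression")
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        if segment.endswith(DATASET_SUFFIXES):
            dataset_path = "/".join(segments[: i + 1])
            # Structure members are joined with '.', as in a DAP projection.
            variable = ".".join(s for s in segments[i + 1 :] if s)
            return urlunsplit(
                (parts.scheme, parts.netloc, dataset_path, variable, parts.fragment)
            )
    raise ValueError(f"no dataset segment found in {url!r}")


print(path_to_projection("http://test/dap/data/nc/fnoc1.nc/u"))
# prints http://test/dap/data/nc/fnoc1.nc?u
```

The weak point is the suffix rule: a server would need a reliable way to tell where the dataset name ends, which is exactly what the query-string syntax gets for free.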

Added:
>
>
Update: At some point RDF & OWL, or something like them, may be important in handling metadata. Whether they are is a function of how easy it is to use them to build sets of rules that can be reliably applied to data sources. Right now it's too much work to use those technologies for a project that needs to build a working system in only a few months. As for THREDDS, it seems that it is not a way to blend new metadata into a data source, but a way to carry search metadata about a collection of data sources. A THREDDS catalog is not the place to encode information such as a variable's scale factor. -- JamesGallagher - 02 May 2006

Discussion

The client-side AIS is nice because it's so clean from a user's perspective. That is, it's obvious what's being done to the objects returned from the data source. But this architecture has a number of drawbacks. First, it does not scale well when the number of mappings increases, because the mapping database is built anew for each virtual Connection. A way around this is to use some sort of persistence, with the most obvious being an RDB (e.g., MySQL). But that makes the clients pretty complex to configure. The mapping DB could be cached on disk using techniques similar to the ones the HTTPCache class uses to manage the HTTP cache. So there are ways around the 'persistence problem' of this design, but they involve some significant increase in complexity on the client side. Another issue is that this code does not work in browsers; it only works in libdap-built clients. We'll have to duplicate it in the Java and C libraries, and so will every other implementer if they want their systems to get the benefit of the AIS (which we hope they will...). That's a pretty high price to pay for this design. However, we have a client-side AIS implemented already.
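One way to avoid rebuilding the mapping database for every virtual Connection is to load it once from disk and write it back atomically, in the spirit of the on-disk HTTP cache mentioned above. The sketch below is hypothetical: the `MappingDB` class, its JSON file format and its method names are assumptions for illustration and are not part of libdap.

```python
import json
import os
import tempfile
from typing import Dict, List


class MappingDB:
    """Hypothetical on-disk store of data-URL -> ancillary-URL mappings.

    The file is read once when the object is created and shared by every
    request, instead of being rebuilt for each virtual Connection.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self._map: Dict[str, List[str]] = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._map = json.load(f)

    def add(self, data_url: str, ai_url: str) -> None:
        """Record that ai_url holds ancillary information for data_url."""
        urls = self._map.setdefault(data_url, [])
        if ai_url not in urls:
            urls.append(ai_url)

    def lookup(self, data_url: str) -> List[str]:
        """Return the AI URLs for data_url (empty if there are none)."""
        return list(self._map.get(data_url, []))

    def save(self) -> None:
        """Write to a temp file, then rename it over the old database so a
        crash never leaves a half-written file behind."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._map, f, indent=2)
        os.replace(tmp, self.path)
```

A JSON file is only the simplest stand-in here; the same interface could sit in front of an RDB if the number of mappings grows.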

Line: 28 to 30

All is not lost. We can use the client-side AIS software to build an AIS server, and I think that's where we should go. That way we have the client-side AIS for testing and development and a server for actual deployment. Since a server will probably be stateless, the persistence problem will still need to be solved, but there are a number of ways to approach it. If, after some thought, we make the AIS server a Java Servlet, the DB can be built in memory when the servlet is loaded. Alternatively, the servlet could use the 'copy HTTPCache' approach. We can keep going with the C++ code base for now and recode in Java once we have something with the correct mix of features. This is appealing since recoding working software is about as easy as development will ever get, and we already have a prototype AIS server in the Server3.5 code base. This will let us tackle the design problems without mixing them up with a lot of new implementation.

So that leaves some important questions unanswered. What should an AIS Server actually do?

Changed:
<
<
  1. It should return completely massaged responses. Given a URL, it should fetch the information at some DAP URL, look up the URL in a mapping DB, fetch the matching AI and merge the URL's information with the AI. Question: Should the URL be passed to the AIS server (in the query string?), or should the AIS server make a new set of URLs? The latter would make the AIS Server's URLs look more 'normal' but might also make it less robust, because data sources tend to move around; the AIS Server would need to be updated every time the URLs moved.
>
>
  1. It should return completely massaged responses. Given a URL, it should fetch the information at some DAP URL, look up the URL in a mapping DB, fetch the matching AI and merge the URL's information with the AI. Question: Should the URL be passed to the AIS server (in the query string?), or should the AIS server make a new set of URLs? The latter would make the AIS Server's URLs look more 'normal' but might also make it less robust, because data sources tend to move around; the AIS Server would need to be updated every time the URLs moved. Update: Of course, bundling the data URL in the AIS server URL means that the client has to know where the data are, which will also cause robustness problems.

  2. It should return responses marked to clearly indicate where the information came from.
  3. It should return decent error messages, since now there are two potential points of failure (the DAP URL might be dead, as might the AI URL).
  4. It should provide a web interface for creating AI resources. This will need to support authentication. This might also make all the AI URLs local, which would boost reliability, although I'm not sure by how much. It would take the design in the direction of a folksonomy, which means that people can contribute much more easily than if they had to set up their own server. The (human) interface will be very important.
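The requirements above for massaged responses, marked provenance and decent error messages can be sketched together. In this hypothetical Python sketch, attributes are modeled as a dict of dicts, `fetch` stands in for whatever actually dereferences a DAP or AI URL, and later sources override earlier ones, so ancillary information wins over the data source; all of these choices are assumptions, not the behavior of the existing prototype.

```python
from typing import Callable, Dict, List, Tuple

# variable name -> {attribute name: value}
Attributes = Dict[str, Dict[str, str]]


def merge_with_provenance(
    data_url: str,
    ai_urls: List[str],
    fetch: Callable[[str], Attributes],
) -> Tuple[Attributes, Dict[Tuple[str, str], str], List[str]]:
    """Merge the attributes at data_url with those at each AI URL.

    Returns (merged attributes, provenance, error messages). The provenance
    map records which URL supplied each (variable, attribute) pair, so the
    response can say where every piece of information came from.
    """
    merged: Attributes = {}
    provenance: Dict[Tuple[str, str], str] = {}
    errors: List[str] = []
    for source in [data_url] + ai_urls:
        try:
            attrs = fetch(source)
        except Exception as exc:  # either point of failure is reported
            errors.append(f"{source}: {exc}")
            continue
        for var, table in attrs.items():
            for name, value in table.items():
                merged.setdefault(var, {})[name] = value
                provenance[(var, name)] = source
    return merged, provenance, errors
```

A real server would also have to decide whether a dead data URL is fatal; this sketch only reports it alongside any AI failures.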
Line: 38 to 40

In these diagrams I show both the data and the ancillary information (AI?) as remote from the client and on distinct machines. That might not be the case if we create an AIS that supports a folksonomy where users can 'tag' variables and attributes; in that case we might bundle the result with our new server. Some of the lines in these diagrams are missing...

Changed:
<
<
  • This is the AIS as we have currently implemented it. This works only for Attributes at present.
>
>
  • This is the AIS as we have currently implemented it. This works only for Attributes at present. One key aspect of this architecture is that the combination of data and metadata takes place in the client. Each client needs the AIS software, which rules out generic clients like web browsers. However, this also means that data which match the metadata are easily retrieved from the 'origin server' without a side trip through the AIS server (which would be acting like a proxy server in this case).

client-ais-component.png
Changed:
<
<
  • Here's a design where all of the operations are performed by a server. In this design a URL is sent to the AIS server, and then either the server returns the matching Ancillary information and the client merges the two, or the AIS server handles the merge itself by dereferencing the URL and merging the data there. This design will work for web browsers, while the others won't. However, it results in ugly URLs or requires a more sophisticated configuration. We have a prototype of this, but it has weak error handling.
>
>
  • Here's a design where all of the operations are performed by a server. In this design a URL is sent to the AIS server, and then either the server returns the matching Ancillary information and the client merges the two, or the AIS server handles the merge itself by dereferencing the URL and merging the information there. This design will work for web browsers, while the other one won't. However, it results in ugly URLs or requires a more sophisticated configuration. We have a prototype of this, but it has weak error handling. One key feature this architecture would more easily support is applying server-side functions to the data (not metadata) responses. This is not supported by our current software, but is conceivable given what we have. It would provide a way to add map variables to data sources (i.e., to handle the case where a variable is 'metadata' and not 'data').

ais-server-component.png
Added:
>
>
Update: Here are two more diagrams which present two different ways of handling the mapping information the AIS needs. In the first, a database (of some sort, not necessarily an RDB) stores the mappings between the AIS 'virtual' data space, the ancillary information and the data source. In the second, the URL fed to the AIS holds the information about both information sources, so all the AIS needs to do is merge them. This is simpler to configure on a URL-by-URL basis, but would be a drag for the user.

* Figure 1. AIS with a database of mappings between data source and ancillary data URLs:
ais-with-lookup.png

* Figure 2. AIS with both parts of the merge in the URL:
ais-in-URL.png
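For the Figure 2 approach, the AIS URL has to carry both the data URL and the ancillary-information URL. A minimal sketch, assuming a hypothetical endpoint `http://ais.example.org/merge` and query parameters named `data` and `ai`, shows both why this is easy to configure and why the resulting URLs are ugly: both embedded URLs must be percent-encoded.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical AIS endpoint; the parameter names 'data' and 'ai' are assumptions.
AIS_ENDPOINT = "http://ais.example.org/merge"


def make_ais_url(data_url: str, ai_url: str) -> str:
    """Bundle both sources of the merge into one AIS URL (Figure 2 style).

    urlencode percent-encodes each embedded URL, including any '?' or '&'
    in a constraint expression, so they survive the round trip.
    """
    return AIS_ENDPOINT + "?" + urlencode({"data": data_url, "ai": ai_url})


def parse_ais_url(ais_url: str) -> Tuple[str, str]:
    """Recover (data_url, ai_url) from an AIS URL; raise if either is missing."""
    query = parse_qs(urlsplit(ais_url).query)
    try:
        return query["data"][0], query["ai"][0]
    except KeyError as missing:
        raise ValueError(f"AIS URL lacks the {missing} parameter") from None
```

The Figure 1 design would replace `parse_ais_url` with a lookup in a mapping database, trading this per-URL convenience for server-side configuration.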


-- JamesGallagher - 13 Dec 2005

Updated with new diagrams. This does not contain any text about AIS servers getting data and applying server-side functions. Also, consider treating operations on variable names as equivalent to operations on attributes, now that both are held in the same response.

Added:
>
>
Suggestion: There are two risks associated with this system that are directly affected by its architecture:
  1. The AIS server is a significantly more powerful component, but there may be performance issues because data are routed through what is effectively a proxy server.
  2. The distributed metadata model may not be reliable enough.

I think the first risk should be investigated right away. I think the second risk should be mitigated by design changes in the AIS. The server should return verbose messages about the operations it's performing and any errors it encounters.


-- JamesGallagher - 02 May 2006

META FILEATTACHMENT AISascurrentlyimplemented.png attr="" comment="" date="1134490330" path="AISascurrentlyimplemented.png" size="10759" user="JamesGallagher" version="1.1"
Line: 55 to 71

META FILEATTACHMENT AnAISServer.png attr="" comment="" date="1134490420" path="AnAISServer.png" size="9936" user="JamesGallagher" version="1.1"
META FILEATTACHMENT client-ais-component.png attr="" comment="Client Side AIS" date="1146550108" path="client-ais-component.png" size="55363" user="JamesGallagher" version="1.1"
META FILEATTACHMENT ais-server-component.png attr="" comment="Server side AIS" date="1146550160" path="ais-server-component.png" size="57330" user="JamesGallagher" version="1.1"
Added:
>
>
META FILEATTACHMENT ais-with-lookup.png attr="" comment="AIS with a data base of mappings between data source and ancillary data URLS" date="1146588513" path="ais-with-lookup.png" size="55357" user="JamesGallagher" version="1.1"
META FILEATTACHMENT ais-in-URL.png attr="" comment="AIS with both parts of the merge in the URL" date="1146588545" path="ais-in-URL.png" size="35913" user="JamesGallagher" version="1.1"

Difference Topic AISIdeas (r1.8 - 02 May 2006 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 39 to 39

In these diagrams I show both the data and the ancillary information (AI?) as remote from the client and on distinct machines. That might not be the case if we create an AIS that supports a folksonomy where users can 'tag' variables and attributes; in that case we might bundle the result with our new server. Some of the lines in these diagrams are missing...

  • This is the AIS as we have currently implemented it. This works only for Attributes at present.
Changed:
<
<
AISascurrentlyimplemented.png
  • Here's a slight modification: the file that holds the mapping between URLs and Ancillary resources is now stored on a different machine from the client. This change is modest but allows users to share mappings:
    Nowwithremotemappinginfodictionary.png
>
>
client-ais-component.png

  • Here's a design where all of the operations are performed by a server. In this design a URL is sent to the AIS server, and then either the server returns the matching Ancillary information and the client merges the two, or the AIS server handles the merge itself by dereferencing the URL and merging the data there. This design will work for web browsers, while the others won't. However, it results in ugly URLs or requires a more sophisticated configuration. We have a prototype of this, but it has weak error handling.
Changed:
<
<
AnAISServer.png
>
>
ais-server-component.png

-- JamesGallagher - 13 Dec 2005

Added:
>
>
Updated with new diagrams. This does not contain any text about AIS servers getting data and applying server-side functions. Also, consider treating operations on variable names as equivalent to operations on attributes, now that both are held in the same response.

-- JamesGallagher - 02 May 2006


META FILEATTACHMENT AISascurrentlyimplemented.png attr="" comment="" date="1134490330" path="AISascurrentlyimplemented.png" size="10759" user="JamesGallagher" version="1.1"
META FILEATTACHMENT Nowwithremotemappinginfodictionary.png attr="" comment="" date="1134490371" path="Nowwithremotemappinginfodictionary.png" size="11634" user="JamesGallagher" version="1.1"
META FILEATTACHMENT AnAISServer.png attr="" comment="" date="1134490420" path="AnAISServer.png" size="9936" user="JamesGallagher" version="1.1"
Added:
>
>
META FILEATTACHMENT client-ais-component.png attr="" comment="Client Side AIS" date="1146550108" path="client-ais-component.png" size="55363" user="JamesGallagher" version="1.1"
META FILEATTACHMENT ais-server-component.png attr="" comment="Server side AIS" date="1146550160" path="ais-server-component.png" size="57330" user="JamesGallagher" version="1.1"

Difference Topic AISIdeas (r1.7 - 06 Feb 2006 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 23 to 23

Discussion

Changed:
<
<
The client-side AIS is nice because it's so clean from a user's perspective. That is, it's obvious what's being done to the objects returned from the data source. But this architecture has a number of drawbacks. First, it does not scale well when the number of mappings increases, because the mapping database is built anew for each virtual Connection. A way around this is to use some sort of persistence, with the most obvious being an RDB (e.g., MySQL). But that makes the clients pretty complex to configure. The mapping DB could be cached on disk using techniques similar to the ones the HTTPCache class uses to manage the HTTP cache. So there are ways around the 'persistence problem' of this design. Another issue is that this code does not work in browsers; it only works in libdap-built clients. We'll have to duplicate it in the Java and C libraries, and so will every other implementer if they want their systems to get the benefit of the AIS (which we hope they will...). That's a pretty high price to pay for this design. However, we have a client-side AIS implemented already.
>
>
The client-side AIS is nice because it's so clean from a user's perspective. That is, it's obvious what's being done to the objects returned from the data source. But this architecture has a number of drawbacks. First, it does not scale well when the number of mappings increases, because the mapping database is built anew for each virtual Connection. A way around this is to use some sort of persistence, with the most obvious being an RDB (e.g., MySQL). But that makes the clients pretty complex to configure. The mapping DB could be cached on disk using techniques similar to the ones the HTTPCache class uses to manage the HTTP cache. So there are ways around the 'persistence problem' of this design, but they involve some significant increase in complexity on the client side. Another issue is that this code does not work in browsers; it only works in libdap-built clients. We'll have to duplicate it in the Java and C libraries, and so will every other implementer if they want their systems to get the benefit of the AIS (which we hope they will...). That's a pretty high price to pay for this design. However, we have a client-side AIS implemented already.

All is not lost. We can use the client-side AIS software to build an AIS server, and I think that's where we should go. That way we have the client-side AIS for testing and development and a server for actual deployment. Since a server will probably be stateless, the persistence problem will still need to be solved, but there are a number of ways to approach it. If, after some thought, we make the AIS server a Java Servlet, the DB can be built in memory when the servlet is loaded. Alternatively, the servlet could use the 'copy HTTPCache' approach. We can keep going with the C++ code base for now and recode in Java once we have something with the correct mix of features. This is appealing since recoding working software is about as easy as development will ever get, and we already have a prototype AIS server in the Server3.5 code base. This will let us tackle the design problems without mixing them up with a lot of new implementation.


Difference Topic AISIdeas (r1.6 - 23 Dec 2005 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 19 to 19

How much of this general task can be done using THREDDS? RDF? OWL?

Changed:
<
<
What about devising a URL syntax so that each item can be addressed as part of the URL? That is, instead of using a projection CE, add the variable names to the URL, as in 'http://test/dap/data/nc/fnoc1.nc/u'. This would provide a way for RDF to attach metadata to each variable, since each would appear as a thing on the Web.
>
>
What about devising a URL syntax so that each item can be addressed as part of the URL? That is, instead of using a projection CE, add the variable names to the URL, as in 'http://test/dap/data/nc/fnoc1.nc/u'. This would provide a way for RDF to attach metadata to each variable, since each would appear as a thing on the Web. That said, the existing syntax might be good enough, because "... by generalizing the concept of a "Web resource", RDF can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web." (RDF Primer). This doesn't mean that each variable has to be accessible, just nameable.

Discussion


Difference Topic AISIdeas (r1.5 - 23 Dec 2005 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 19 to 19

How much of this general task can be done using THREDDS? RDF? OWL?

Added:
>
>
What about devising a URL syntax so that each item can be addressed as part of the URL? That is, instead of using a projection CE, add the variable names to the URL, as in 'http://test/dap/data/nc/fnoc1.nc/u'. This would provide a way for RDF to attach metadata to each variable, since each would appear as a thing on the Web.

Discussion

The client-side AIS is nice because it's so clean from a user's perspective. That is, it's obvious what's being done to the objects returned from the data source. But this architecture has a number of drawbacks. First, it does not scale well when the number of mappings increases, because the mapping database is built anew for each virtual Connection. A way around this is to use some sort of persistence, with the most obvious being an RDB (e.g., MySQL). But that makes the clients pretty complex to configure. The mapping DB could be cached on disk using techniques similar to the ones the HTTPCache class uses to manage the HTTP cache. So there are ways around the 'persistence problem' of this design. Another issue is that this code does not work in browsers; it only works in libdap-built clients. We'll have to duplicate it in the Java and C libraries, and so will every other implementer if they want their systems to get the benefit of the AIS (which we hope they will...). That's a pretty high price to pay for this design. However, we have a client-side AIS implemented already.


Difference Topic AISIdeas (r1.4 - 14 Dec 2005 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 14 to 14

  • Server-side, using a local (to the server) configuration file.
  • Server-side, bundled with a data server.
  • Server-side, as a folksonomy, providing a way for users to 'edit datasets.'
Added:
>
>

These four architectures are not exclusive of each other. One implementation might include all four ideas.
Added:
>
>
How much of this general task can be done using THREDDS? RDF? OWL?

Discussion

The client-side AIS is nice because it's so clean from a user's perspective. That is, it's obvious what's being done to the objects returned from the data source. But this architecture has a number of drawbacks. First, it does not scale well when the number of mappings increases, because the mapping database is built anew for each virtual Connection. A way around this is to use some sort of persistence, with the most obvious being an RDB (e.g., MySQL). But that makes the clients pretty complex to configure. The mapping DB could be cached on disk using techniques similar to the ones the HTTPCache class uses to manage the HTTP cache. So there are ways around the 'persistence problem' of this design. Another issue is that this code does not work in browsers; it only works in libdap-built clients. We'll have to duplicate it in the Java and C libraries, and so will every other implementer if they want their systems to get the benefit of the AIS (which we hope they will...). That's a pretty high price to pay for this design. However, we have a client-side AIS implemented already.


Difference Topic AISIdeas (r1.3 - 13 Dec 2005 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Attributes and Variables

Changed:
<
<
The current AIS implementation is limited to Attributes. Add support for variables. What does that mean exactly?
>
>
The current AIS implementation is limited to Attributes. Add support for variables. What does that mean, exactly?

  1. We need to be able to map the names used in a data source to one or more other names that groups or standards recognize. This means mapping a name like 'u' to 'zonal wind stress.' Question: Is it good enough to do this using attributes, or do we need variable name aliases?
  2. We need to add new variables (and their values). In the best of all worlds, the values would come from a file or a function. This argues strongly for the AIS Server architecture with local ancillary information since such a server can control how the CEs are evaluated.
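The question in the first item (attributes versus variable name aliases) can be illustrated with a small hypothetical alias table that supports both options: `annotate` records the alternate names as an attribute, while `resolve` rewrites a projection written with alternate names back into the data source's own names. The class, the `alias` attribute name and the sample names are assumptions for illustration.

```python
from typing import Dict, List


class AliasTable:
    """Hypothetical mapping between a data source's variable names and the
    names other groups or standards use for the same variables."""

    def __init__(self, aliases: Dict[str, List[str]]) -> None:
        self.aliases = aliases
        # Reverse index so a request written with an alias can be resolved.
        self.reverse = {alt: local for local, alts in aliases.items() for alt in alts}

    def annotate(self, attrs: Dict[str, Dict[str, str]], attr_name: str = "alias") -> None:
        """Attribute approach: add the alternate names as an extra attribute."""
        for local, alts in self.aliases.items():
            if local in attrs:
                attrs[local][attr_name] = ", ".join(alts)

    def resolve(self, projection: str) -> str:
        """Alias approach: map each name in a comma-separated projection back
        to the source's own name. Plain names only; subscripts and selection
        clauses are not handled by this sketch."""
        names = [n.strip() for n in projection.split(",") if n.strip()]
        return ",".join(self.reverse.get(n, n) for n in names)
```

The attribute approach needs no server support beyond the merge, while the alias approach requires something that rewrites CEs before they reach the data source, which again points toward an AIS server.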

Possible Architectures

  • Client-side, driven by a local or remote configuration file.
Line: 13 to 16

  • Server-side, as a folksonomy, providing a way for users to 'edit datasets.'
These four architectures are not exclusive of each other. One implementation might include all four ideas.
Added:
>
>
Discussion

The client-side AIS is nice because it's so clean from a user's perspective. That is, it's obvious what's being done to the objects returned from the data source. But this architecture has a number of drawbacks. First, it does not scale well when the number of mappings increases, because the mapping database is built anew for each virtual Connection. A way around this is to use some sort of persistence, with the most obvious being an RDB (e.g., MySQL). But that makes the clients pretty complex to configure. The mapping DB could be cached on disk using techniques similar to the ones the HTTPCache class uses to manage the HTTP cache. So there are ways around the 'persistence problem' of this design. Another issue is that this code does not work in browsers; it only works in libdap-built clients. We'll have to duplicate it in the Java and C libraries, and so will every other implementer if they want their systems to get the benefit of the AIS (which we hope they will...). That's a pretty high price to pay for this design. However, we have a client-side AIS implemented already.

All is not lost. We can use the client-side AIS software to build an AIS server, and I think that's where we should go. That way we have the client-side AIS for testing and development and a server for actual deployment. Since a server will probably be stateless, the persistence problem will still need to be solved, but there are a number of ways to approach it. If, after some thought, we make the AIS server a Java Servlet, the DB can be built in memory when the servlet is loaded. Alternatively, the servlet could use the 'copy HTTPCache' approach. We can keep going with the C++ code base for now and recode in Java once we have something with the correct mix of features. This is appealing since recoding working software is about as easy as development will ever get, and we already have a prototype AIS server in the Server3.5 code base. This will let us tackle the design problems without mixing them up with a lot of new implementation.

So that leaves some important questions unanswered. What should an AIS Server actually do?

  1. It should return completely massaged responses. Given a URL, it should fetch the information at some DAP URL, look up the URL in a mapping DB, fetch the matching AI and merge the URL's information with the AI. Question: Should the URL be passed to the AIS server (in the query string?), or should the AIS server make a new set of URLs? The latter would make the AIS Server's URLs look more 'normal' but might also make it less robust, because data sources tend to move around; the AIS Server would need to be updated every time the URLs moved.
  2. It should return responses marked to clearly indicate where the information came from.
  3. It should return decent error messages, since now there are two potential points of failure (the DAP URL might be dead, as might the AI URL).
  4. It should provide a web interface for creating AI resources. This will need to support authentication. This might also make all the AI URLs local, which would boost reliability, although I'm not sure by how much. It would take the design in the direction of a folksonomy, which means that people can contribute much more easily than if they had to set up their own server. The (human) interface will be very important.

Some Deployment Diagrams

In these diagrams I show both the data and the ancillary information (AI?) as remote from the client and on distinct machines. That might not be the case if we create an AIS that supports a folksonomy where users can 'tag' variables and attributes; in that case we might bundle the result with our new server. Some of the lines in these diagrams are missing...


Difference Topic AISIdeas (r1.2 - 13 Dec 2005 - JamesGallagher)

META TOPICPARENT WebHome

The AIS

Line: 12 to 11

  • Server-side, using a local (to the server) configuration file.
  • Server-side, bundled with a data server.
  • Server-side, as a folksonomy, providing a way for users to 'edit datasets.'
Deleted:
<
<

These four architectures are not exclusive of each other. One implementation might include all four ideas.
Added:
>
>
Some Deployment Diagrams

In these diagrams I show both the data and the ancillary information (AI?) as remote from the client and on distinct machines. That might not be the case if we create an AIS that supports a folksonomy where users can 'tag' variables and attributes; in that case we might bundle the result with our new server. Some of the lines in these diagrams are missing...

  • This is the AIS as we have currently implemented it. This works only for Attributes at present.
    AISascurrentlyimplemented.png
  • Here's a slight modification: the file that holds the mapping between URLs and Ancillary resources is now stored on a different machine from the client. This change is modest but allows users to share mappings:
    Nowwithremotemappinginfodictionary.png
  • Here's a design where all of the operations are performed by a server. In this design a URL is sent to the AIS server, and then either the server returns the matching Ancillary information and the client merges the two, or the AIS server handles the merge itself by dereferencing the URL and merging the data there. This design will work for web browsers, while the others won't. However, it results in ugly URLs or requires a more sophisticated configuration. We have a prototype of this, but it has weak error handling.
    AnAISServer.png

-- JamesGallagher - 13 Dec 2005
Added:
>
>
META FILEATTACHMENT AISascurrentlyimplemented.png attr="" comment="" date="1134490330" path="AISascurrentlyimplemented.png" size="10759" user="JamesGallagher" version="1.1"
META FILEATTACHMENT Nowwithremotemappinginfodictionary.png attr="" comment="" date="1134490371" path="Nowwithremotemappinginfodictionary.png" size="11634" user="JamesGallagher" version="1.1"
META FILEATTACHMENT AnAISServer.png attr="" comment="" date="1134490420" path="AnAISServer.png" size="9936" user="JamesGallagher" version="1.1"

Difference Topic AISIdeas (r1.1 - 13 Dec 2005 - JamesGallagher)
Line: 1 to 1
Added:
>
>
META TOPICPARENT WebHome

The AIS

Attributes and Variables

The current AIS implementation is limited to Attributes. Add support for variables. What does that mean exactly?

Possible Architectures

  • Client-side, driven by a local or remote configuration file.
  • Server-side, using a local (to the server) configuration file.
  • Server-side, bundled with a data server.
  • Server-side, as a folksonomy, providing a way for users to 'edit datasets.'

These four architectures are not exclusive of each other. One implementation might include all four ideas.

-- JamesGallagher - 13 Dec 2005