Tennessee Leeuwenburg
29 Jan 2004
DODS is an acronym for Distributed Oceanographic Data System. The name is used interchangeably with several server frameworks which perform the same task and use the same application protocol. The homepage for DODS is http://www.unidata.ucar.edu/pacakages/dods/index.html.
DAP/2.0 is the application protocol that a DODS server and a DODS client use for the communication and transmission of data. It runs over HTTP, and uses a mixture of path, file extension and POST variables to define its language. It appears from the internals that POST and GET variables are both supported, but a mixture is not recommended.
Anagram is a collection of Java classes, some of which are Java Servlets. The Anagram server sits behind Tomcat (or potentially another servlet container) and handles DODS requests.
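To make the shape of such a request concrete, the following is a minimal sketch that assembles a request URL from a server base, a dataset path, a response extension and an optional constraint passed as a GET query string. The class name, method name and the host used in the example are hypothetical; none of them come from the DODS or Anagram sources.

```java
// Hypothetical helper, for illustration only. It is not part of DODS or Anagram.
class DapUrlSketch {

    /**
     * Joins the pieces of a request: base URL (assumed to have no trailing
     * slash), dataset path, response extension (e.g. "dds", "das", "dods")
     * and an optional constraint appended after '?'.
     */
    static String buildUrl(String base, String datasetPath, String extension, String constraint) {
        StringBuilder url = new StringBuilder(base);
        if (!datasetPath.startsWith("/")) {
            url.append('/');
        }
        url.append(datasetPath).append('.').append(extension);
        if (constraint != null && !constraint.isEmpty()) {
            url.append('?').append(constraint);
        }
        return url.toString();
    }
}
```

For example, `DapUrlSketch.buildUrl("http://localhost:8080/dods", "/model/run1", "dds", "temp")` gives `http://localhost:8080/dods/model/run1.dds?temp`. The sketch does no URL encoding, which a real client would need for constraints containing reserved characters.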

The central Anagram servlet has “/” mapped to it. This means that no other servlet can run on that port. However, if you are customising it yourself, it is a simple task to move it to /anagram/ or /dods/ to allow it to function alongside other servlets you may have installed on the machine. The first piece of logic is the Dispatcher, as outlined in the above diagram. However, this logic is NOT reflected in the class design. The 'handler' logic outlined above is implemented in what the Anagram developers call “Filters”. The dispatcher routes incoming requests to the various filters on the basis of the file extension. However, only the .dds, .das and .dods extensions have their own filters. The .asc, .info, .ver and .html filters are all subsumed by other classes within the Anagram framework. This means that when customising the Anagram server for your own purpose, the only modifications needed are to the three primary filters.
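The routing-by-extension idea can be sketched in a few lines. This is only an illustration of the dispatch rule described above, not Anagram's actual dispatcher: the class name, method names and the returned labels are all hypothetical.

```java
// Hypothetical sketch of extension-based dispatch; not Anagram's real code.
class ExtensionDispatchSketch {

    /** Returns the text after the last '.' in the final path segment, or "" if none. */
    static String extensionOf(String path) {
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        return dot > slash ? path.substring(dot + 1) : "";
    }

    /** Maps a request path to a label naming the kind of handler that would serve it. */
    static String route(String path) {
        switch (extensionOf(path)) {
            // The three extensions that, per the text, have their own filters.
            case "dds":
                return "DDS filter";
            case "das":
                return "DAS filter";
            case "dods":
                return "DODS filter";
            // Extensions that the text says are subsumed by other framework classes.
            case "asc":
            case "info":
            case "ver":
            case "html":
                return "handled elsewhere in the framework";
            default:
                return "unknown";
        }
    }
}
```

With this sketch, `route("/model/run1.das")` returns `"DAS filter"`, while a path with no extension falls through to `"unknown"`.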
To customise to your data source, you will need to change the three primary filters. This means implementing the relevant anagram interfaces with your own classes, integrating your new classes into the anagram server, and then implementing the correct functionality for your needs. All of the user modifications actually reside in the Tool class – everything else can remain untouched.
Out of the box, the class AnagramServlet starts up listening to all requests for “/”. It isn't mentioned, but autodeploy is set to true, so it is started by default. The AnagramServlet then creates the filters { AnalysisFilter, AbuseFilter, OverloadFilter, DispatchFilter } and initialises each of them. It also creates a new Server instance. The Server is something to which all, or most, classes maintain a reference. It handles locking and synchronisation issues, is used by the AnagramServlet to create ClientRequest objects from incoming HTTP requests, and triggers the creation of the services { AdminService, ASCIIDataService, DASService, DDSService, DirectoryService, BinaryDataService, HelpService, InfoService, UploadService, XMLCatalogService }. Through the interaction of these classes, the HTTP request is parsed, processed, and passed to the AnalysisFilter, along with the Service that is needed by the Dispatcher to handle the response.
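One way to picture this arrangement is as a chain in which each filter either passes the request on or stops it. The sketch below assumes the filters run in the order listed, which I have not verified against the Anagram sources; the RequestFilter interface and every method name here are hypothetical, and only the filter names are borrowed from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a filter chain; Anagram's real filter API may differ.
class FilterChainSketch {

    /** Assumed filter contract: return true to pass the request on, false to stop. */
    interface RequestFilter {
        boolean handle(String requestPath, List<String> trace);
    }

    /** Creates a stand-in filter that records its name and then passes or blocks. */
    static RequestFilter named(String name, boolean pass) {
        return (path, trace) -> {
            trace.add(name);
            return pass;
        };
    }

    /** Runs the filters in order and returns the names of those that were visited. */
    static List<String> run(List<RequestFilter> chain, String requestPath) {
        List<String> trace = new ArrayList<>();
        for (RequestFilter filter : chain) {
            if (!filter.handle(requestPath, trace)) {
                break; // e.g. an abuse or overload check rejected the request
            }
        }
        return trace;
    }
}
```

If a stand-in for AbuseFilter blocks the request, the trace ends there and the stand-in for DispatchFilter is never reached.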
However, the various filters and services all make calls back to the Tool class. This is what you will modify when interfacing to your custom filesystem, database, etc. Filters and services all work with Java Streams, which are obtained from the Tool class. The “example” directory of anagram contains some barely functional, moderately documented classes. The org/iges/grads/server directory of gds contains the classes which were written to create that server. They implemented Tool with the GradsTool class. The other classes in this directory appear more or less to be classes associated with the proper functioning of GradsTool.

There is no reason why Anagram couldn't work with any servlet container. I had installed Tomcat 5 on my system, and found that this conflicted somehow with Tomcat 4, preventing me from running the gds server. As well as customising Anagram to serve data from my unique database, I also wanted to upgrade to Tomcat 5, and to alter Anagram to listen only for specific paths so as not to conflict with other servlets that I had under development. Basically, all that needs to be done is to copy across the appropriate webapp directories, and add the servlet to the server.xml file. Instead of autoloading all applications, I would recommend adding it manually to the server.xml file.
In this section I will outline my experiences of trying to change the Anagram server to work with Tomcat 5, and also to change the server startup so that it is run in the same way as other servlets, allowing it to run alongside other servlets in a standard Tomcat installation.
This will expand a tomcat v4 installation into the anagram-1.0 subdirectory, along with the sources for the anagram servlet framework. You will need to run src/makejar to compile the necessary files and place them in a jar file in the lib directory of the web application.
<Context path="/dods" docBase="dods" debug="0" reloadable="true" />
The Anagram servlets expect to find environment variables specifying the logging output files and directories. In the default Anagram installation, these are set in one of the scripts used to start the Tomcat server. The easiest way to get Anagram working is to modify your existing Tomcat scripts to include these same settings. Anagram expects to have a home directory above the Tomcat directory. You will need to create a home directory for Anagram, and then fool it into working with the new directory structure.
This is quite intricate and hard to get right. The first change I made was to alter the way that the environment variables were set/read. Server.java uses ServletConfig.getInitParameter(key) to read in a number of variables. It appears that it does not distinguish between Unix environment variables and the servlet parameters contained in the web.xml file for the web application. For this reason, I chose to specify the parameters inside the web.xml file as being more compatible with other servlets and better design.
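The lookup order I have in mind can be sketched as follows: prefer a servlet parameter, fall back to an environment variable, and fail loudly if neither is set. This is a sketch of the idea only, not what Server.java does. Two plain maps stand in for ServletConfig.getInitParameter and System.getenv so that the example runs without a servlet container, and the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch; the maps stand in for servlet init parameters and
// for the process environment. This is not Anagram's actual lookup code.
class ConfigLookupSketch {

    /**
     * Returns the value for key, preferring the servlet parameters (as would
     * be declared in web.xml) over environment variables.
     */
    static String lookup(String key, Map<String, String> initParams, Map<String, String> environment) {
        String value = initParams.get(key);
        if (value == null) {
            value = environment.get(key);
        }
        if (value == null) {
            throw new IllegalStateException("missing setting: " + key);
        }
        return value;
    }
}
```

In a real servlet the first map would be replaced by calls to `getInitParameter` and the second by `System.getenv()`.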
The second change that needs to be made is to the server's configuration file. This is specified in the web.xml file, so you can call it whatever you please. This is also where you set the class name for Tool.java, allowing you to specify your own implementation of this important class.
In my environment, tomcat is installed in /opt/tomcat. I created a new directory, /opt/anagram to function as the anagram home directory. I wanted to keep the anagram configuration files outside of the public directory tree. I created the directories bin, conf, log and temp. I then copied the example.xml file into /opt/anagram, and renamed it to suit my application name. At some point I would like to move this to the conf directory which I created, however that would require a code change, which I was not ready to do at the time.
It should be noted that none of the scripts used by Anagram to start Tomcat 4 have been copied. Maintaining the Tomcat server is a separate task, and I tried to avoid having to modify the default scripts. I don't think this will cause problems. However, it's possible that somewhere in the Anagram framework there is a dependency on the existence of some of these files which I am unaware of. Some clever use of grep could probably sort this out, but it hasn't been done yet.
For my purposes, there were three tasks. First, I had to understand how the DODS data model worked, and work out an appropriate mapping to the MARS database model. Secondly, I had to create an appropriate wrapper for the MARS client utility, which was written in a mixture of FORTRAN and C, and could not be translated into Java. Thirdly, I had to modify Tool.java to communicate and translate the requests made of it to the MARS client utility.
DODS conceives of data as being in a number of discrete data sets, each of which can be subqueried to produce a data subset of variables. The data set is the most general collection of variables, and the constraint is applied to the data set to retrieve the relevant variables. In SQL terminology, this would be like ( select <list> from <dataset> where <criteria> ) except that the list of variables is included with the criteria for the query. An unconstrained query to a data set should return all variables for all constraint conditions.
The MARS data model is a tree structure – data sets are not homogeneous, and all requests must be fully specified. For example, one “month” of data within a data set might have different properties to any other month in the same data set. This makes it difficult to know how best to produce the list of data sets. The translation is going to have to be done by convention, rather than by rule. That is, some agreement will have to be reached about what constitutes a data set. A fully unconstrained query to MARS would, I think, start to return ALL data stored in the MARS database – although I believe there is some code control in the client which prevents this kind of thing from actually happening. More likely, some arbitrary decision based on operational use will be made.
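To illustrate how the variable list and the criteria travel together, here is a minimal sketch that splits a constraint expression into its two parts, assuming the usual DAP form of a comma-separated variable list followed by criteria that are each introduced by '&'. The class and field names are hypothetical, and the sketch deliberately ignores function calls, whose arguments may themselves contain commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of splitting a constraint expression; not a full parser.
class ConstraintSketch {
    final List<String> projection = new ArrayList<>(); // the "select <list>" part
    final List<String> selection = new ArrayList<>();  // the "where <criteria>" part

    /** Splits e.g. "temp,press&time>5" into variables [temp, press] and criteria [time>5]. */
    static ConstraintSketch parse(String expression) {
        ConstraintSketch result = new ConstraintSketch();
        if (expression == null || expression.isEmpty()) {
            return result; // unconstrained: nothing projected, nothing selected
        }
        String[] parts = expression.split("&");
        if (!parts[0].isEmpty()) {
            result.projection.addAll(Arrays.asList(parts[0].split(",")));
        }
        for (int i = 1; i < parts.length; i++) {
            if (!parts[i].isEmpty()) {
                result.selection.add(parts[i]);
            }
        }
        return result;
    }
}
```

An empty expression leaves both lists empty, which corresponds to the unconstrained query described above.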
This should be fairly straightforward. For basic functionality, all that will be required is to more or less append the constraint expression to the dataset identifier, and present it to the MARS client utility in the appropriate syntax.
Producing a compiled version of the MARS client might be a bit fiddly, as little work has been done to make it easily portable. It is mostly used from central servers where it was installed once, without any real intention of it being installed in many locations. Some work needs to be done to strip out nonessential library includes, simplify the makefile etc. to produce an easily relocatable version for installation on the DODS server.
The Anagram framework is built expecting all data to be files in a filesystem. That is, it expects that all data can be fully specified by a directory path. In database terms, you could think of each directory path as being a primary key to a data set. The DataHandle object is an abstraction of this pathname. However, the DataHandle is more general than a bare pathname. It has a field of type Object, called toolInfo. The array of DataHandles is created initially by Tool.java. It seems likely that the toolInfo field is simply there to hold whatever information about the data Tool.java requires to perform the requests made against it.
So, the functionality of the program is such that Tool.java builds a collection of DataHandles. This collection is then used by the Anagram framework when responding to info, html queries etc. The Anagram framework searches for data by using the completeName field. This is what is forced to be in pathname form. So, you need to find a way of describing your data which looks like a pathname. If you wanted, you could name all your data sets as /dataset. However, you could also name them /datefield/typefield/datetypeInstance. Anagram uses the completeName field to generate a pseudo-filesystem for browsing and querying data. A DataHandle associates this identifier with a creation timestamp, a description and an “Object” which Tool.java may use to store extra data, or whatever, relating to a data set. Whenever Tool.java is asked to do any work on a data set, it is given back the DataHandle relevant to the request.
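Once a relocatable binary exists, the Java side of the wrapper could be as simple as starting the client as a child process, writing the request to its standard input and collecting what it prints. The sketch below shows that pattern with ProcessBuilder. It assumes the client accepts its request on standard input, which I have not confirmed here, and the class and method names are hypothetical. It collects the output into a byte array for brevity, whereas a real Tool implementation would more likely hand back a stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical wrapper around an external command-line client.
class ExternalClientSketch {

    /** Runs command, writes request to its stdin, and returns everything it printed. */
    static byte[] run(List<String> command, String request) {
        try {
            ProcessBuilder builder = new ProcessBuilder(command);
            builder.redirectErrorStream(true); // merge stderr into stdout
            Process process = builder.start();

            // Send the whole request, then close stdin so the client sees end-of-input.
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(request.getBytes(StandardCharsets.UTF_8));
            }

            // Drain the output until the client closes its end of the pipe.
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            try (InputStream stdout = process.getInputStream()) {
                byte[] buffer = new byte[8192];
                int count;
                while ((count = stdout.read(buffer)) != -1) {
                    collected.write(buffer, 0, count);
                }
            }
            process.waitFor();
            return collected.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the client", e);
        }
    }
}
```

Writing the full request before reading any output is only safe for small requests. A client that produces a lot of output before it has consumed its input could block, in which case the writing would need to move to a separate thread.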
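The shape of such a record, and the trick of making an arbitrary key look like a pathname, can be sketched as follows. HandleSketch is a hypothetical stand-in that only mirrors the four pieces of information described above; it is not Anagram's DataHandle class, whose constructor and accessors may well differ.

```java
// Hypothetical stand-in mirroring the fields described in the text;
// not Anagram's actual DataHandle class.
class HandleSketch {
    final String completeName; // path-like identifier used for lookup and browsing
    final String description;  // human-readable description
    final long createTime;     // creation timestamp, in milliseconds
    final Object toolInfo;     // opaque extra data for the Tool implementation

    HandleSketch(String completeName, String description, long createTime, Object toolInfo) {
        if (!completeName.startsWith("/")) {
            throw new IllegalArgumentException("name must look like a path: " + completeName);
        }
        this.completeName = completeName;
        this.description = description;
        this.createTime = createTime;
        this.toolInfo = toolInfo;
    }

    /** Builds a path-like name from key fields, e.g. date, type and instance. */
    static String pathName(String... parts) {
        return "/" + String.join("/", parts);
    }
}
```

For example, `HandleSketch.pathName("2004-01", "forecast", "run1")` gives `/2004-01/forecast/run1`, following the /datefield/typefield/datetypeInstance pattern suggested above.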
What this means, however, is that once created, the DataHandle collection is immutable. It may be possible to change this behaviour, but it would require a code change to the internal functioning of the anagram framework. So, you are stuck with a static dataset until such time as you restart the server. This will trigger a re-reading of the data, and any new data sets can be added. This is of particular relevance to the task at hand, because new model output will very frequently be added to the MARS database. At some point it will likely be necessary to change this behaviour.
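If that change is ever made, one possible direction is to keep the collection behind a reference that can be swapped for a freshly imported snapshot while requests continue to read the old one. The sketch below shows only that swap. It is not how Anagram currently works, the class and method names are hypothetical, and plain strings stand in for the handles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a reloadable, read-mostly registry of data set names.
class HandleRegistrySketch {
    private final AtomicReference<List<String>> snapshot =
            new AtomicReference<>(Collections.<String>emptyList());

    /** Replaces the whole collection in one step with an unmodifiable copy. */
    void reload(List<String> freshNames) {
        snapshot.set(Collections.unmodifiableList(new ArrayList<>(freshNames)));
    }

    /** Returns the current snapshot; callers can never observe a half-updated list. */
    List<String> current() {
        return snapshot.get();
    }
}
```

Each snapshot is still immutable, so code that expects a fixed collection keeps working for the duration of a request; only the reference changes between imports.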
Writing the DataHandle[] doImport(Setting setting) {} method required that either a maintained configuration file be created and parsed to create the collection, or that it be generated on-the-fly through a database query. Unfortunately, the MARS server doesn't have any reflection capacity, so there is no way of interrogating it to discover the form of its data sets. This means that a configuration file will need to be created.
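As a sketch of what parsing such a file might look like, the example below reads one data set per line in a made-up format of three fields separated by '|': a path-like name, a description, and the request text to keep for later use. The file format, the Entry class and all names are assumptions for illustration; the real doImport would need to build Anagram DataHandle objects from the same information.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of parsing a hand-maintained data set list.
class ImportSketch {

    /** One configured data set; stands in for a DataHandle plus its toolInfo. */
    static final class Entry {
        final String completeName;
        final String description;
        final String request;

        Entry(String completeName, String description, String request) {
            this.completeName = completeName;
            this.description = description;
            this.request = request;
        }
    }

    /** Parses lines of the assumed form "name | description | request". */
    static List<Entry> parse(List<String> lines) {
        List<Entry> entries = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            String[] fields = line.split("\\|", 3);
            if (fields.length != 3) {
                throw new IllegalArgumentException("expected 3 fields: " + raw);
            }
            entries.add(new Entry(fields[0].trim(), fields[1].trim(), fields[2].trim()));
        }
        return entries;
    }
}
```

Limiting the split to three fields means the request text itself may contain further '|' characters without confusing the parser.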