HDF5DataHandler
Background information sources
Data type mappings
James, please help with this:
- Does DAP support 64-bit and 128-bit integers? No. We might if many users request it, but it's better to leave out some infrequently used types than to make all of the clients support them. Of course, clients can choose not to support them, but making clients harder to write seems to limit use more than not supporting some data does. [jhrg 5/10/07]
- Is the datatype size in DAP fixed? For example, is byte always 8-bit and Int32 always 32-bit? [ky 2/22/2007] Yes. [jhrg 5/10/07]
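| HDF5 Objects (object) | HDF5 Objects (type) | DODS Objects (type) | DODS Objects (class) |
|---|---|---|---|
| Dataset | Integer | Integer | Atomic |
| Dataset | Float | Float | Atomic |
| Dataset | String | String | Atomic |
| Dataset | Reference | ? | Atomic |
| Dataset | Date Time | ? | Atomic |
| Dataset | Bit Field | ? | Atomic |
| Dataset | Compound | Array? | Constructor |
| Group | Group | Structure? | Constructor |
| | ? | Grid | Constructor |
| | ? | Sequence | Constructor |
| Attribute | Attribute | Attribute | Attribute |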
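The atomic rows of the table, together with the answers above (fixed type sizes, no 64-bit or 128-bit integers), can be sketched as a lookup. This is only an illustration: the `(kind, bits)` key and the `dap_type` helper are hypothetical names, not part of any handler, and types without an agreed mapping (for example signed 8-bit integers) are simply rejected here.

```python
# DAP2 atomic types have fixed widths, so (kind, bits) identifies a type.
DAP2_ATOMIC = {
    ("uint", 8): "Byte",
    ("int", 16): "Int16",
    ("uint", 16): "UInt16",
    ("int", 32): "Int32",
    ("uint", 32): "UInt32",
    ("float", 32): "Float32",
    ("float", 64): "Float64",
}


def dap_type(kind, bits=None):
    """Return the DAP2 atomic type name for an HDF5-style (kind, bits) pair.

    Hypothetical helper: raises ValueError for anything without a direct
    DAP2 counterpart, such as 64-bit or 128-bit integers.
    """
    if kind == "string":
        return "String"
    try:
        return DAP2_ATOMIC[(kind, bits)]
    except KeyError:
        raise ValueError(f"no DAP2 atomic type for {kind} {bits}-bit") from None


if __name__ == "__main__":
    print(dap_type("float", 64))  # Float64
    print(dap_type("string"))     # String
```

Rejecting unsupported types explicitly keeps the open "?" rows of the table visible instead of silently narrowing the data.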
Reference to DAP
HDF5 references include object references and region references.
- To represent an object reference, a permanent HDF5 object ID needs to be stored in DAP.
- To represent a region reference, a permanent HDF5 object ID as well as the selection shape needs to be stored in DAP. [ky]
More about Datatype Mapping
We definitely need to resolve this part. An HDF5 group must be distinguished from the compound datatype when mapping to DAP. In Pydap, an HDF5 group is mapped to a DAP Structure. There is no way to map the HDF5 compound datatype to DAP. Object references need to be mapped appropriately to DAP, following the suggestions above. [ky]
Several sample HDF5 files that help in understanding the mapping of groups, object references, and data region references can be found under ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/opendap/Samples-for-dap-enhancement
A readme file that describes these files can be found in the parent directory: ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/opendap/
The h5dump header output of h5group.h5, h5_objref.h5, and h5_regref.h5 can be found under:
- ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/opendap/Samples-for-dap-enhancement/group/h5group.txt
- ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/opendap/Samples-for-dap-enhancement/references/h5_objref.txt
- ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/opendap/Samples-for-dap-enhancement/references/h5_regref.txt
There are two important reasons to map an HDF5 group to DAP:
1) We want to preserve the attribute information of the HDF5 group. Otherwise, the attribute information inside a group may not be easily recognized by the client.
2) It will be easy for a future HDF5 client to retrieve the information and rebuild the HDF5 file.
Based on the discussion today, and using the example at ftp://ftp.hdfgroup.uiuc.edu/pub/outgoing/opendap/Samples-for-dap-enhancement/group/h5group.txt, we propose to map the group hierarchy with the following information:
Dataset {
    Structure {
        Structure {
            Structure {
                string h5_comp;
            } foo2;
            string h5_array_1;
        } foo_1;
        Structure {
            string h5_array_1;
        } foo1_2;
    } "/";
} h5_group.h5;
Each string stores the absolute path of the corresponding HDF5 dataset (the DAP variable name).
In this way, the attribute information of an HDF5 group can be preserved. The ambiguity between an HDF5 dataset of compound datatype and an HDF5 group is also avoided, since no HDF5 dataset can have the name "/". However, we do need to document this carefully in the new HDF5-to-DAP mapping. Please review this, James. One thing we would like to ask is:
Is an empty structure legal? If it is not legal in DAP, then we have a big problem, because it is perfectly fine to have an empty group inside HDF5.
No one ever asked that, but I don't see why it would be a problem. However, there may be a better way to handle this case. While every variable must have an attribute container there can be other attribute containers which are bound to no particular variable. [jhrg 5/10/07]
Dataset {
    Structure {
    } foo3;
};
If an empty DAP structure is legal, I think this problem is solved. Otherwise, a new object may need to be created inside DAP to support this.
[ky 2007-5-3]
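The group mapping proposed above, including the empty-group case, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the handler's implementation: the group hierarchy is given as a nested dict (a dict is a group, `None` marks a dataset), and `render_group`, `render_dds`, and `dataset_paths` are hypothetical names.

```python
def render_group(name, group, depth):
    """Return the DDS lines for one HDF5 group rendered as a Structure."""
    pad = "    " * depth
    lines = [pad + "Structure {"]
    for child, node in group.items():
        if isinstance(node, dict):
            # A nested group becomes a nested Structure (possibly empty).
            lines += render_group(child, node, depth + 1)
        else:
            # A dataset becomes a string that will hold its absolute path.
            lines.append(pad + "    string " + child + ";")
    lines.append(pad + "} " + name + ";")
    return lines


def render_dds(filename, root):
    """Wrap the root group, named "/", in a Dataset declaration."""
    body = render_group('"/"', root, 1)
    return "\n".join(["Dataset {"] + body + ["} " + filename + ";"])


def dataset_paths(group, path="/"):
    """List the absolute HDF5 paths that the string variables would store."""
    paths = []
    for child, node in group.items():
        child_path = path.rstrip("/") + "/" + child
        if isinstance(node, dict):
            paths += dataset_paths(node, child_path)
        else:
            paths.append(child_path)
    return paths


if __name__ == "__main__":
    tree = {
        "foo_1": {"foo2": {"h5_comp": None}, "h5_array_1": None},
        "foo1_2": {"h5_array_1": None},
        "foo3": {},  # empty group, rendered as an empty Structure
    }
    print(render_dds("h5_group.h5", tree))
    print(dataset_paths(tree))
```

Whether the empty Structure emitted for `foo3` is acceptable to DAP clients is exactly the open question discussed above.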
What's left
We need to obtain typical sample NASA files and opinions from NASA users. [ky] This has been done; see above. [ky]
Implementation language
- C++: Build on the existing HDF5 handler.
- Python: Use PyDAP and PyTables. What types of HDF5 files could this not read? What capabilities would it support that the regular HDF5 library does not? How can Python-based handlers be integrated into the Server4 BES framework?
Check the answer to the first question below, under the pros and cons of a Python server.
Here's some information about using Python from C++:
- The Boost.Python library can be used to embed Python in C++. There's also a SIG with an active mail list.
- There's info about C++/Python mixing at python.org, as well.
- Interfacing the PyDAP/PyTables HDF5 plugin to the BES
- Extending PyTables to read more of HDF5
- Improving efficiency in PyDAP
- Python is a very clean and easy-to-use language.
- Both Pydap and Pytables are fairly easy to build.
- The Pydap-HDF5 plugin can serve HDF5 data, with limitations.
- Pydap and Pytables continue to gain popularity, even among NASA users.
- The developers of both are quick to reply.
- The Pydap-HDF5 plugin is very concise (3 pages in total).
- It is quite possible that the DODS Sequence can be supported for HDF5 via Pytables. This needs some effort with the C++ APIs. The structure of a DAP Sequence is exactly like a struct. However, DAP explicitly requires the Sequence to behave like a relational database, and it depends on the DAP data handler to *handle the sequence data like a query to the relational database* (called field projection in DAP). Pytables makes this possible through its powerful indexing feature. See the selecting-values example in HowToUse Pytables.
- Pydap is a Python implementation of DAP; can its performance compete with OPeNDAP?
- The Pydap-HDF5 plugin needs more work. Currently it doesn't support:
  - HDF5 compound datatype
  - array datatype
  - object reference
  - variable length
  - some atomic types, like char
  - group attributes
- Pytables doesn't support references. Unlike with Pydap, if we ask the Pytables people to support references, extra funding may be required.
- It involves more packages (Pytables, numpy) than a pure HDF5 C++ handler.
- How stable will Pytables, the Python handler, and Pydap be?
- Another question, added by Mike: how difficult is it for a C client to use Pydap?
- We know that Python is a language targeted at rapid development but which is not as fast as C/C++ when dealing with large amounts of array data. We also know that's exactly the type of data the handler will have to read/process. (Handler performance with large amounts of data)
- There are unknown issues in interfacing our C++ server framework (aka Server4, aka the BES) to python and moving DAP objects across that interface. So there's significant risk and significant development costs for OPeNDAP (since OPeNDAP is the right group to do that work). However, OPeNDAP has dropped its level of participation and I think it's best to save as much of that money as we can for modifications to the DAP and to provide general support to THG (see plan for some potential risks the project will need to address). (Unworkable cost distribution).
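The Sequence item in the list above describes relational-style access: the handler returns only the rows that satisfy a condition and only the requested fields. A minimal pure-Python sketch of that behaviour follows. It does not use Pydap or Pytables, and the `select` helper and the record field names are made up for illustration.

```python
def select(rows, fields, predicate):
    """Yield the requested fields of every row for which predicate(row) is true.

    rows      -- iterable of dicts, one per Sequence instance (record)
    fields    -- names of the fields to project
    predicate -- callable deciding whether a row is selected
    """
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in fields}


if __name__ == "__main__":
    # Hypothetical records standing in for the rows of an HDF5 table.
    records = [
        {"time": 0, "depth": 5.0, "temp": 12.1},
        {"time": 1, "depth": 10.0, "temp": 9.8},
        {"time": 2, "depth": 15.0, "temp": 11.4},
    ]
    for row in select(records, ["time", "temp"], lambda r: r["temp"] > 10):
        print(row)
```

A generator keeps memory use flat for large tables; an indexed store could replace the linear scan without changing the caller.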
Client support
- Ocapi?
- NetCDF CL
Sample Data
Data files: We're initially thinking of making the handler work with AURA and NPOESS data files. Is this a reasonable place to start? If so, we need sample files. If we're going after other groups of files, then let's get samples of those too.
I have several Aura sample files and one NPOESS sample file. Kent will also contact potential NASA users for typical sample files. [ky]
Kent contacted several NASA people and received one reply, from Bruce Vollmer at NASA GSFC GES DISC. See the attached: The sample HDF5 files through OPeNDAP
Performance testing
- What are the performance criteria for DAP-HDF5 applications? We need to identify at least a few areas that matter most to users so that we can focus our testing and improvements on them.
Testsuite development
The "how" part of the testsuite can follow the same methodology HDF4 used. With 'make check', the output of the h5 handler is compared against pre-written expected output. Another verification method is to use visualization tools (e.g. Ferret) that can act as OPeNDAP clients. This lets users who are familiar with h5 data visually check the correctness of the output quickly.
The "what" part of the testsuite is still under investigation. A good starting point is to reuse the hdf4 testsuite by converting its files into h5 format. With this approach, the first question is how extensive and valid the hdf4 testsuite is. The second is that the expected output files under the hdf-testsuites directory cannot be used as is, because of the h4toh5 conversion program. I ran the conversion program on the hdf4 testsuite files and compared the results of the hdf4 and hdf5 handlers; they are quite different.
Here is the URL for quick comparisons: http://hdfdap.hdfgroup.uiuc.edu:8080/
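The baseline comparison described above (handler output checked against pre-written expected output) can be sketched with the standard library. This only illustrates the comparison step, assuming both outputs are available as text; `compare_output` is a hypothetical helper, not part of the existing testsuite.

```python
import difflib


def compare_output(actual, expected):
    """Return a unified diff as a list of lines; the list is empty on a match."""
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up handler output and baseline, differing in one declared type.
    expected = "Dataset {\n    Int32 temp[10];\n} sample.h5;"
    actual = "Dataset {\n    Int16 temp[10];\n} sample.h5;"
    for line in compare_output(actual, expected):
        print(line)
```

Reporting a diff rather than a bare pass/fail makes it easier to see whether a mismatch comes from the handler or from the h4toh5 conversion.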
Aura EOS5 data support
Aura data support DAP grid data, so we MUST FIND A WAY to map Aura HDF-EOS5 (Grid, Swath, Point) data to DAP (Grid, Swath, Point) data using only HDF5 APIs. The problem can be described as follows:
HDF-EOS5 data → use the HDF5 library to retrieve all information → map to DAP correctly.
For example:
(HDF-EOS5 grid) → (HDF5 without geolocation APIs) → DAP Grid, correctly.
All HDF-EOS5 grid geolocation information is put inside an internal HDF5 group called structMetadata. One unclear point is how different projections can be accepted by DAP.
Step to tackle this problem: FOR JAMES, PLEASE HELP: How does the HDF4 handler work with HDF-EOS2 data, since swath, grid, and point are not new concepts? Please give us some hints on this, specifically which parts of the code and which related documents we should read. [ky 2/22/07]
-- Main.muqun - 08 Mar 2007
-- JoeLee - 08 Feb 2007
-- JamesGallagher - 17 Nov 2006
-- JamesGallagher - 21 Dec 2006

