Venue: Royal Botanic Garden, Sydney, held during the Biodiversity Knowledge Management Forum conference.
Attendees: Nicolas Bailly, Walter G. Berendsohn, Stanley Blum, Alex R. Chapman, Barry J. Conn, Charles Copp, Jim Croft, Marc Geoffroy, Philip Gleeson, David G. Green, Stinger Guala, Anton Güntsch, Norman F. Johnson, Yde de Jong, Ross Mathews, Robert A. Morris, Ben Richardson, Adrian Rissone, Sabine Roscher, P. J. Schwartz, Kerstin Teske, David Vieglais, Greg Whitbread
Apologies received: Lois Blaine, Kurt Bollacker, Raul Jimenez Rosenberg, Rudolf May, Derek Munro, Paula Ross Huddleston, Hideaki Sugawara, Neil Thomson, John Wieczorek
[A report written by Charles Copp for the ENHSIN group is available via http://www.nhm.ac.uk/science/rco/enhsin/Xmlreport.doc (MS Word document)]
The purpose of this meeting was to:
Schema development so far
Charles Copp presented an initial schema he had developed over the previous three
weeks with funding from ENHSIN (NHM) (http://url.des.schemas). The schema is mainly
a conversion of the DTD produced by the first workshop in Santa Barbara
(http://www.bgbm.org/tdwg/codata/SBWorkshop.htm), extended by elements from the
BioCISE information model (http://www.bgbm.org/biodivinf/docs/CollectionModel/)
and the British NBN/Recorder model. Charles pointed out that data transfer
efficiency could be improved in many cases if the GatheringEvent were the root
concept of a hierarchical data structure (with CollectionUnits as children of
the GatheringEvent), either as an alternative to or instead of the structure
that uses the CollectionUnit as the root concept. This would avoid transferring
redundant data in cases where many specimens were collected in the same
gathering event. Nevertheless, the group decided to keep the structure that
uses CollectionUnit as the root concept, for two reasons:
(1) efficiency is not an important design goal of a semantic standard (clarity, universality, completeness, and simplicity, for example, should be given higher priorities);
(2) the many collection databases implemented as flat data structures will not easily be able to export a hierarchical dataset with a normalized gathering event as the root concept, and would therefore be unable to participate in a federation based on this alternative. (We think it will be easier for systems based on a hierarchical structure to export a flat version of their contents.)
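The asymmetry behind reason (2) can be illustrated with a minimal Python sketch. It assumes a simple dict layout; the field names (`Locality`, `UnitID`, etc.) and the helpers `flatten` and `regroup` are hypothetical and are not taken from the draft schema. Flattening a GatheringEvent-rooted record into CollectionUnit-rooted rows is mechanical, while the reverse direction has to guess which flat rows describe the same gathering event.

```python
# Hypothetical GatheringEvent-rooted record: event data is stored once and
# shared by all of its CollectionUnits. Field names are illustrative only.
hierarchical = {
    "GatheringEvent": {
        "Locality": "Blue Mountains, NSW",
        "Date": "2001-07-15",
        "Collector": "A. Collector",
    },
    "CollectionUnits": [
        {"UnitID": "NSW-0001", "TaxonName": "Eucalyptus sieberi"},
        {"UnitID": "NSW-0002", "TaxonName": "Banksia serrata"},
    ],
}


def flatten(record):
    """Emit one CollectionUnit-rooted row per unit, copying the shared
    GatheringEvent fields into every row (the redundancy discussed above)."""
    event = record["GatheringEvent"]
    rows = []
    for unit in record["CollectionUnits"]:
        row = dict(unit)  # unit-level fields
        row.update({"GatheringEvent." + k: v for k, v in event.items()})
        rows.append(row)
    return rows


def regroup(rows, event_keys):
    """Naive reverse direction: rebuild events from flat rows by treating
    rows with identical event values as the same gathering event."""
    events = {}
    for row in rows:
        key = tuple(row[k] for k in event_keys)
        unit = {k: v for k, v in row.items() if k not in event_keys}
        events.setdefault(key, []).append(unit)
    return events
```

With consistent data, `regroup(flatten(hierarchical), keys)` recovers a single event; but if one flat row spells the locality differently (a common situation in flat databases), the same heuristic splits one real event into two, which is why the hierarchical export was not made a requirement.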
Several additional elements were noted and will be included in the next version of the schema.
Future work on the schema
From now on, the schema will evolve through the work of experts on specific "subschemas" such as botanical names or geography. To this end, the schema will be published and maintained on the BGBM server in a form that is understandable to experts with less XML experience. Element definitions will consist of the following data items:
- Name in existing standards
- ElementContent [type and domain]
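An element definition built from the two data items listed above could be represented as a small record. The following Python sketch is only an illustration under stated assumptions: the class `ElementDefinition`, its attribute names, the `accepts` check, and the placeholder standard name `HypotheticalStandardA` are all hypothetical and do not describe how definitions are actually stored on the BGBM server.

```python
from dataclasses import dataclass, field


@dataclass
class ElementDefinition:
    """Hypothetical record for one element definition (illustrative names)."""
    element_name: str
    # "Name in existing standards": standard -> element name in that standard
    names_in_existing_standards: dict = field(default_factory=dict)
    # "ElementContent": the type of the content ...
    content_type: str = "string"
    # ... and its domain; an empty list means the domain is unrestricted
    content_domain: list = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        """Return True if value lies in the declared domain (or no domain is set)."""
        return not self.content_domain or value in self.content_domain


# Example definition; "HypotheticalStandardA" is a placeholder, not a real standard.
sex = ElementDefinition(
    element_name="Sex",
    names_in_existing_standards={"HypotheticalStandardA": "sex"},
    content_type="string",
    content_domain=["male", "female", "unknown"],
)
```

Keeping the domain as explicit data means a reviewer without XML experience can check the allowed values directly, which matches the stated aim of making definitions readable for domain experts.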
A turnaround of 30 days after a Request for Comment was considered appropriate. The group agreed on the following rules for the future development of the schema:
Bob Morris offered to revise the schema so that it fulfills the syntactic requirements.
Subschemas and coordinators identified so far:
| Subschema | Coordinator |
|---|---|
| Zoological names | Yde de Jong |
| Botanical names | Walter Berendsohn |
| Bacteriology | Lois Blaine (to be confirmed) |
| Geography | Sabine Roscher (to be confirmed) |
It was not decided whether mineralogy should be included in the schema. To provide a "slot" for minerals, the schema should include an abstract type for mineralogy. Walter Berendsohn will approach experts to revise the parts of the schema for which no coordinator has yet been identified.
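In XML Schema, such a slot would typically be declared abstract, with concrete discipline-specific types substituted for it later. The idea can be sketched in Python as an analogy using the standard `abc` module; this is not XML Schema syntax, and the class names `UnitContent`, `BotanicalContent`, `MineralogicalContent`, and `CollectionUnit` are hypothetical.

```python
from abc import ABC, abstractmethod


class UnitContent(ABC):
    """Abstract 'slot': the core structure refers only to this type."""

    @abstractmethod
    def discipline(self) -> str:
        """Name of the discipline whose subschema defines the content."""


class BotanicalContent(UnitContent):
    def __init__(self, name: str):
        self.name = name

    def discipline(self) -> str:
        return "botany"


class MineralogicalContent(UnitContent):
    """Hypothetical filler for the mineralogy slot, if it is ever defined."""

    def __init__(self, mineral: str):
        self.mineral = mineral

    def discipline(self) -> str:
        return "mineralogy"


class CollectionUnit:
    """Core element: accepts any concrete UnitContent subtype."""

    def __init__(self, unit_id: str, content: UnitContent):
        if not isinstance(content, UnitContent):
            raise TypeError("content must be a UnitContent subtype")
        self.unit_id = unit_id
        self.content = content
```

The core `CollectionUnit` never needs to change when a mineralogy subschema is added; only a new concrete subtype is supplied, which is the point of reserving the slot before the inclusion question is settled.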
Stan Blum, P. J. Schwartz (portal software) and Dave Vieglais (provider software) introduced the development of DiGIR (Distributed Generic Information Retrieval), an open-source reference implementation (http://digir.sourceforge.net/) of the query protocol being developed by the "grand scheme" subgroup. A design objective of the current work is to decouple the protocol, the software, and the semantics; one benefit of this decoupling is that the federation schema can be evolved or versioned more easily. Until a stable schema for collection data is available, the system will use Darwin Core Version 2 as a simple example.

If providers of collection data register their service on a UDDI server (http://www.uddi.org/), they will not have to inform each portal individually; portals can poll the UDDI server periodically to discover new providers. Protocol-compliant queries will be sent to data providers as XML documents, and result sets will likewise be returned to portals as XML documents. The reference architecture will probably require each provider to publish a metadata set (i.e., collection-level data) that helps intelligent portals determine which providers need to be queried to answer a particular user request. The content of the metadata items will be discussed on the basis of the results of the BioCISE project (http://www.bgbm.org/BioCISE).
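The portal-side flow described above (register once, poll for new providers, use collection-level metadata to pick providers, send the query as an XML document) can be sketched in Python. This is a minimal sketch under stated assumptions: an in-memory class stands in for the UDDI registry, and the names `Provider`, `InMemoryRegistry`, `Portal`, `build_query`, the metadata key `taxon_groups`, and the XML element names are all hypothetical; they are not the DiGIR protocol's actual vocabulary or API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Provider:
    # Hypothetical provider entry; "metadata" stands for collection-level data.
    access_point: str
    metadata: dict


class InMemoryRegistry:
    """Stand-in for a UDDI registry: providers register once, portals poll."""

    def __init__(self):
        self._providers = []

    def register(self, provider):
        self._providers.append(provider)

    def list_providers(self):
        return list(self._providers)


class Portal:
    def __init__(self, registry):
        self.registry = registry
        self.known = {}  # access_point -> Provider

    def poll(self):
        """Discover newly registered providers; return only the unseen ones."""
        new = [p for p in self.registry.list_providers()
               if p.access_point not in self.known]
        for p in new:
            self.known[p.access_point] = p
        return new

    def select_providers(self, taxon_group):
        """Use collection-level metadata to skip providers that cannot answer."""
        return [p for p in self.known.values()
                if taxon_group in p.metadata.get("taxon_groups", [])]


def build_query(field_name, value):
    """Serialize a simple equality query as an XML document (hypothetical tags)."""
    query = ET.Element("query")
    condition = ET.SubElement(query, "equals")
    ET.SubElement(condition, "field").text = field_name
    ET.SubElement(condition, "value").text = value
    return ET.tostring(query, encoding="unicode")
```

For example, after two providers register with metadata for plants and insects respectively, a portal's first `poll()` finds both, a second `poll()` finds none, and `select_providers("plants")` returns only the herbarium, so the query document is sent to one provider instead of every known provider.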
The organizers gratefully acknowledge support from the following organizations:
Page hosted by the Department of Biodiversity Informatics and Laboratories of the Botanic Garden and Botanical Museum Berlin-Dahlem.
Page editor: Walter Berendsohn (w.berendsohn [at] bgbm.org).
This page last updated: 06.03.2005