Senckenberg Research Institute and Museum, Frankfurt

Botanical Garden and Botanical Museum Berlin-Dahlem


TDWG 2000: Digitising Biological Collections
Taxonomic Databases Working Group, 16th Annual Meeting
Senckenberg Museum, Frankfurt, Germany, November 10-12, 2000


Mary McGee Wood*, Susannah Lydon*, Robert Huxley** & David Sutton** 

MultiFlora: Automatic compilation of accurate taxonomic databases from multiple non-computerised sources

* Department of Computer Science, University of Manchester, UK. 
** Department of Botany, The Natural History Museum, London, UK. 

[Poster presentation] 

Floras hold vast resources of botanical data, locked in multiple overlapping natural language texts. MultiFlora aims to provide proof of concept that, by applying "Information Extraction" (IE) techniques to parallel descriptions of a taxon and correlating the resulting partial datasets, we can derive a usefully complete and accurate description.
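
As a rough illustration of what "Information Extraction" means for Flora text, the sketch below pulls one numeric character value out of a free-text description with a regular expression. It is a hypothetical, deliberately naive stand-in, not the GATE-based pipeline the project plans to use. The function name `extract_range`, the pattern and the sample sentences are invented for illustration, and real Flora prose is far more varied than the forms handled here.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a number, optionally "-" or an en dash plus a second number, then a unit.
# Handles forms such as "8-12 mm" or "c. 10 mm"; anything else is ignored.
RANGE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:[-–]\s*(\d+(?:\.\d+)?))?\s*(mm|cm|m)\b")


def extract_range(text: str, character: str) -> Optional[Tuple[float, float, str]]:
    """Return (low, high, unit) for the first measurement after `character`, or None."""
    start = text.lower().find(character.lower())
    if start < 0:
        return None  # the character is not mentioned in this description
    match = RANGE_RE.search(text, start)
    if match is None:
        return None  # mentioned, but no measurement with a unit follows
    low = float(match.group(1))
    # A single value (often a mean) is stored as a zero-width range.
    high = float(match.group(2)) if match.group(2) else low
    return low, high, match.group(3)


if __name__ == "__main__":
    # Invented sample sentences, not quotations from any Flora.
    print(extract_range("Petals 5, yellow, 8-12 mm long.", "petals"))
    print(extract_range("Petals c. 10 mm.", "petals"))
```

Storing a single value as a zero-width range keeps every extracted value in one shape, which makes the later correlation step simpler.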

Initial hand analysis has produced three-dimensional data matrices (Flora × character × species) for five species of Ranunculus across six Floras. Variations in terminology, and in the use of mean values versus ranges, are common, but genuine disagreements are rare. The GATE system (University of Sheffield) will be used to provide automatic IE, and heuristics for correlating the extracted datasets will be implemented.
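
One way to picture the Flora × character × species matrix and a simple correlation heuristic is the hypothetical sketch below; it is not the heuristic the project will implement. It assumes every value has already been normalised to an interval in a common unit, with a mean stored as a zero-width interval. Terminology variation is handled by an invented synonym table (`SYNONYMS`), and two Floras are treated as genuinely disagreeing only when their intervals fail to overlap. The names `SYNONYMS`, `canonical` and `correlate`, and all sample values, are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (low, high); a mean value is stored as (m, m)
Key = Tuple[str, str, str]      # (flora, character, species)

# Hypothetical synonym table mapping each Flora's wording onto one canonical character name.
SYNONYMS = {
    "length of petals": "petal length",
    "petals long": "petal length",
}


def canonical(character: str) -> str:
    """Normalise case and whitespace, then map known synonyms to a canonical name."""
    name = character.strip().lower()
    return SYNONYMS.get(name, name)


def correlate(matrix: Dict[Key, Interval]) -> Dict[Tuple[str, str], dict]:
    """Collapse the Flora axis: merge each (character, species) value across all Floras."""
    grouped: Dict[Tuple[str, str], List[Tuple[str, Interval]]] = defaultdict(list)
    for (flora, character, species), interval in matrix.items():
        grouped[(canonical(character), species)].append((flora, interval))

    merged = {}
    for key, entries in grouped.items():
        lows = [iv[0] for _, iv in entries]
        highs = [iv[1] for _, iv in entries]
        merged[key] = {
            "range": (min(lows), max(highs)),  # widest span reported by any Flora
            # In one dimension, intervals share a common point iff max(low) <= min(high),
            # which is also exactly when every pair of them overlaps.
            "agree": max(lows) <= min(highs),
            "sources": sorted(flora for flora, _ in entries),
        }
    return merged


if __name__ == "__main__":
    # Invented values for an unnamed species, not data from the project's matrices.
    matrix = {
        ("Flora A", "Petal length", "Ranunculus sp. 1"): (8.0, 12.0),
        ("Flora B", "length of petals", "Ranunculus sp. 1"): (10.0, 10.0),  # a mean
        ("Flora C", "petal length", "Ranunculus sp. 1"): (9.0, 14.0),
    }
    print(correlate(matrix)[("petal length", "Ranunculus sp. 1")])
```

Here a mean that falls inside another Flora's range counts as agreement rather than conflict, which matches the observation that differences in reporting means versus ranges are common while genuine disagreements are rare.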


This meeting was co-sponsored by the Committee on Data for Science and Technology (CODATA).

