DSpace User Group Meeting 2007
  
Terry Catapano

Plazi.org: Using DSpace as a Repository of Species Descriptions (Poster)


Plazi.org; Columbia University


American Museum of Natural History


University of Karlsruhe

     Full text: PDF
     Last modified: October 26, 2007

Abstract
The goal of the DSpace installation at plazi.org is to demonstrate how the corpus of texts containing the descriptions of the world's species can be assembled into a digital repository for stable, long-term access. In this presentation we focus on our deployment of DSpace in combination with a community-based text markup tool (which produces an XML-encoded version of the original scanned or electronically published document) and a web service that allows individual descriptions to be extracted from within the body of publications.

The published record of biological systematics, including the descriptions of the world's 1.8 million species, has some unique characteristics. The scientific naming of species is regulated by Codes, and thus the publications are quasi-legal documents. Descriptions remain relevant for a very long time, even if they are complemented by more comprehensive ones. Additionally, access to existing descriptions is vital for understanding not only the 1.8 million known species, but also the 20+ million yet to be described. Valid treatments for animals, for example, date back to 1758 and include perhaps more than 10 million pages, almost all of which are available only in hard copy. Taxonomic treatments are also highly structured documents and very rich in data. A wealth of important morphological descriptions and data, geographic distribution data, bibliographical references, and more lies latent in the taxonomic literature.


Items in the repository are made up of several files. A PDF is usually available, but in many cases (given enough time and resources, all) another representation of the publication, encoded in the XML schema TaxonX, is also provided. The encoding opens up the treatments, exposing the data contained within to extraction, data mining, analysis, and a variety of other purposes. Since the markup process is slow and expensive and requires knowledge of the systematics domain, a community markup server is added, so that interested parties can not only upload new PDF documents but also download and enhance documents in discrete, well-defined steps towards valid TaxonX documents. Similarly, other applications can build upon the foundation provided by the DSpace repository, such as a search/retrieval interface oriented towards the needs of the systematics domain, and integrate into the wider and growing Systematics, Conservation, and Biodiversity cyberinfrastructure.
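
As a rough illustration of how XML encoding exposes individual treatments to extraction, the following minimal Python sketch pulls treatments out of a marked-up publication. It is only a sketch under stated assumptions: the element names (publication, treatment, name, description), the sample document, and the function extract_treatments are hypothetical placeholders, not the TaxonX schema or the plazi.org web service.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the element names below ("publication",
# "treatment", "name", "description") are placeholders for illustration
# and are not taken from the TaxonX schema.
SAMPLE = """\
<publication>
  <front>Introductory matter that is not part of any treatment.</front>
  <treatment>
    <name>Aus bus</name>
    <description>Head wider than long.</description>
  </treatment>
  <treatment>
    <name>Aus cus</name>
    <description>Head longer than wide.</description>
  </treatment>
</publication>
"""


def extract_treatments(xml_text):
    """Return one dict per treatment element found in the document."""
    root = ET.fromstring(xml_text)
    treatments = []
    # iter() walks the whole tree, so treatments are found at any depth.
    for node in root.iter("treatment"):
        treatments.append({
            "name": node.findtext("name", default="").strip(),
            "description": node.findtext("description", default="").strip(),
        })
    return treatments


if __name__ == "__main__":
    for t in extract_treatments(SAMPLE):
        print(t["name"], "-", t["description"])
```

Once each treatment is delimited by markup, a service can return it on its own, without the surrounding publication, which is not possible with the PDF representation alone.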
