
Botanical Literature Goes Global: The Biodiversity Heritage Library

Judith A. WARNEMENT*
(Botany Libraries, Harvard University Herbaria, 22 Divinity Ave., Cambridge, MA, USA)
Abstract: Scholars in the natural sciences rely on historic literature more than those in any other branch of science. Yet much of this material has limited global distribution and is available in only a few select libraries. This wealth of knowledge is available only to those few who can gain direct access to significant library collections, a situation that is considered one of the chief impediments to the efficiency of research in the field. Community support and new technologies led to the formation of the Biodiversity Heritage Library (BHL). The BHL is an international collaboration of natural history libraries working together to make biodiversity literature available for use by the widest possible audience through open access and sustainable management.
Key words: Biodiversity Heritage Library; Encyclopedia of Life; Botanical libraries; Digital libraries; Taxonomic
intelligence; Taxonomic literature
CLC number: G 25    Document Code: A    Article ID: 2095-0845(2011)01-039-07
Background
The biodiversity community is at the forefront of developing international standards and applying new technologies to merge and expand historic datasets with current research, as exemplified by the Biodiversity Heritage Library (BHL) program. The idea for this project started in March 2005, when scientists, informatics experts, and librarians convened at a session entitled “Libraries and Laboratories” hosted by the Natural History Museum in London to share ideas, goals, and concerns. One outcome was a shared vision to build an integrated digital biodiversity library modeled after Botanicus, the Missouri Botanical Garden’s digital library. A follow-up organizational meeting was hosted by the Smithsonian Institution Library (SIL) in June 2006, where librarians from major natural history, botanical garden, and research institutions in the United States and Great Britain were invited to participate in a consortium that would build a global digital collection. All of the participants agreed to move forward, with the Missouri Botanical Garden agreeing to support the development of the technical infrastructure. In February 2007 a formal organizational meeting was hosted by Harvard’s Museum of Comparative Zoology, where the governance structure, operational plans, and working committees for the BHL were formed. Charter members included natural history museum libraries (American Museum of Natural History; Field Museum; Natural History Museum, London; and Smithsonian Institution), botanical garden libraries (Missouri Botanical Garden; New York Botanical Garden; and Royal Botanic Gardens, Kew), as well as academic and research libraries (Harvard University’s Botany Libraries; Ernst Mayr Library of the Museum of Comparative Zoology; and Marine Biological Laboratory / Woods Hole Oceanographic Institution Library). Libraries representing the Academy of Natural Sciences (Philadelphia) and California Academy of Sciences joined in 2008.
The Biodiversity Heritage Library portal (www.biodiversitylibrary.org; see Fig. 1) was officially launched in May 2007, with Botanicus as the underpinning of a rapidly expanding biodiversity library.
Fig. 1 The Biodiversity Heritage Library portal
Plant Diversity and Resources (植物分类与资源学报) 2011, 33 (1): 39-45    DOI: 10.3724/SP.J.1143.2011.10239
* Author for correspondence; E-mail: warnemen@oeb.harvard.edu; phone (617) 495-2366; Fax (617) 495-8654
Received date: 2010-12-13, Accepted date: 2010-12-30
Materials and methods
Partners in the Biodiversity Heritage Library (BHL) program are working together to digitize the published literature of biodiversity held in their respective collections, and to make that literature available for open access as a part of a global “biodiversity commons”. The BHL program is the scanning and digitization component of the Encyclopedia of Life (EOL) (www.eol.org), Harvard University Professor Edward O. Wilson’s vision to create a web page for every species of the earth’s biota. Another key collaborator is the Internet Archive (IA) (www.archive.org), which is dedicated to “universal access to human knowledge” and provides most BHL partners with low-cost mass scanning, archival storage of files, image processing, and technology development. Scanning facilities have been opened in London, Boston, New York, and Washington, D.C. to assist with the BHL and other scanning projects. The IA also allows the BHL to “ingest” other natural history content contributed by non-BHL partners like the California Digital Library, the University of Illinois at Urbana-Champaign, the University of Toronto, and the Boston Library Consortium. The partnership enriches the BHL collection and leverages limited scanning dollars.
The Biodiversity Heritage Library is not a legal entity, but a federation of libraries bound by memoranda of understanding. Members have signed agreements and the library directors represent their respective institutions on an Institutional Council. An elected Executive Committee conducts routine business and works closely with the three salaried staff: an executive director, a technical director, and a collections coordinator. The Executive Committee holds weekly teleconferences, the Institutional Council holds monthly calls, and there is one face-to-face meeting each year. These two groups oversee policy and funding decisions, while the details are managed by a variety of broadly representative committees. The scanning staff members hold weekly teleconferences and have been instrumental in developing the tools that manage bidding, workflow, and quality control protocols. A collections committee monitors the overall cohesiveness of BHL content, refines ingest criteria, and reviews all collections-related issues. An active technical committee designs all aspects of the BHL global infrastructure, and explores and engages in partnerships that will advance the project’s mission. The project is supported by the Encyclopedia of Life budget with grants from the John D. and Catherine T. MacArthur Foundation and the Alfred P. Sloan Foundation, funds from EOL’s five anchor institutions, the partner institutions, and other grants.
The BHL consortium is working with the global taxonomic community, rights holders and other interested parties to ensure that this biodiversity heritage is available to all and contributes to the International Convention on Biological Diversity (CBD) and the Global Biodiversity Information Facility (GBIF). “Taxonomic intelligence” is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms. Through the Universal Biological Indexer and Organizer, or uBio, BHL is using a sophisticated algorithm to locate likely name strings in OCR text. The service has “discovered” 10.7 million name strings in NameBank (Fig. 2) (www.ubio.org/index.php?pagename=namebank), which serves as a name thesaurus. The link between the name service and the BHL collection creates a powerful new tool for scholars.
Fig. 2 NameBank portal
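The idea of locating likely name strings in OCR text and checking them against a name thesaurus can be illustrated with a minimal sketch. The article does not describe uBio's actual algorithm, so everything here is an assumption for illustration: the regular expression, the function name find_candidate_names, and the tiny KNOWN_NAMES set standing in for NameBank.

```python
import re

# Assumed pattern: a capitalized genus followed by a lowercase epithet.
# This is far cruder than any production name finder.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")


def find_candidate_names(ocr_text, known_names):
    """Return binomial-shaped strings in the text that occur in known_names."""
    found = []
    for match in BINOMIAL.finditer(ocr_text):
        candidate = f"{match.group(1)} {match.group(2)}"
        # Keep only strings confirmed by the thesaurus, without duplicates.
        if candidate in known_names and candidate not in found:
            found.append(candidate)
    return found


# Illustrative stand-in for a name thesaurus such as NameBank.
KNOWN_NAMES = {"Rhododendron indicum", "Quercus alba"}

page = "The azalea Rhododendron indicum was seen. The oak, Quercus alba, grew nearby."
names = find_candidate_names(page, KNOWN_NAMES)
```

Ordinary phrases such as "The azalea" also fit the pattern, which is why the lookup against a list of known names does the real filtering in this sketch.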
Systematists and taxonomists need access to the historic literature to support current research. The cited half-life of publications in taxonomy and the “decay rate” are longer than in any other scientific discipline, so the “current” biodiversity literature spans more than 250 years. At the outset of the project, the BHL needed to calculate the scope of the biodiversity domain. OCLC (www.oclc.org/us/en/default.htm), the major international library utility, supported the BHL by merging all of the partners’ catalog records into a database so OCLC’s collection analysis tool could be used to profile the overall collection. The outcome showed that biodiversity literature is represented by 1.3 million catalog records. More than 800 000 records describe monographs and 40 000 records describe journal titles, with 12 500 records representing current titles. About forty percent of the material was published prior to 1923, generally placing it in the public domain. Sixty-three percent of the records were for works in English, and German was the second most frequent language (9%). The BHL’s scanning efforts have focused on the pre-1923 content that is not readily available, and yet essential to taxonomists’ research.
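The percentages above can be turned into approximate record counts. The short calculation below is only a sketch: the article reports rounded shares rather than counts, so the results are estimates, and the helper name estimate is hypothetical.

```python
TOTAL_RECORDS = 1_300_000  # catalog records reported by the collection analysis


def estimate(share_percent, total=TOTAL_RECORDS):
    """Convert a reported percentage into an approximate record count."""
    return round(total * share_percent / 100)


pre_1923 = estimate(40)  # "about forty percent" published prior to 1923
english = estimate(63)   # records for works in English
german = estimate(9)     # records for works in German
```

On these figures, roughly 520 000 records describe pre-1923 material, about 819 000 describe works in English, and about 117 000 describe works in German.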
The technical team developed tools to coordinate scanning efforts and avoid duplication. A merged database of members’ serials holdings was created as a “bid list” so that each library can indicate the titles it intends to scan. If problems are discovered, such as missing volumes, pages, or illustrations, then a call goes out to other libraries to scan those volumes. Monographs are selected by each library along subject areas, and the BHL collection is checked prior to scanning to avoid duplication. All items are barcoded and shipping manifests are created using a tool called WonderFetch (biodiversitylibrary.blogspot.com/2008/06/wonderfetchtm-ia-metaxml-fields.htm). The partner libraries can populate fields with data that would not normally be populated as part of the standard IA process, and then store those values alongside each scanned item in the IA repository. The impetus for implementing WonderFetch was not just to automate the inclusion of essential data elements like the volume and issue information for serials, but also to capture due diligence, rights, and licensing information related to each item. Partner libraries underwrite all of the costs associated with identifying, processing, and shipping materials, and BHL grants support the costs associated with scanning and digital processing.
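The bid list workflow described above can be sketched as a small data structure. This is an illustration under stated assumptions, not the BHL's actual tool: the class BidList, its methods, and the example title and library names are all hypothetical.

```python
class BidList:
    """Hypothetical shared list on which libraries claim serial titles to scan."""

    def __init__(self):
        self._claims = {}  # title -> library that intends to scan it
        self._gaps = {}    # title -> set of volumes reported missing or damaged

    def bid(self, title, library):
        """Record a claim and return the library that holds the claim.

        A later bid on an already claimed title does not replace the first
        claim, which is how duplicate scanning is avoided in this sketch.
        """
        return self._claims.setdefault(title, library)

    def report_gap(self, title, volume):
        """Note a problem volume so other libraries can be asked to scan it."""
        self._gaps.setdefault(title, set()).add(volume)

    def open_calls(self):
        """Return the volumes that still need to be supplied, per title."""
        return {title: sorted(vols) for title, vols in self._gaps.items()}


bids = BidList()
first = bids.bid("Example Botanical Journal", "Library A")
second = bids.bid("Example Botanical Journal", "Library B")  # already claimed
bids.report_gap("Example Botanical Journal", 12)
calls = bids.open_calls()
```

Here the second bid returns "Library A", signalling that the title is already claimed, and the reported gap appears as an open call for volume 12.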
Results and Discussion
The BHL portal currently offers more than 44 000 titles represented by nearly 86 000 volumes delivering more than 32 million pages of content. Users can browse by author, title, or subject, or can use the novel language, year of publication, and source map options (Fig. 3). More refined searches can be made with the search box, which allows the user to search for a specific author, title, subject, or species name. It is the delivery of search results that is unique to BHL. Species name results are delivered as a bibliography that cites the source title, author, date, and pages, and includes a link to the NameBank record. The pages of any volume selected are automatically scanned by the uBio search feature for species names. The results appear in the “Names on this page” box in the lower left-hand corner of the screen (Fig. 4). Links to EOL species pages are highlighted, and clicking on any name in the box generates a bibliography of all other occurrences of that name in the entire portal. For example, selecting Rhododendron indicum will generate the bibliography shown in Fig. 5, which includes links to all source materials.
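A species name bibliography of this kind amounts to an inverted index from names to the pages on which they were found. The following is a minimal sketch, assuming each scanned page carries a list of discovered names; the function names, the record layout, and the placeholder titles are hypothetical and do not reflect the portal's actual data model.

```python
from collections import defaultdict


def build_name_index(pages):
    """Map each discovered name to the (title, year, page) citations where it occurs."""
    index = defaultdict(list)
    for page in pages:
        for name in page["names"]:
            index[name].append((page["title"], page["year"], page["page"]))
    return index


def bibliography(index, name):
    """Return the citations for a name, ordered by year, then title, then page."""
    return sorted(index.get(name, []), key=lambda c: (c[1], c[0], c[2]))


# Placeholder records, not real citations.
pages = [
    {"title": "Example Flora", "year": 1890, "page": 12,
     "names": ["Rhododendron indicum"]},
    {"title": "Example Journal", "year": 1850, "page": 3,
     "names": ["Rhododendron indicum", "Quercus alba"]},
]

index = build_name_index(pages)
entries = bibliography(index, "Rhododendron indicum")
```

Looking up a name returns every recorded occurrence across the collection, which corresponds to the bibliography a user sees after clicking a discovered name.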
The “Download / About this book” tab appears (Fig. 6) when a title is displayed. Users are able to download the bibliographic record, selected pages, images, or the entire volume. The menu also features PDF or OCR download options, and links to views via other portals. To download selected pages, users supply an email address and a citation for the request, and then select up to one hundred pages. These documents are retained when appropriate metadata is provided and are made available to other users through CiteBank (citebank.org). CiteBank (Fig. 7) is still under development, but in addition to saving the BHL-selected documents, it is intended to provide robust search and browse capabilities for biodiversity publications stored in multiple international repositories and to aggregate content from as many systems as possible, so that biodiversity researchers have a single point of access to published materials. CiteBank is also intended to provide a storage platform for articles and documents that are digitized but not yet online, and to offer a common system for researchers to share specialized bibliographies. Users will be able to upload, edit, and share their own personal lists of references and citations, and these references will be linked to scanned content in the Biodiversity Heritage Library portal. In addition, BHL offers to scan professional societies’ publications or other publishers’ content at BHL’s expense and integrate the content into the BHL. Agreements with nearly forty publishers have added about one hundred titles to the collection.
Fig. 3 Portrait and title page in Ernest H. Wilson’s A Naturalist in Western China (London: Methuen, 1913)
Fig. 4 Species names discovered by uBio appear in the box in the lower left corner. Note the links to EOL species pages
Fig. 5 uBio-generated bibliography for Rhododendron indicum
Fig. 6 BHL Download / About this book tab
The Biodiversity Heritage Library wiki (Fig. 8) (biodivlib.wikispaces.com) presents a wealth of information about the project, detailed instructions and tutorials for using the various features, and lists of members and BHL staff. Developers’ tools are described and documented, and the BHL’s opt-in copyright feature is explained. The BHL also uses popular social media tools to connect with the public, including Twitter, Facebook, and the BHL blog (biodiversitylibrary.blogspot.com). Each site attracts and supports a varied community of scientists and the general public.
Interest in and support for the BHL have grown at an astonishing rate. In less than five years the BHL has grown into an international partnership that mirrors the global nature of biodiversity research. Formal BHL agreements are in place in Europe, China, and Australia, and there is strong interest in South America and Egypt. In Europe, colleagues at twenty-eight institutions have obtained funding from the European Union eContentplus Program to establish BHL-Europe (www.bhl-europe.eu), which is developing the technical infrastructure and tools to deliver content from many scanning projects throughout the continent. In the United Kingdom work is also proceeding via the BHL and Europeana (Fig. 9) (www.europeana.eu/portal). In China, the Chinese Academy of Sciences supports BHL-China (Fig. 10) (www.bhl-china.org/cms/en), and the Internet Archive installed a scanner in Beijing in the summer of 2010 to help build the BHL-China collection. In Australia, the Atlas of Living Australia (www.ala.org.au), funded by the Australian government’s National Collaborative Research Infrastructure Strategy program, joined BHL in June 2010. Additional partnerships, policies, tools, and tutorials are being explored and developed to refine the BHL and extend its global reach.
A great deal has been achieved through conventional mass scanning technologies and practices, but a significant portion of early biodiversity literature is rare, valuable, and sometimes fragile, and often a book is too large or has folded maps or illustrations that do not fit on conventional scanning beds. A planning grant, Retooling Special Collections in the Age of Mass Digitization, awarded by the Institute of Museum and Library Services (IMLS) in 2008, allowed BHL partners to identify and develop a cost-effective and efficient large-scale digitization workflow and to explore ways to enhance metadata for library materials that are designated as “special collections.” The group held a series of meetings, communicated by email, and established a wiki to record meetings, track progress, and share documents about costs, statistics, workflows, and small-scale scanning tests. The report included extensive cost analyses and recommendations for equipment configurations to scan rare and oversized materials.
BHL partners are also exploring ways to introduce other essential content to the BHL portal. Collectors’ field notes, plant lists, and diaries often hold important information that supplements content found on specimen labels and published accounts. Access to this primary source material is even more problematic for scholars because most archival collections, if catalogued, are not described in very fine detail. The United States National Herbarium and Smithsonian Institution Archives have received a Cataloging Hidden Special Collections and Archives Grant from the Council on Library and Information Resources (CLIR) to catalog all the field books, unpublished journals, loose notes, and sketches that document field research related to all disciplines of biology. The grant, Exposing Biodiversity Field Books and Original Expedition Journals at the Smithsonian Institution, will also build a cataloging tool and create a central repository so that other institutions can contribute their holdings. The enhanced level of description will improve access to these important research materials, which are frequently difficult to discover and access remotely.
Fig. 7 CiteBank homepage
Fig. 8 BHL wiki
Fig. 9 Europeana portal
Fig. 10 BHL-China portal
Several BHL partners have been awarded an IMLS grant as a companion grant to the Smithsonian’s CLIR proposal. Connecting Content: A Collaboration to Link Field Notes to Specimens and Published Literature will develop a system for integrating biological researchers’ field and specimen notes with museum specimens and related electronically published literature. The enhanced and integrated access to biological data will serve a wide variety of users, and will connect to other ongoing projects such as the Biodiversity Heritage Library.
The Biodiversity Heritage Library will soon benefit from another new collaboration. The International Association of Plant Taxonomists (IAPT) has given its permission to rescan and integrate with the BHL the monumental fifteen-volume botanical bibliography, Taxonomic Literature, 2nd ed. (TL-2). The Smithsonian Institution Library has been awarded an Atherton Seidell Grant to accomplish the scanning and design the schema. The BHL envisions a dynamically linked “TL-3” that will connect citations to published references and allow for corrections and the addition of new and expanded content.
The Biodiversity Heritage Library has achieved remarkable success in its relatively short existence. The partners have demonstrated that independent and geographically dispersed institutions can collaborate effectively, and have proven their ability to generate significant financial support. The technical accomplishments of a small team of talented and dedicated informatics specialists and of the efficient and collegial intra-institutional working groups are apparent in the array of tools and services currently delivered and under development via the various BHL interfaces. On June 27, 2010, the American Library Association’s Association for Library Collections & Technical Services (ALCTS) awarded its Outstanding Collaboration Citation to the BHL in recognition of the partners’ outstanding collaborative partnership.
The project has generated excitement in the international community and many opportunities to develop new partnerships and sources of funding. Society journal publishers are enthusiastic about participation in the BHL opt-in copyright model. The portal has recorded nearly 1.5 million visits since January 2008, the taxonomic intelligence tool is highly effective, and there are high levels of OCR accuracy in late 19th and 20th century printing. However, the Biodiversity Heritage Library faces many challenges in the near future. Initial sources of funding end in 2012, and a plan for financial and digital sustainability must be formulated. The rapid international expansion of BHL presents new governance issues and increases the need for clear and focused standards and for strategies to avoid duplication of effort. BHL is working to ensure the technical infrastructure for delivering and preserving content through digitization and retrospective ingestion, as well as the ability to continue to deliver new services as needed by the community.
Acknowledgements: The author wishes to acknowledge all of the BHL partners for their collegiality and dedication, and Dr. Jinshuang Ma and the Shanghai Chenshan Botanical Garden and Plant Science Research Center for their support.