ImageCLEFmed 2005 Medical Image Retrieval Protocol
Last updated - 27 December 2006, William Hersh
ImageCLEFmed is part of the Cross Language Evaluation Forum (CLEF, www.clef-campaign.org), a
benchmarking event for multilingual information retrieval held annually
since 2000. CLEF first began as a track in the Text Retrieval
Conference (TREC, trec.nist.gov). A
multilingual image retrieval track started in 2003, and in 2004 a medical image
retrieval track was added. More information on results and
participation can be found on the main ImageCLEF pages.
Image and multimedia retrieval is of interest to cross-language
information retrieval because media such as images are
inherently almost independent of language. The main goal of
ImageCLEFmed is to improve the retrieval of medical images from
heterogeneous and multilingual document collections containing images
as well as text. Many collections exist on the World Wide Web
that contain images as well as multilingual texts. However, the
retrieval of images is an often-neglected topic in the information
retrieval domain. In particular, hospitals produce an enormous amount
of visual data but tools to manage these images and videos are scarce
and exist currently only as research prototypes.
Through the creation of standard databases as well as query tasks and
relevance judgments for evaluation, we aim to create a standard
environment for evaluation and a platform for research groups to meet
and compare the performance of various techniques based on the same
data. Our goal is to expand the test collection every year, if
possible, and to create realistic and challenging tasks.
Retrieval tasks
In ImageCLEFmed 2005, there were two medical image retrieval
tasks. Both tasks required the use of image
retrieval techniques for best results. The automatic image annotation
task did not contain any text as input and was aimed at
image analysis research groups. An image retrieval system (GIFT) was
available for participants who did not have access to one themselves.
Multilingual retrieval from a large document collection (teaching files)
The multilingual image retrieval task was based on a larger dataset
than in 2004, containing images from the Casimage, MIR, PEIR, and
PathoPIC datasets (about 50,000 images, see below for more information).
The collection contains annotations in XML format. The majority of the
annotations are in English, but a significant number are in French and
German, and a few cases do not contain any annotation at all. The quality
of the texts varies between collections and even within the same
collection. Query tasks were formulated with example images and a short
textual description explaining the research goal.
Automatic image annotation
Automatic image annotation, or image classification, can be an important
step when searching for images in a database. In this task, a
database of 9,000 fully classified images taken randomly from medical
routine was made available and could be used to train a classification
system. A set of 1,000 images, for which classification labels were
not available to the participants, had to be classified. The aim was to
find out how well current techniques could identify image modality, body
orientation, body region, and biological system examined, based on the
images alone. The results of the classification step could be used for
multilingual image annotations as well as for DICOM header corrections.
More information is available at
http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef05annotation.html.
Although only simple class numbers were provided for ImageCLEFmed
2005, the images were annotated with the complete IRMA code, a
multi-axial code for image annotation. The code was available in English
and German.
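To make the idea of a multi-axial code concrete, here is a minimal sketch that splits a code string into the four axes named above (modality, orientation, body region, biological system). It assumes the commonly described hyphen-separated IRMA layout of four groups; that layout, the field names, and the example string are assumptions for illustration and are not taken from this protocol or from the task data.

```python
from typing import NamedTuple


class IrmaCode(NamedTuple):
    """Four axes of a multi-axial code (field names are illustrative)."""
    technical: str    # image modality
    directional: str  # body orientation
    anatomical: str   # body region
    biological: str   # biological system examined


def parse_irma(code: str) -> IrmaCode:
    """Split a hyphen-separated four-axis code string into its axes.

    Assumes the axes are separated by hyphens; this is an assumption,
    not something specified in the protocol text.
    """
    parts = code.strip().split("-")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four hyphen-separated axes, got {code!r}")
    return IrmaCode(*parts)


# Illustrative string only; not taken from the task data.
example = parse_irma("1121-127-720-500")
print(example.technical, example.anatomical)
```

Keeping the axes as separate fields would let a classifier be scored per axis as well as on the complete code, although the 2005 task itself only used simple class numbers.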
Databases and utilities available for the tasks
The image library for 2005 and 2006 comprises four collections:
Casimage, MIR, PEIR, and PathoPIC.
The sizes of the collections are as follows:
Collection Name | Cases | Images | Annotations | Annotations by Language | File Size (tar archive)
Casimage | 2076 | 8725 | 2076 | French - 1899, English - 177 | 1.28 GB
MIR | 407 | 1177 | 407 | English - 407 | 63.2 MB
PEIR | 32319 | 32319 | 32319 | English - 32319 | 2.50 GB
PathoPIC | 7805 | 7805 | 15610 | German - 7805, English - 7805 | 879 MB
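As a quick consistency check, the figures in the table can be summed to confirm the "about 50,000 images" quoted earlier and the claim that most annotations are in English. The sketch below uses only the numbers from the table; the dictionary layout and variable names are illustrative.

```python
# Per-collection counts copied from the table above.
collections = {
    "Casimage": {"cases": 2076, "images": 8725,
                 "annotations": {"French": 1899, "English": 177}},
    "MIR": {"cases": 407, "images": 1177,
            "annotations": {"English": 407}},
    "PEIR": {"cases": 32319, "images": 32319,
             "annotations": {"English": 32319}},
    "PathoPIC": {"cases": 7805, "images": 7805,
                 "annotations": {"German": 7805, "English": 7805}},
}

total_cases = sum(c["cases"] for c in collections.values())
total_images = sum(c["images"] for c in collections.values())

# Sum annotation counts per language across all collections.
by_language = {}
for c in collections.values():
    for language, count in c["annotations"].items():
        by_language[language] = by_language.get(language, 0) + count

print("cases:", total_cases)
print("images:", total_images)
print("annotations by language:", by_language)
```

The image total comes to 50,026, and English accounts for 40,708 of the 50,412 annotations.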
The image library is available per the instructions distributed to the
track participants. The files should be organized into the
following directory structure:
+ ImageCLEFmed
    + CASImage
        - Images
        - XML
    + PathoPic
        - Images
        - XML
    + Peir
        - Images
        - XML
    + MIR
        - Images
        - XML
    ImageCLEFmed.xml
The directory labeled ImageCLEFmed is the root directory.
ImageCLEFmed.xml is the file that links the collections and their
images and annotations. It has external links to the images and
the associated annotations in XML files. It contains relative
paths, from the root directory, to all the files.
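Because every path in ImageCLEFmed.xml is relative to the root directory, it can help to verify the layout after unpacking the archives. The following is a minimal sketch, not an official utility; it assumes that ImageCLEFmed.xml sits directly inside the root directory and that the directory names are spelled exactly as in the listing above. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

# Directory names as given in the listing above.
EXPECTED_COLLECTIONS = ("CASImage", "PathoPic", "Peir", "MIR")
EXPECTED_SUBDIRS = ("Images", "XML")
LIBRARY_FILE = "ImageCLEFmed.xml"  # assumed to live directly in the root


def missing_paths(root: Path) -> List[Path]:
    """Return the expected directories and files that do not exist under root."""
    missing = []
    for collection in EXPECTED_COLLECTIONS:
        for subdir in EXPECTED_SUBDIRS:
            path = root / collection / subdir
            if not path.is_dir():
                missing.append(path)
    library = root / LIBRARY_FILE
    if not library.is_file():
        missing.append(library)
    return missing


if __name__ == "__main__":
    for path in missing_paths(Path("ImageCLEFmed")):
        print("missing:", path)
```

An empty result means the relative paths in the library file can be resolved against the root as described.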
The conceptual structure of ImageCLEFmed.xml is as follows. The
entire ImageCLEFmed library consists
of multiple collections (e.g., Casimage, PEIR, MIR, PathoPIC). Each collection is organized
into cases, each of which represents a group of related images and
annotations. Each case consists
of a group of images and optional annotations. Each image is part of a case and has
optional associated annotations, which consist of metadata (e.g., HEAL
tagging) and/or a textual annotation. All of the images and
annotations are stored in separate files. ImageCLEFmed.xml only
contains the connections between the collections, cases, images, and
annotations. Here is a graphical depiction of the library:

Here is the XML structure of the ImageCLEFmed.xml file:
<library>
  <collection>
    <name>name-text</name>
    <cases>
      <case>
        <id>identifier-text</id>
        <images>
          <image>
            <id>identifier-text</id>
            <imagefile>file-name-text</imagefile>
            <annotation lang=" ">file-name-text</annotation>
            <annotation lang=" ">file-name-text</annotation>
          </image>
        </images>
        <annotation lang=" ">file-name-text</annotation>
        <annotation lang=" ">file-name-text</annotation>
      </case>
    </cases>
  </collection>
</library>
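A library file with this structure can be traversed with a standard XML parser. The sketch below uses Python's xml.etree.ElementTree on a small inline sample. The element names follow the structure above, but the identifiers, file names, and lang values ("en", "fr") in the sample are hypothetical, since the protocol does not specify them; the helper names are also illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the structure above; all ids, file
# names, and lang values are made up for illustration.
SAMPLE = """\
<library>
  <collection>
    <name>Casimage</name>
    <cases>
      <case>
        <id>case-1</id>
        <images>
          <image>
            <id>image-1</id>
            <imagefile>CASImage/Images/image-1.jpg</imagefile>
            <annotation lang="en">CASImage/XML/image-1_en.xml</annotation>
          </image>
          <image>
            <id>image-2</id>
            <imagefile>CASImage/Images/image-2.jpg</imagefile>
          </image>
        </images>
        <annotation lang="fr">CASImage/XML/case-1_fr.xml</annotation>
      </case>
    </cases>
  </collection>
</library>
"""


def annotation_files(element):
    """Return (lang, file name) pairs for annotations directly under element."""
    return [(a.get("lang"), a.text) for a in element.findall("annotation")]


def iter_images(library):
    """Yield one record per image, with its own and its case's annotations."""
    for collection in library.findall("collection"):
        name = collection.findtext("name")
        for case in collection.findall("cases/case"):
            case_annotations = annotation_files(case)
            for image in case.findall("images/image"):
                yield {
                    "collection": name,
                    "case_id": case.findtext("id"),
                    "image_id": image.findtext("id"),
                    "image_file": image.findtext("imagefile"),
                    "image_annotations": annotation_files(image),
                    "case_annotations": case_annotations,
                }


for record in iter_images(ET.fromstring(SAMPLE)):
    print(record["collection"], record["case_id"], record["image_file"])
```

For the real library file, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE), and each file name would be resolved relative to the root directory.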
A sample version of the library with 10 cases from each of the four
collections is available in a tar
archive. The XML library file itself is available
at imageclefmed_small.xml.
A sample case from Casimage is shown below.

The image retrieval system GIFT can be downloaded from
http://www.gnu.org/software/gift/.
How can I register for ImageCLEF?
You can register for ImageCLEF by contacting Carol Peters
(carol.peters@isti.cnr.it),
the main coordinator for CLEF. For more
specific information about any aspect of ImageCLEF or the tasks, please
contact Paul Clough. For questions that concern only the medical task
please contact Henning Müller.
Once you have registered your interest with Carol Peters and filled in
the appropriate copyright and declaration forms, you will be able to
download the data collections. The data can be downloaded from our FTP
server, but if you experience any difficulties or would rather
access the collection by another method (e.g., DVD), please
contact Henning Müller.
Organisers