ImageCLEFmed 2005 Medical Image Retrieval Protocol

Last updated - 27 December 2006, William Hersh

ImageCLEFmed is part of the Cross Language Evaluation Forum (CLEF, www.clef-campaign.org), a benchmarking event for multilingual information retrieval held annually since 2000. CLEF began as a track in the Text Retrieval Conference (TREC, trec.nist.gov). A multilingual image retrieval track started in 2003, and in 2004 a medical image retrieval track was added. More information on results and participation can be found on the main ImageCLEF pages.

Image or multimedia retrieval is of interest to cross-language information retrieval because media such as images are inherently almost insensitive to language. The main goal of ImageCLEFmed is to improve the retrieval of medical images from heterogeneous and multilingual document collections containing images as well as text. Many collections on the World Wide Web contain images as well as multilingual texts. However, image retrieval is an often-neglected topic in the information retrieval domain. In particular, hospitals produce an enormous amount of visual data, but tools to manage these images and videos are scarce and currently exist only as research prototypes.

Through the creation of standard databases as well as query tasks and relevance judgments for evaluation, we aim to create a standard environment for evaluation and a platform for research groups to meet and compare the performance of various techniques based on the same data. Our goal is to expand the test collection every year, if possible, and to create realistic and challenging tasks.

Retrieval tasks

In ImageCLEFmed 2005, there were two medical image retrieval tasks. Both tasks required the use of image retrieval techniques for best results. The automatic image annotation task included no text as input and was aimed at image analysis research groups. An image retrieval system (GIFT) was available to participants who did not have access to one themselves.

Multilingual retrieval from a large document collection (teaching files)

The multilingual image retrieval task was based on a larger dataset than in 2004, containing images from the Casimage, MIR, PEIR, and PathoPIC datasets (about 50,000 images, see below for more information). The collection contains annotations in XML format. The majority of the annotations are in English, but a significant number are in French and German, and a few cases have no annotation at all. The quality of the texts varies between collections and even within the same collection. Query tasks were formulated with example images and a short textual description explaining the research goal.

Automatic image annotation

Automatic image annotation or image classification can be an important step when searching for images in a database. In this task, a database of 9,000 fully classified images taken randomly from medical routine was made available and could be used to train a classification system. A further 1,000 images, whose classification labels were not available to the participants, had to be classified. The aim was to find out how well current techniques could identify image modality, body orientation, body region, and biological system examined, based on the images alone. The results of the classification step could be used for multilingual image annotations as well as for DICOM header corrections. More information is available at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef05annotation.html.

Although only simple class numbers were provided for ImageCLEFmed 2005, the images were annotated with the complete IRMA code, a multi-axial code for image annotation. The code was available in English and German.
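
To illustrate what "multi-axial" means here, the sketch below splits a code string into four axes. It assumes the commonly published IRMA layout TTTT-DDD-AAA-BBB (technique, direction, anatomy, biological system), which matches the four properties named above but is not spelled out in this protocol. The name split_irma_code and the example code string are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class IrmaCode(NamedTuple):
    technique: str   # imaging modality axis (assumed 4 characters)
    direction: str   # body orientation axis (assumed 3 characters)
    anatomy: str     # body region axis (assumed 3 characters)
    biosystem: str   # biological system axis (assumed 3 characters)


def split_irma_code(code: str) -> IrmaCode:
    """Split a 'TTTT-DDD-AAA-BBB' style string into its four axes."""
    parts = code.strip().split("-")
    expected_lengths = (4, 3, 3, 3)
    if len(parts) != 4 or tuple(len(p) for p in parts) != expected_lengths:
        raise ValueError(f"not a TTTT-DDD-AAA-BBB style code: {code!r}")
    return IrmaCode(*parts)


if __name__ == "__main__":
    # Made-up code string, used only to show the split.
    print(split_irma_code("1121-127-720-500"))
```

Keeping the axes separate means a classifier can be scored per axis (for example, modality only) as well as on the full code.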

Databases and utilities available for the tasks

The image library for 2005 and 2006 has the following collections:

Collection Name | Image Type(s) | Annotation Type(s) | Original URL
--- | --- | --- | ---
Casimage | Radiology and pathology | Clinical case descriptions | http://www.casimage.com/
Mallinckrodt Institute of Radiology (MIR) | Nuclear medicine | Clinical case descriptions | http://gamma.wustl.edu/home.html
Pathology Education Instructional Resource (PEIR) | Pathology and radiology | Metadata records from HEAL database | http://peir.path.uab.edu
PathoPIC | Pathology | Image description - long in German, short in English | http://alf3.urz.unibas.ch/pathopic/e/intro.htm

The sizes of the collections are as follows:

Collection Name | Cases | Images | Annotations | Annotations by Language | File Size (tar archive)
--- | --- | --- | --- | --- | ---
Casimage | 2076 | 8725 | 2076 | French - 1899; English - 177 | 1.28 GB
MIR | 407 | 1177 | 407 | English - 407 | 63.2 MB
PEIR | 32319 | 32319 | 32319 | English - 32319 | 2.50 GB
PathoPIC | 7805 | 7805 | 15610 | German - 7805; English - 7805 | 879 MB
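
As a quick consistency check on the "about 50,000 images" figure quoted earlier, the per-collection sizes can be summed. The sketch below copies the numbers from the size table; the dictionary layout and the function name totals are illustrative choices, not part of the distributed library.

```python
# Figures copied from the size table above.
COLLECTIONS = {
    "Casimage": {"cases": 2076, "images": 8725,
                 "annotations": {"French": 1899, "English": 177}},
    "MIR": {"cases": 407, "images": 1177,
            "annotations": {"English": 407}},
    "PEIR": {"cases": 32319, "images": 32319,
             "annotations": {"English": 32319}},
    "PathoPIC": {"cases": 7805, "images": 7805,
                 "annotations": {"German": 7805, "English": 7805}},
}


def totals(collections):
    """Return (total cases, total images, annotation counts per language)."""
    total_cases = sum(c["cases"] for c in collections.values())
    total_images = sum(c["images"] for c in collections.values())
    by_language = {}
    for c in collections.values():
        for language, count in c["annotations"].items():
            by_language[language] = by_language.get(language, 0) + count
    return total_cases, total_images, by_language


if __name__ == "__main__":
    cases, images, by_language = totals(COLLECTIONS)
    print("cases:", cases)
    print("images:", images)
    print("annotations by language:", by_language)
    print("annotations:", sum(by_language.values()))
```

The sums come to 42,607 cases, 50,026 images, and 50,412 annotations (40,708 English, 7,805 German, 1,899 French).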

The image library is available per the instructions distributed to the track participants. The files should be organized into the following directory structures:
+ ImageCLEFmed
    + CASImage
        - Images
        - XML
    + PathoPic
        - Images
        - XML
    + Peir
        - Images
        - XML
    + MIR
        - Images
        - XML
    ImageCLEFmed.xml
The directory labeled ImageCLEFmed is the root directory. ImageCLEFmed.xml is the file that links the collections and their images and annotations. It has external links to the images and the associated annotations in XML files. It contains relative paths, from the root directory, to all the files.
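
A minimal sketch for checking that an unpacked copy of the library matches this layout is given below. The directory names are taken from the listing above. Placing ImageCLEFmed.xml directly inside the root directory is an assumption, and the function name missing_paths is hypothetical.

```python
from pathlib import Path

COLLECTION_DIRS = ("CASImage", "PathoPic", "Peir", "MIR")
SUBDIRS = ("Images", "XML")
LIBRARY_FILE = "ImageCLEFmed.xml"  # assumed to sit directly in the root directory


def missing_paths(root):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = []
    for collection in COLLECTION_DIRS:
        for sub in SUBDIRS:
            path = root / collection / sub
            if not path.is_dir():
                missing.append(path)
    if not (root / LIBRARY_FILE).is_file():
        missing.append(root / LIBRARY_FILE)
    return missing


if __name__ == "__main__":
    for path in missing_paths("ImageCLEFmed"):
        print("missing:", path)
```

An empty result means every expected directory and the library file were found. Note that directory names are case-sensitive on most Unix file systems, so "PathoPic" and "PathoPIC" would be different directories there.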

The conceptual structure of ImageCLEFmed.xml is as follows. The entire ImageCLEFmed library consists of multiple collections (e.g., Casimage, PEIR, MIR, PathoPIC). Each collection is organized into cases, each of which represents a group of related images and annotations. Each case consists of a group of images and an optional annotation. Each image is part of a case and has optional associated annotations, which consist of metadata (e.g., HEAL tagging) and/or a textual annotation. All of the images and annotations are stored in separate files. ImageCLEFmed.xml contains only the connections between the collections, cases, images, and annotations. Here is a graphical depiction of the library:

[Figure: graphical depiction of the library structure]

Here is the XML structure of the ImageCLEFmed.xml file:
<library>
  <collection>
    <name>name-text</name>
    <cases>
      <case>
        <id>identifier-text</id>
        <images>
          <image>
            <id>identifier-text</id>
            <imagefile>file-name-text</imagefile>
            <annotation lang=" ">file-name-text</annotation>
            <annotation lang=" ">file-name-text</annotation>
          </image>
        </images>
        <annotation lang=" ">file-name-text</annotation>
        <annotation lang=" ">file-name-text</annotation>
      </case>
    </cases>
  </collection>
</library>
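
The structure above can be walked with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree. The embedded sample document, its file names, and its lang values ("fr", "en") are invented for illustration, and the function name list_images is hypothetical; the real file may use different language codes.

```python
import xml.etree.ElementTree as ET

# Invented sample that follows the element structure shown above.
SAMPLE = """<library>
  <collection>
    <name>Casimage</name>
    <cases>
      <case>
        <id>case-1</id>
        <images>
          <image>
            <id>img-1</id>
            <imagefile>CASImage/Images/img-1.jpg</imagefile>
            <annotation lang="fr">CASImage/XML/img-1_fr.xml</annotation>
          </image>
        </images>
        <annotation lang="en">CASImage/XML/case-1_en.xml</annotation>
      </case>
    </cases>
  </collection>
</library>"""


def list_images(library):
    """Yield one dict per <image> with its image-level and case-level annotations."""
    for collection in library.findall("collection"):
        name = collection.findtext("name")
        for case in collection.findall("cases/case"):
            # Direct <annotation> children of <case> are the case-level annotations.
            case_annotations = {a.get("lang"): a.text for a in case.findall("annotation")}
            for image in case.findall("images/image"):
                yield {
                    "collection": name,
                    "case_id": case.findtext("id"),
                    "image_id": image.findtext("id"),
                    "imagefile": image.findtext("imagefile"),
                    "image_annotations": {a.get("lang"): a.text
                                          for a in image.findall("annotation")},
                    "case_annotations": case_annotations,
                }


if __name__ == "__main__":
    for record in list_images(ET.fromstring(SAMPLE)):
        print(record)
```

To read the real file instead of the sample, ET.parse(path).getroot() can be passed to list_images; the file paths it returns are relative to the root directory.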
A sample version of the library with 10 cases from each of the four collections is available in a tar archive. The XML library file itself is available at imageclefmed_small.xml. A sample case from Casimage is shown below.

[Figure: sample case from Casimage]

The image retrieval system GIFT can be downloaded from http://www.gnu.org/software/gift/.

How can I register for ImageCLEF?

You can register for ImageCLEF by contacting Carol Peters (carol.peters@isti.cnr.it), the main coordinator for CLEF. For more specific information about any aspect of ImageCLEF or the tasks, please contact Paul Clough. For questions that concern only the medical task please contact Henning Müller.

Once you have registered your interest with Carol Peters and filled in the appropriate copyright and declaration forms, you will be able to download the data collections. The data can be downloaded from our FTP server. If you experience any difficulties or would rather access the collection by another method (e.g., DVD), please contact Henning Müller.

Organisers

Henning Müller, University and Hospitals of Geneva, Switzerland
Email: henning.mueller@sim.hcuge.ch
Web: http://www.sim.hcuge.ch/medgift/

Paul Clough, Sheffield University, UK
Email: p.d.clough@sheffield.ac.uk
Web: http://ir.shef.ac.uk/cloughie/

William Hersh, Oregon Health & Science University (OHSU), Portland, OR, USA
Email: hersh@ohsu.edu
Web: http://www.billhersh.info/