ImageCLEFmed 2006 Medical Image Retrieval Protocol

Last updated - 27 December 2006, William Hersh

In ImageCLEFmed 2006, there were two medical image tasks: multilingual image retrieval and automatic image annotation.
This page focuses on the medical image retrieval task.  The task does not require the use of visual retrieval techniques, but an image retrieval system with such techniques (GIFT) is available for participants who do not have one and wish to use one.  More information about the image annotation task can be found at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef06/medicalaat.html.  Results can be found at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef06/medaat-results.html.

More information about the image collection and topics for ImageCLEFmed 2006 is available on the 2006 data page.

Overview of ImageCLEFmed 2006

Multilingual image retrieval

The multilingual image retrieval task is based on a dataset containing images from the Casimage, MIR, PEIR, and PathoPIC datasets (about 50,000 images, see below for more information). The collection contains annotations in XML format. The majority of the annotations are in English but a significant number are also in French and German, with a few cases that do not contain any annotation at all. The quality of the texts is variable between collections and even within the same collection.

Query topics were formulated with example images and a short textual description. The topics were categorized as being amenable to visual retrieval, text retrieval, or both.  The track organizers made available results from a state-of-the-art image retrieval system (medGIFT) and a state-of-the-art text engine (Lucene).

Automatic image annotation

Automatic image annotation, or image classification, can be an important step when searching for images in a database. In this task, a database of 9,000 fully classified images taken randomly from medical routine was made available and could be used to train a classification system. A further 1,000 images, for which classification labels were not available to the participants, had to be classified. The aim was to find out how well current techniques could identify image modality, body orientation, body region, and biological system examined based on the images. The results of the classification step could be used for multilingual image annotations as well as for DICOM header corrections.  More information is available at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef06/medicalaat.html.

Image Library

The image library for the 2006 track was the same as the one used for ImageCLEFmed 2005 but updated with some data corrections of the 2005 collection.  It has the following collections:

Collection Name | Image Type(s) | Annotation Type(s) | Original URL
Casimage | Radiology and pathology | Clinical case descriptions | http://www.casimage.com/
Mallinckrodt Institute of Radiology (MIR) | Nuclear medicine | Clinical case descriptions | http://gamma.wustl.edu/home.html
Pathology Education Instructional Resource (PEIR) | Pathology and radiology | Metadata records from HEAL database | http://peir.path.uab.edu
PathoPIC | Pathology | Image description - long in German, short in English | http://alf3.urz.unibas.ch/pathopic/e/intro.htm

A variety of corrections were made to the 2006 version of the image data, so please make sure you use the right data!  These changes included changing some language categorizations of Casimage data and some file extensions of the PEIR data.  (For the latter, a number of images used the .tif extension despite being JPEG images.)
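
As an illustration of the file extension problem, the following minimal Python sketch lists files that are named .tif but whose content begins with the JPEG signature. It is not part of the track tooling; the function names and the directory argument are hypothetical, and the check relies only on the fact that JPEG data starts with the bytes FF D8 FF.

```python
from pathlib import Path
from typing import List

# Every JPEG stream starts with the SOI marker (FF D8) followed by FF.
JPEG_MAGIC = b"\xff\xd8\xff"


def is_jpeg(path: Path) -> bool:
    """Return True if the file content starts with the JPEG signature."""
    with path.open("rb") as handle:
        return handle.read(3) == JPEG_MAGIC


def find_mislabeled_tifs(image_dir: Path) -> List[Path]:
    """List files named *.tif (searched recursively) whose content is JPEG."""
    return sorted(p for p in image_dir.rglob("*.tif") if is_jpeg(p))
```

Calling find_mislabeled_tifs on an image directory of an uncorrected copy would return the candidates for renaming. Note that the pattern match is case-sensitive on most systems, so files ending in .TIF would need a second pass.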

The sizes of the collections are as follows:

Collection Name | Cases | Images | Annotations | Annotations by Language | File Size (tar archive)
Casimage | 2076 | 8725 | 2076 | French - 1899, English - 177 | 1.28 GB
MIR | 407 | 1177 | 407 | English - 407 | 63.2 MB
PEIR | 32319 | 32319 | 32319 | English - 32319 | 2.50 GB
PathoPIC | 7805 | 7805 | 15610 | German - 7805, English - 7805 | 879 MB
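
The figures in the table above can be cross-checked with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from the table, while the variable and function names are made up for illustration.

```python
# (cases, images, annotations) per collection, copied from the table above.
SIZES = {
    "Casimage": (2076, 8725, 2076),
    "MIR": (407, 1177, 407),
    "PEIR": (32319, 32319, 32319),
    "PathoPIC": (7805, 7805, 15610),
}

# Annotation counts per language, summed over the collections in the table.
BY_LANGUAGE = {
    "French": 1899,
    "English": 177 + 407 + 32319 + 7805,
    "German": 7805,
}


def totals(sizes):
    """Return (total cases, total images, total annotations)."""
    cases = sum(row[0] for row in sizes.values())
    images = sum(row[1] for row in sizes.values())
    annotations = sum(row[2] for row in sizes.values())
    return cases, images, annotations


total_cases, total_images, total_annotations = totals(SIZES)
print(total_cases, total_images, total_annotations)
```

The image total comes to 50,026, which matches the "about 50,000 images" quoted in the overview, and the per-language annotation counts add up to the same 50,412 as the Annotations column.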

The image library is available per the instructions distributed to the track participants.  The files should be organized into the following directory structures:
+ ImageCLEFmed
    + CASImage
        - Images
        - XML
    + PathoPic
        - Images
        - XML
    + Peir
        - Images
        - XML
    + MIR
        - Images
        - XML
    ImageCLEFmed.xml
The directory labeled ImageCLEFmed is the root directory. ImageCLEFmed.xml is the file that links the collections and their images and annotations. It has external links to the images and the associated annotations in XML files. It contains relative paths, from the root directory, to all the files.

The conceptual structure of ImageCLEFmed.xml is as follows. The entire ImageCLEFmed library consists of multiple collections (e.g., Casimage, PEIR, MIR, PathoPIC). Each collection is organized into cases that represent a group of related images and annotations. Each case consists of a group of images and an optional annotation. Each image is part of a case and has optional associated annotations, which consist of metadata (e.g., HEAL tagging) and/or a textual annotation. All of the images and annotations are stored in separate files.  ImageCLEFmed.xml only contains the connections between the collections, cases, images, and annotations.

Here is a graphical depiction of the library:

[Figure: graphical depiction of the library]

Here is the XML structure of the ImageCLEFmed.xml file:
<library>
  <collection>
    <name>name-text</name>
    <cases>
      <case>
        <id>identifier-text</id>
        <images>
          <image>
            <id>identifier-text</id>
            <imagefile>file-name-text</imagefile>
            <annotation lang=" ">file-name-text</annotation>
            <annotation lang=" ">file-name-text</annotation>
          </image>
        </images>
        <annotation lang=" ">file-name-text</annotation>
        <annotation lang=" ">file-name-text</annotation>
      </case>
    </cases>
  </collection>
</library>
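
A file with the structure above can be walked with Python's standard xml.etree.ElementTree module. The sketch below is an assumption-laden illustration rather than official tooling: the function names and the dictionary keys are hypothetical, and it assumes that the file-name text in each element is a path relative to the root directory, as described earlier.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def _annotations(element, root_dir):
    """Return (lang, path) pairs for the direct <annotation> children."""
    return [
        (note.get("lang", ""), root_dir / note.text.strip())
        for note in element.findall("annotation")
        if note.text and note.text.strip()
    ]


def iter_images(library_xml, root_dir):
    """Yield one dict per <image>, with paths resolved against root_dir."""
    root_dir = Path(root_dir)
    library = ET.parse(library_xml).getroot()  # the <library> element
    for collection in library.findall("collection"):
        name = (collection.findtext("name") or "").strip()
        for case in collection.findall("cases/case"):
            case_id = (case.findtext("id") or "").strip()
            case_notes = _annotations(case, root_dir)  # case-level annotations
            for image in case.findall("images/image"):
                image_file = (image.findtext("imagefile") or "").strip()
                if not image_file:
                    continue  # skip entries without an image file
                yield {
                    "collection": name,
                    "case_id": case_id,
                    "image_id": (image.findtext("id") or "").strip(),
                    "image_path": root_dir / image_file,
                    "image_annotations": _annotations(image, root_dir),
                    "case_annotations": case_notes,
                }
```

For example, iterating over iter_images("ImageCLEFmed.xml", "ImageCLEFmed") would give every image together with its own annotations and those of its case, which is usually the unit needed for text indexing.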
A sample version of the library with 10 cases from each of the four collections is available in a tar archive.  The XML library file itself is available at imageclefmed_small.xml.  A sample case from Casimage is shown below.

[Figure: sample case from Casimage]

The library; an XML file of the linkages across cases, images, and annotations across the library; queries and qrels from the 2004 and 2005 tracks; and assorted other files are available on the FTP server.  Instructions are provided to all groups officially registered in ImageCLEF.  No one else can have access to the data for now.  No exceptions!

The image retrieval system GIFT can be downloaded from http://www.gnu.org/software/gift/.

Topics

There were 30 topics for ImageCLEFmed 2006, with 10 topics in each of three categories: visual, mixed, and semantic.
The topics are available in three formats, all of which are in the protected area of the Web site that contains all of the ImageCLEFmed 2006 data.
The text of the German translation of two topics was corrected from the originally released topics. The versions on the Web site now reflect the corrected topics. In the text below, we describe how the topics for the 2006 task were classified into visual, mixed, and textual categories.

Visual

Topics are classified as visual in general when they have a single or few dominant visual properties that can be detected when analyzing the query images. Results are relatively similar with respect to these visual properties. Dominant features can be colors (a red mouth in topic 1.1), a texture (of text in topic 1.8 or the brain in 1.2), or the layout of the image, often with a single object (knee as in 1.3, hand in 1.6). In general, a visual system can be expected to detect the modality in combination with an anatomic region reasonably well.

Mixed

Mixed topics contain an important visual component but are more complex with respect to the expected result. They can be more variable with respect to the expected set of relevant documents than purely visual queries. Often, the modality and/or anatomic region can be detected, but subtle and often local variations need to be taken into account (2.1 and 2.6 can have an ultrasound of different shape and the object can be in a variety of places; 2.2 requires experience to detect features purely visually). Often, for a non-specialist, it is important to read the text to decide on whether an image is relevant (2.8, 2.9 to decide on the cells). For topic 2.10 it might be hard to tell how important visual features are, as the results can have very large variation.

Semantic

Semantic topics will require the analysis of the text to obtain a relatively complete set of relevant images. This means that result images can be fairly different in appearance (see topic 3.5), the topic can appear in a varying size in the image (topics 3.1, 3.2), or the queries are very general (topic 3.10). The modality can be of limited importance in these queries (such as for 3.3). Sometimes the main subject (such as the cyst in topic 3.6) might be detectable by a specialized system, but the differences are subtle and the object can appear in many different anatomic regions. For many of these topics, a non-specialist needs to check the text to decide on the relevance of a given image.

Submitting runs

The format for submitting results was based on the trec_eval program as follows:
1 1 CASImage/Images/13156 1 0.567162 GE_gift
1 1 Peir/Images/00009331 2 0.441542 GE_gift
where:
Several key points for submitted runs were:
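
The two example lines above each have six whitespace-separated columns. A minimal Python sketch for reading and writing such lines is given below. The field names are assumptions based on the usual trec_eval results convention (topic, an iteration column that trec_eval ignores, document identifier, rank, score, run tag); they are not taken from this page, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    topic: str      # column 1: topic number
    iteration: str  # column 2: assumed to be the column trec_eval ignores
    image_id: str   # column 3: image identifier, e.g. CASImage/Images/13156
    rank: int       # column 4: rank of the image for this topic
    score: float    # column 5: similarity score
    run_tag: str    # column 6: run identifier, e.g. GE_gift


def parse_run_line(line: str) -> RunEntry:
    """Parse one line of a run file into a RunEntry."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    topic, iteration, image_id, rank, score, run_tag = fields
    return RunEntry(topic, iteration, image_id, int(rank), float(score), run_tag)


def format_run_line(entry: RunEntry) -> str:
    """Write a RunEntry back out with a six-decimal score."""
    return (
        f"{entry.topic} {entry.iteration} {entry.image_id} "
        f"{entry.rank} {entry.score:.6f} {entry.run_tag}"
    )
```

Parsing a line and formatting it again reproduces the original text for the examples shown, which makes the pair convenient for validating a run file before submission.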

How can I register for ImageCLEF?

You can register for ImageCLEF on the CLEF web site.

Data and utilities available for the multilingual image retrieval task

The library; an XML file of the linkages across cases, images, and annotations across the library; queries and qrels from the 2004 and 2005 tracks; and assorted other files are available on the protected portion of the Web site.  Instructions are provided to all groups officially registered in ImageCLEF.  No one else can have access to the data for now.  No exceptions!

The image retrieval system GIFT can be downloaded from http://www.gnu.org/software/gift/.

Organisers

Henning Müller, University and Hospitals of Geneva, Switzerland
Email:  henning.mueller@sim.hcuge.ch
Web:  http://www.sim.hcuge.ch/medgift/02_Henning_EN.htm

Paul Clough, Sheffield University, UK
Email:  p.d.clough@sheffield.ac.uk
Web:  http://ir.shef.ac.uk/cloughie/

William Hersh, Oregon Health & Science University (OHSU), Portland, OR, USA
Email:  hersh@ohsu.edu
Web:  http://www.billhersh.info/