ImageCLEFmed 2006 Medical Image Retrieval Protocol
Last updated - 27 December 2006, William Hersh
In ImageCLEFmed 2006, there were two medical image retrieval
tasks:
- Image retrieval - retrieval of relevant images from real-world
topics
- Automatic image
annotation - automated assignment of image attributes
This page focuses on the medical image retrieval task. The task
does not require the use of visual retrieval techniques, but an image
retrieval system with such techniques (GIFT) is available
for participants who do not have one and wish to use one.
More
information about the image annotation task can be found at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef06/medicalaat.html.
Results can be found at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef06/medaat-results.html.
More information about the image collection and topics for ImageCLEFmed
2006 is available on the 2006 data page.
Overview of ImageCLEFmed 2006
Multilingual image retrieval
The multilingual image retrieval task is based on a dataset
containing images from the Casimage, MIR, PEIR, and
PathoPIC
datasets (about 50,000 images, see below for more information). The
collection contains annotations in XML
format. The majority of the annotations are in English, but a
significant number are also in French and German, and a few cases
do not contain any annotation at all. The quality of the texts varies
between collections and even within the same collection.
Query topics were formulated with example images and a short textual
description. The topics were categorized as being amenable to visual
retrieval, text retrieval, or both. The track
organizers made available results from a state-of-the-art
image retrieval system (medGIFT) and a state-of-the-art text engine
(Lucene).
Automatic image annotation
Automatic image annotation or image classification can be an important
step when searching for images from a database. In this task, a
database of 9,000 fully classified images taken randomly from medical
routine was made available and could be used to train a classification
system. A further 1,000 images, whose classification labels were
withheld from the participants, had to be classified. The aim was to
find out how well current techniques could identify image modality,
body orientation, body region, and biological system examined, based
on the images alone. The
results of the classification step could be used for multilingual image
annotations as well as for DICOM header corrections. More
information is available at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef06/medicalaat.html.
Image Library
The image library for the 2006 track was the same as the one used for
ImageCLEFmed 2005, updated with some corrections to the 2005 data. It
comprises four collections: Casimage, MIR, PEIR, and PathoPIC.
A variety of corrections were made to the 2006 version of the
image data, so please make sure you use the right data! These
changes included changing some language categorizations of the Casimage
data and some file extensions of the PEIR data. (For the latter, a
number of images used the .tif extension despite being JPEG images.)
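As an illustration of the extension problem described above, the following minimal sketch checks whether a file named .tif actually starts with the JPEG signature. The function names and the directory argument are hypothetical, and the check relies only on the well-known leading bytes of JPEG and TIFF files; it is not the procedure the organizers used to correct the data.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"             # start-of-image marker of a JPEG stream
TIFF_MAGICS = (b"II*\x00", b"MM\x00*")   # little- and big-endian TIFF headers


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(TIFF_MAGICS):
        return "tiff"
    return "unknown"


def mislabeled_tifs(image_dir: str) -> list[Path]:
    """Return .tif files under image_dir whose content is actually JPEG."""
    found = []
    for path in sorted(Path(image_dir).rglob("*.tif")):
        with path.open("rb") as handle:
            if sniff_format(handle.read(4)) == "jpeg":
                found.append(path)
    return found
```

Calling mislabeled_tifs on an image directory of an uncorrected copy would list candidate files to rename; on the corrected 2006 data it would be expected to return an empty list.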
The sizes of the collections are as follows:
Collection Name | Cases | Images | Annotations | Annotations by Language | File Size (tar archive)
Casimage | 2076 | 8725 | 2076 | French - 1899, English - 177 | 1.28 GB
MIR | 407 | 1177 | 407 | English - 407 | 63.2 MB
PEIR | 32319 | 32319 | 32319 | English - 32319 | 2.50 GB
PathoPIC | 7805 | 7805 | 15610 | German - 7805, English - 7805 | 879 MB
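The figure of "about 50,000 images" quoted earlier can be checked against the table with a few lines of arithmetic. This is only a sketch; the dictionary and function names are hypothetical, and the numbers are copied from the table above.

```python
# (cases, images, annotations) per collection, copied from the table above
COLLECTIONS = {
    "Casimage": (2076, 8725, 2076),
    "MIR": (407, 1177, 407),
    "PEIR": (32319, 32319, 32319),
    "PathoPIC": (7805, 7805, 15610),
}


def totals(collections):
    """Sum each column (cases, images, annotations) across collections."""
    cases, images, annotations = (sum(col) for col in zip(*collections.values()))
    return {"cases": cases, "images": images, "annotations": annotations}


print(totals(COLLECTIONS))
```

The image total comes to 50,026, which is consistent with the stated size of the library.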
The image library is available per the instructions distributed to the
track participants. The files should be organized into the
following directory structures:
+ ImageCLEFmed
    + CASImage
        - Images
        - XML
    + PathoPic
        - Images
        - XML
    + Peir
        - Images
        - XML
    + MIR
        - Images
        - XML
    ImageCLEFmed.xml
The directory labeled ImageCLEFmed is the root directory.
ImageCLEFmed.xml is the file that links the collections and their
images and annotations. It has external links to the images and
the associated annotations in XML files. It contains relative
paths, from the root directory, to all the files.
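A quick way to confirm that an unpacked copy follows this layout is sketched below. It assumes that ImageCLEFmed.xml sits directly inside the root directory and that directory names are spelled exactly as listed above; the function names are hypothetical.

```python
from pathlib import Path

# Directory names as listed in the layout above.
COLLECTION_DIRS = ("CASImage", "PathoPic", "Peir", "MIR")
SUBDIRS = ("Images", "XML")
LIBRARY_FILE = "ImageCLEFmed.xml"  # ASSUMPTION: located directly in the root


def missing_entries(root: str) -> list[str]:
    """List expected directories/files absent under root, as relative paths."""
    base = Path(root)
    expected = [f"{c}/{s}" for c in COLLECTION_DIRS for s in SUBDIRS]
    missing = [rel for rel in expected if not (base / rel).is_dir()]
    if not (base / LIBRARY_FILE).is_file():
        missing.append(LIBRARY_FILE)
    return missing


def resolve(root: str, relative_path: str) -> Path:
    """Turn a relative path taken from ImageCLEFmed.xml into a path on disk."""
    return Path(root) / relative_path
```

An empty list from missing_entries means the eight subdirectories and the library file are all in place, so relative paths read from ImageCLEFmed.xml can be passed to resolve.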
The conceptual structure of ImageCLEFmed.xml is as follows. The
entire ImageCLEFmed library consists of multiple collections (e.g.,
Casimage, PEIR, MIR, PathoPIC). Each collection is organized into
cases that represent a group of related images and annotations. Each
case consists of a group of images and an optional annotation. Each
image is part of a case and has optional associated annotations, which
consist of metadata (e.g., HEAL tagging) and/or a textual annotation.
All of the images and annotations are stored in separate files.
ImageCLEFmed.xml only contains the connections between the
collections, cases, images, and annotations.
Here is a graphical depiction of the library:

Here is the XML structure of the ImageCLEFmed.xml file:
<library>
  <collection>
    <name>name-text</name>
    <cases>
      <case>
        <id>identifier-text</id>
        <images>
          <image>
            <id>identifier-text</id>
            <imagefile>file-name-text</imagefile>
            <annotation lang=" ">file-name-text</annotation>
            <annotation lang=" ">file-name-text</annotation>
          </image>
        </images>
        <annotation lang=" ">file-name-text</annotation>
        <annotation lang=" ">file-name-text</annotation>
      </case>
    </cases>
  </collection>
</library>
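The structure above can be walked with Python's standard xml.etree.ElementTree module. The following is a minimal sketch: the sample identifiers, file names, and the lang value "en" are hypothetical placeholders, since the actual values are only present in the distributed library file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that follows the element structure shown above.
SAMPLE = """<library>
  <collection>
    <name>Peir</name>
    <cases>
      <case>
        <id>case-1</id>
        <images>
          <image>
            <id>img-1</id>
            <imagefile>Peir/Images/img-1.jpg</imagefile>
            <annotation lang="en">Peir/XML/img-1.xml</annotation>
          </image>
        </images>
        <annotation lang="en">Peir/XML/case-1.xml</annotation>
      </case>
    </cases>
  </collection>
</library>"""


def iter_images(library_root):
    """Yield one dict per <image> with its collection, case, and annotation files."""
    for collection in library_root.iter("collection"):
        name = collection.findtext("name")
        for case in collection.iter("case"):
            case_id = case.findtext("id")
            for image in case.iter("image"):
                yield {
                    "collection": name,
                    "case": case_id,
                    "image": image.findtext("id"),
                    "file": image.findtext("imagefile"),
                    # image-level annotations only, keyed by language
                    "annotations": {a.get("lang"): a.text
                                    for a in image.findall("annotation")},
                }


records = list(iter_images(ET.fromstring(SAMPLE)))
```

For the real library, ET.parse(path).getroot() would replace ET.fromstring(SAMPLE). Case-level annotations are the annotation elements that are direct children of case and could be collected the same way with case.findall("annotation").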
A sample version of the library with 10 cases from each of the four
collections is available in a tar
archive. The XML library file itself is available
at imageclefmed_small.xml.
A sample case from Casimage is shown below.

The library; an XML file linking the cases, images, and annotations
across the library; queries and qrels from the 2004 and 2005 tracks;
and assorted other files are available on the FTP server.
Instructions are provided to all groups officially registered in
ImageCLEF. No one else can have access to the data for now.
No exceptions!
The image retrieval system GIFT can be downloaded from
http://www.gnu.org/software/gift/.
Topics
There were 30 topics for ImageCLEFmed
2006, with 10 topics in each of three categories:
- Visual - topics where a visual system alone can reach good
performance
- Mixed - topics with a visual influence but where text can improve
results strongly
- Textual - topics where the visual influence is limited because the
relevant results vary widely in visual appearance
The topics are available in three formats, all of which are in the
protected area of the Web site that contains all of the ImageCLEFmed
2006 data:
- A Word file that contains the text of the topic in three
languages and the images pasted in - iclefmed2006topics.doc
- An ASCII text file (Windows format) with only the text of the
topics - iclefmed2006topics.txt
- A WinZip file of the images only - iclefmed2006topics.zip
The text of the German translation of two topics was corrected from the
originally released topics. The versions on the Web site now reflect
the corrected topics. In the text below, we describe how the topics for
the 2006 task were
classified into visual, mixed, and textual categories.
Visual
Topics are generally classified as visual when they have one or a few
dominant visual properties that can be detected by analyzing the
query images. Results are relatively similar with respect to these
visual properties. Dominant features can be colors (a red mouth in
topic 1.1), a texture (of text in topic 1.8 or the brain in 1.2), or
the layout of the image, often with a single object (knee as in 1.3,
hand in 1.6). In general, a visual system can be expected to detect
the modality in combination with an anatomic region reasonably well.
Mixed
Mixed topics contain an important visual component but are more complex
with respect to the expected result. They can be more variable with
respect to the expected set of relevant documents than purely visual
queries. Often, the modality and/or anatomic region can be detected, but
subtle and often local variations need to be taken into account (2.1
and 2.6 can have an ultrasound of different shape, and the object can be
in a variety of places; 2.2 requires experience to detect features
purely visually). Often, for a non-specialist, it is important to read
the text to decide whether an image is relevant (2.8, 2.9 to decide
on the cells). For topic 2.10, it might be hard to tell how important
visual features are, as the results can vary widely.
Semantic
Semantic topics, the textual category listed above, require analysis
of the text to obtain a relatively complete set of relevant images.
This means that result images can be fairly different in appearance
(see topic 3.5), the subject of the topic can appear at varying sizes
in the image (topics 3.1, 3.2), or the queries are very general (topic
3.10). The modality can be of limited importance in these queries
(such as for 3.3). Sometimes the main subject (such as the cyst in
topic 3.6) might be detectable by a specialized system, but the
differences are subtle and the object can appear in many different
anatomic regions. For many of these topics, a non-specialist needs to
check the text to decide on the relevance of a given image.
Submitting runs
Results were submitted in the format used by the trec_eval program, as
follows:
1 1 CASImage/Images/13156 1 0.567162 GE_gift
1 1 Peir/Images/00009331 2 0.441542 GE_gift
where:
- The first column contains the topic number, in our case from 1-30
and not in the format 1.1, 1.2, etc.
- The second column is always 1
- The third column is the image identifier, which includes the
database and the image number as shown but not the extension of the
image file
- The fourth column is the rank of the image for the topic
- The fifth column is the score assigned by the system
- The sixth column is the identifier for the run and should be the
same in the entire file
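The six columns described above can be parsed and checked with a short script before submission. This is a sketch only: the names RunLine, parse_run_line, and format_run_line are hypothetical, and printing the score with six decimals is an assumption taken from the two example lines, not a stated requirement.

```python
from typing import NamedTuple


class RunLine(NamedTuple):
    topic: int      # 1-30
    constant: int   # always 1
    image_id: str   # e.g. Peir/Images/00009331, no file extension
    rank: int
    score: float
    run_id: str


def parse_run_line(line: str) -> RunLine:
    """Split one whitespace-separated result line into typed fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    topic, constant, image_id, rank, score, run_id = fields
    parsed = RunLine(int(topic), int(constant), image_id,
                     int(rank), float(score), run_id)
    if not 1 <= parsed.topic <= 30:
        raise ValueError(f"topic number out of range 1-30: {parsed.topic}")
    if parsed.constant != 1:
        raise ValueError("second column must be 1")
    return parsed


def format_run_line(entry: RunLine) -> str:
    """Render a RunLine back into the six-column text form."""
    return (f"{entry.topic} {entry.constant} {entry.image_id} "
            f"{entry.rank} {entry.score:.6f} {entry.run_id}")
```

Parsing every line of a run file this way, and checking that all run_id values are identical, catches the most common formatting mistakes before the file is handed to trec_eval.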
Several key points for submitted runs were:
- The topic numbers should be consecutive from 1-30 and not in the
format used in the topic files (1.1, 1.2, etc.). A conversion file is
available for the proper numbering of topics.
- The image identifiers should include the database "path" (e.g.,
Peir/Images/00009331) but not the file extension (e.g., .jpg).
- Each run must be submitted in a single file. Files should be pure
text files and not be zipped or otherwise compressed.
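The first two points can be illustrated with a small sketch. The mapping from topic labels to consecutive numbers shown here is an assumption (category c, position p becomes (c - 1) * 10 + p); the official conversion file is authoritative and should be used for real submissions. The function names are hypothetical.

```python
from pathlib import PurePosixPath

TOPICS_PER_CATEGORY = 10  # 10 visual, 10 mixed, 10 textual topics


def consecutive_topic_number(label: str) -> int:
    """Map a label such as '2.3' to a number in 1-30.

    ASSUMPTION: category c, position p maps to (c - 1) * 10 + p.
    The official conversion file may differ and takes precedence.
    """
    category, position = (int(part) for part in label.split("."))
    if not (1 <= category <= 3 and 1 <= position <= TOPICS_PER_CATEGORY):
        raise ValueError(f"unexpected topic label: {label}")
    return (category - 1) * TOPICS_PER_CATEGORY + position


def image_identifier(relative_file: str) -> str:
    """Drop the file extension but keep the database path."""
    return str(PurePosixPath(relative_file).with_suffix(""))
```

Under the stated assumption, label 2.3 becomes topic 13, and Peir/Images/00009331.jpg becomes the identifier Peir/Images/00009331.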
How can I register for ImageCLEF?
You can register for ImageCLEF on the CLEF web site.
Data and utilities available for the multilingual image retrieval task
The library; an XML file linking the cases, images, and annotations
across the library; queries and qrels from the 2004 and 2005 tracks;
and assorted other files are available on the protected portion of the
Web site.
Organisers