ImageCLEFmed 2007 Medical Image Retrieval Protocol

Last updated - 8 October 2007, William Hersh

ImageCLEF 2007 has ended. The final results are available, and it should be noted that the original qrels that were distributed in July were found to have a number of incorrect judgments. As a result, we have had 7 topics re-judged. The new qrels are available, along with new results for the official runs. We apologize for any inconvenience from this.

Another problem we found in our analysis of the data was that different runs used different letter case in their output. Because trec_eval is case-sensitive, this affected some groups' results. For that reason, we have lower-cased the qrels as well as all official runs, and groups performing additional runs should keep this in mind. The lower-cased runs and the trec_eval output for them are in the protected area of the track Web site, for which all participants have a login and password.
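For groups scoring additional runs against the lower-cased qrels, the conversion can be done with a few lines of Python. This is a minimal sketch, not the organizers' script: it assumes the run file is plain UTF-8 text and lower-cases every line in full, whereas the page does not say whether the official conversion touched only the image identifiers or entire lines. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Iterator


def lowercase_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line of a run or qrels file in lower case (whole line)."""
    for line in lines:
        yield line.lower()


def lowercase_file(src: Path, dst: Path) -> int:
    """Write a lower-cased copy of src to dst and return the number of lines written."""
    count = 0
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in lowercase_lines(fin):
            fout.write(line)
            count += 1
    return count
```

For example, lowercase_file(Path("myrun.txt"), Path("myrun.lc.txt")) (hypothetical file names) produces a copy that can be passed to trec_eval alongside the lower-cased qrels. Note that lower-casing the whole line also lower-cases the run tag.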

Data Issues

We identified a few empty cases in the new myPACS data that we were not able to fix before the submission deadline. There were also some completely black images in the collection that we were unable to remove. There were also some minor corrections to the data files, which are documented on the 2007 data page.

In ImageCLEFmed 2007, as in previous years, there were two medical image tasks: multilingual image retrieval and automatic image annotation.

This page focuses on the medical image retrieval task. The task does not require the use of visual retrieval techniques, but an image retrieval system with such techniques (GIFT) is available for participants who do not have one and wish to use one. More information about the image annotation task can be found at http://www-i6.informatik.rwth-aachen.de/~deselaers/imageclef07/medaat.html.

More information about the image collection and topics for ImageCLEFmed 2007 is available on the 2007 data page. Note that usage of mypacs.net data requires completing a separate license agreement and sending it to Henning Müller.

Overview of ImageCLEFmed 2007

Multilingual image retrieval

The multilingual image retrieval task is based on a dataset containing images from the Casimage, MIR, PEIR, and PathoPIC datasets (about 50,000 images; see below for more information). The collection contains annotations in XML format. The majority of the annotations are in English, but a significant number are in French and German, and a few cases do not contain any annotation at all. The quality of the texts varies between collections and even within the same collection.

Query topics were formulated with example images and a short textual description. The topics were categorized as being amenable to visual retrieval, text retrieval, or both.  The track organizers made available results from a state-of-the-art image retrieval system (medGIFT) and a state-of-the-art text engine (Lucene).

Automatic image annotation

Automatic image annotation or image classification can be an important step when searching for images in a database. In this task, a database of 9,000 fully classified images taken randomly from routine medical practice was made available for training a classification system. Participants then had to classify 1,000 further images whose classification labels were withheld. The aim was to find out how well current techniques could identify image modality, body orientation, body region, and biological system examined based on the images. The results of the classification step could be used for multilingual image annotations as well as for DICOM header corrections.

Instructions for participating in the 2007 task

Those who have participated in previous offerings of the track should be familiar with this process. There are a few new steps this year.

In order to obtain the data (described below), you must complete a data usage agreement. There is one additional form you must complete related to the use of the MyPacs.net data. Once we have all of the completed forms, we will send you instructions with login and password for obtaining the data. At a minimum, you will need a text retrieval or image retrieval system to participate. The task does not require the use of visual retrieval techniques, but an image retrieval system with such techniques is available (GIFT).

The topics will be released around the beginning of May, and runs are due (instructions for submission to appear later) at the beginning of June. You must submit results in order to attend the CLEF Workshop, which takes place September 20-22 in Budapest, Hungary.

Image Library

The image library for the 2007 track will be an enlarged version of the one used for ImageCLEFmed 2005-2006.  It has the following collections:

Collection Name | Image Type(s) | Annotation Type(s) | Original URL
Casimage | Radiology and pathology | Clinical case descriptions | http://www.casimage.com/
Mallinckrodt Institute of Radiology (MIR) | Nuclear medicine | Clinical case descriptions | http://gamma.wustl.edu/home.html
Pathology Education Instructional Resource (PEIR) | Pathology and radiology | Metadata records from HEAL database | http://peir.path.uab.edu/
PathoPIC | Pathology | Image description - long in German, short in English | http://alf3.urz.unibas.ch/pathopic/e/intro.htm
MyPACS | Radiology | Clinical case descriptions | http://www.mypacs.net/
Clinical Outcomes Research Initiative (CORI) Endoscopic Images | Endoscopy | Clinical case descriptions | http://www.cori.org/

The sizes of the collections are as follows:

Collection Name | Cases | Images | Annotations | Annotations by Language | File Size (tar archive)
Casimage | 2076 | 8725 | 2076 | French - 1899, English - 177 | 1.28 GB
MIR | 407 | 1177 | 407 | English - 407 | 63.2 MB
PEIR | 32319 | 32319 | 32319 | English - 32319 | 2.50 GB
PathoPIC | 7805 | 7805 | 15610 | German - 7805, English - 7805 | 879 MB
myPACS | 3577 | 15140 | 3577 | English - 3577 | 390 MB
Endoscopic | 1496 | 1496 | 1496 | English - 1496 | 34 MB
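The per-collection figures in the table can be summed to check the "about 50,000 images" quoted for the Casimage, MIR, PEIR, and PathoPIC collections, and to get totals for the full 2007 library including myPACS and the CORI endoscopic images. This is only an arithmetic sketch over the numbers in the table; the dictionary and function names are illustrative.

```python
# Figures copied from the collection-size table:
# collection name -> (cases, images, annotations)
COLLECTIONS = {
    "Casimage": (2076, 8725, 2076),
    "MIR": (407, 1177, 407),
    "PEIR": (32319, 32319, 32319),
    "PathoPIC": (7805, 7805, 15610),
    "myPACS": (3577, 15140, 3577),
    "Endoscopic": (1496, 1496, 1496),
}


def totals(names):
    """Sum cases, images, and annotations over the named collections."""
    cases = sum(COLLECTIONS[n][0] for n in names)
    images = sum(COLLECTIONS[n][1] for n in names)
    annotations = sum(COLLECTIONS[n][2] for n in names)
    return cases, images, annotations


# The four collections named in the overview of the multilingual task.
ORIGINAL_FOUR = ["Casimage", "MIR", "PEIR", "PathoPIC"]

print(totals(ORIGINAL_FOUR))      # (42607, 50026, 50412)
print(totals(list(COLLECTIONS)))  # (47680, 66662, 55485)
```

The four original collections come to 50,026 images, consistent with the overview; adding myPACS and the endoscopic images brings the library to 66,662 images.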

The image library is available per the instructions distributed to the track participants.  The files should be organized into the following directory structures:
+ ImageCLEFmed
    + CASImage
        - Images
        - XML
    + PathoPic
        - Images
        - XML
    + Peir
        - Images
        - XML
    + MIR
        - Images
        - XML
    + MyPACS
        - Images
        - XML
    + CORI
        - Images
        - XML
    ImageCLEFmed2007.xml
The directory labeled ImageCLEFmed is the root directory. ImageCLEFmed2007.xml is the file that links the collections and their images and annotations. It has external links to the images and the associated annotations in XML files. It contains relative paths, from the root directory, to all the files.
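Before indexing, it can help to confirm that the unpacked archives match the expected layout. The sketch below checks for each collection's Images and XML directories and for ImageCLEFmed2007.xml under the root. It assumes the library file sits directly in the root directory and that directory names match the layout exactly, which matters on case-sensitive file systems; the constant and function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Directory names as listed in the layout; placing ImageCLEFmed2007.xml
# directly under the root is an assumption.
COLLECTION_DIRS = ["CASImage", "PathoPic", "Peir", "MIR", "MyPACS", "CORI"]
LIBRARY_FILE = "ImageCLEFmed2007.xml"


def missing_paths(root: Path) -> List[str]:
    """Return the expected paths, relative to root, that do not exist on disk."""
    expected = [Path(LIBRARY_FILE)]
    for name in COLLECTION_DIRS:
        expected.append(Path(name) / "Images")
        expected.append(Path(name) / "XML")
    # as_posix() keeps the reported paths in forward-slash form on every OS.
    return [p.as_posix() for p in expected if not (root / p).exists()]
```

Calling missing_paths(Path("/data/ImageCLEFmed")) (a hypothetical location) returns an empty list when every expected directory and the library file are present.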

The conceptual structure of ImageCLEFmed2007.xml is as follows. The entire ImageCLEFmed library consists of multiple collections (e.g., Casimage, PEIR, MIR, PathoPIC, MyPACS, CORI). Each collection is organized into cases that represent a group of related images and annotations. Each case consists of a group of images and an optional annotation. Each image is part of a case and has optional associated annotations, which consist of metadata (e.g., HEAL tagging) and/or a textual annotation. All of the images and annotations are stored in separate files; ImageCLEFmed2007.xml only contains the connections between the collections, cases, images, and annotations.

Here is a graphical depiction of the library:

[Figure: Library]

Here is the XML structure of the ImageCLEFmed2007.xml file:
<library>
  <collection>
    <name>name-text</name>
    <cases>
      <case>
        <id>identifier-text</id>
        <images>
          <image>
            <id>identifier-text</id>
            <imagefile>file-name-text</imagefile>
            <annotation lang=" ">file-name-text</annotation>
            <annotation lang=" ">file-name-text</annotation>
          </image>
        </images>
        <annotation lang=" ">file-name-text</annotation>
        <annotation lang=" ">file-name-text</annotation>
      </case>
    </cases>
  </collection>
</library>
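A minimal sketch of reading the library file with Python's standard xml.etree.ElementTree module follows. It uses only the element names from the structure above (library, collection, name, cases, case, id, images, image, imagefile, annotation with a lang attribute) and keeps image-level and case-level annotations separate. The class and function names are hypothetical, and the lang values are read as-is because the page does not list them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# (lang attribute, relative file path) as stored in <annotation lang="...">
Annotation = Tuple[str, str]


@dataclass
class Image:
    id: str
    imagefile: str
    annotations: List[Annotation] = field(default_factory=list)


@dataclass
class Case:
    collection: str
    id: str
    images: List[Image] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


def _annotations(elem: ET.Element) -> List[Annotation]:
    # Direct <annotation> children only, so image-level and case-level
    # annotations are not mixed together.
    return [(a.get("lang", "").strip(), (a.text or "").strip())
            for a in elem.findall("annotation")]


def parse_library(xml_text: Union[str, bytes]) -> List[Case]:
    """Flatten <library>/<collection>/<cases>/<case> into Case objects."""
    cases = []
    root = ET.fromstring(xml_text)
    for coll in root.findall("collection"):
        name = coll.findtext("name", default="").strip()
        for case_el in coll.findall("cases/case"):
            case = Case(collection=name,
                        id=case_el.findtext("id", default="").strip(),
                        annotations=_annotations(case_el))
            for img in case_el.findall("images/image"):
                case.images.append(Image(
                    id=img.findtext("id", default="").strip(),
                    imagefile=img.findtext("imagefile", default="").strip(),
                    annotations=_annotations(img)))
            cases.append(case)
    return cases
```

Since the file stores paths relative to the root directory, a typical use would be cases = parse_library((root / "ImageCLEFmed2007.xml").read_bytes()) followed by root / image.imagefile to locate each image, where root is the local ImageCLEFmed directory.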
A sample version of the library with 10 cases from each of the four collections is available in a tar archive.  The XML library file itself is available at imageclefmed_small.xml.  A sample case from Casimage is shown below.

[Figure: Example]

The library; an XML file of the linkages across cases, images, and annotations across the library; queries and qrels from the 2004-2006 tracks; and assorted other files are available on the FTP server.  Instructions are provided to all groups officially registered in ImageCLEF. No one else can have access to the data for now. No exceptions!

The image retrieval system GIFT can be downloaded from http://www.gnu.org/software/gift/.

Topics

There were 30 topics for ImageCLEFmed 2007, with 10 topics in each of three categories: topics amenable to visual retrieval, to text retrieval, or to both.
The topics were available in three formats, all of which are in the protected area of the Web site that contains all of the ImageCLEFmed 2007 data. They are named as follows:

Submitting runs

The format for submitting results will be based on the trec_eval program as follows:
1 1 CASImage/Images/13156 1 0.567162 GE_gift
1 1 Peir/Images/00009331 2 0.441542 GE_gift
where:
Several key points for submitted runs are:
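The two example lines above have six whitespace-separated columns. This sketch assumes the usual trec_eval run-file reading of them: topic number, an iteration field (1 in the examples), image identifier, rank, score, and run tag. Those column meanings, and the helper names RunLine, parse_run_line, and make_run, are assumptions for illustration rather than the official submission rules.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class RunLine:
    topic: int       # column 1: topic number
    iteration: str   # column 2: "1" in the example lines
    image_id: str    # column 3: image identifier, e.g. "CASImage/Images/13156"
    rank: int        # column 4: rank of the image for this topic
    score: float     # column 5: retrieval score
    run_tag: str     # column 6: run identifier, e.g. "GE_gift"

    def format(self) -> str:
        # Space-separated, score printed with six decimals as in the examples.
        return (f"{self.topic} {self.iteration} {self.image_id} "
                f"{self.rank} {self.score:.6f} {self.run_tag}")


def parse_run_line(line: str) -> RunLine:
    """Parse one whitespace-separated, six-column run line."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 columns, got {len(parts)}: {line!r}")
    topic, iteration, image_id, rank, score, run_tag = parts
    return RunLine(int(topic), iteration, image_id, int(rank), float(score), run_tag)


def make_run(topic: int, scored: Iterable[Tuple[str, float]],
             run_tag: str, iteration: str = "1") -> List[str]:
    """Sort (image_id, score) pairs by descending score and number ranks from 1."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [RunLine(topic, iteration, image_id, rank, score, run_tag).format()
            for rank, (image_id, score) in enumerate(ranked, start=1)]
```

For instance, make_run(1, [("Peir/Images/00009331", 0.441542), ("CASImage/Images/13156", 0.567162)], "GE_gift") reproduces the two example lines, with the higher-scoring image at rank 1. Keeping rank consistent with descending score avoids ambiguity about which column orders the results.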

Data and utilities available for the multilingual image retrieval task

The library; an XML file of the linkages across cases, images, and annotations across the library; queries and qrels from the 2004 and 2005 tracks; and assorted other files are available on the protected portion of the Web site.  Instructions are provided to all groups officially registered in ImageCLEF.  No one else can have access to the data for now.  No exceptions!


Organisers

Henning Müller, University and Hospitals of Geneva, Geneva, Switzerland
Email: henning.mueller@sim.hcuge.ch
Web: http://www.sim.hcuge.ch/medgift/02_Henning_EN.htm
Paul Clough, Sheffield University, Sheffield, England
Email: p.d.clough@sheffield.ac.uk
Web: http://ir.shef.ac.uk/cloughie/
William Hersh, Oregon Health & Science University (OHSU), Portland, OR, USA
Email: hersh@ohsu.edu
Web: http://www.billhersh.info/
Jayashree Kalpathy-Cramer, Oregon Health & Science University (OHSU), Portland, OR, USA
Email: kalpathy@ohsu.edu