ImageCLEFmed 2006 Medical Image Retrieval Results

Last updated - 27 December 2006, William Hersh

A total of 10 groups from eight different countries (Canada, China, France, Germany, Singapore, Spain, Switzerland, and the United States) participated in ImageCLEFmed 2006. These groups collectively submitted 100 runs, with each group submitting between 1 and 26 runs.

For relevance judging, a pool was built for each topic from all images ranked in the top 30 by any submitted run. This gave pools of 647 to 1187 images, with a mean of 910 per topic. Relevance judgments were performed by seven US physicians who were enrolled in or had graduated from the OHSU biomedical informatics graduate program. Eleven of the 30 topics were judged in duplicate, two of them by three different judges. Each topic had a designated "original" judge from among the seven. From these judgments, a qrels file was created for use with trec_eval. The qrels file, which can be used for further experiments, has been posted in the protected section of this site under the name icm06.qrels.txt.
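Pooling can be sketched as taking, for each topic, the union of the top 30 images from every run, so an image enters the pool once if any run ranked it high enough. The in-memory run representation below (run name mapped to topic mapped to a ranked list of image ids) is a hypothetical assumption for illustration, not the format used for ImageCLEFmed submissions.

```python
from collections import defaultdict

# Hypothetical representation (an assumption, not the official run format):
# run name -> topic id -> image ids in rank order, best first.
Runs = dict[str, dict[str, list[str]]]


def build_pools(runs: Runs, depth: int = 30) -> dict[str, set[str]]:
    """Return, per topic, the union of the top-`depth` images from every run."""
    pools: dict[str, set[str]] = defaultdict(set)
    for ranking_by_topic in runs.values():
        for topic, ranking in ranking_by_topic.items():
            # Only the first `depth` images of each run contribute to the pool.
            pools[topic].update(ranking[:depth])
    return dict(pools)


runs = {
    "runA": {"1": ["img1", "img2", "img3"]},
    "runB": {"1": ["img3", "img4"]},
}
pools = build_pools(runs, depth=2)
pool_sizes = {topic: len(images) for topic, images in pools.items()}
```

Because the pool is a union, images retrieved by many runs are judged only once, which is why pool sizes stay well below 30 times the number of runs.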

A total of 27,306 relevance judgments were made. (These were primary judgments; ten topics had duplicate judgments that we will analyze later.) There were 62 images that were not judged, a very small fraction that we decided to ignore (i.e., treat as if the image were not in the pool). The judgments were in turn converted into a qrels file, which was then used to calculate results with trec_eval. A variety of results are shown in the accompanying spreadsheet. The table below lists the worksheets of the spreadsheet and their contents.
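The path from judgments to scores can be sketched as follows: judgments are written as standard TREC qrels lines ("topic iteration image_id relevance"), unjudged images are simply left out, and average precision is computed per topic and averaged into MAP. The judgment dictionary layout and function names are hypothetical; this is a minimal re-implementation for illustration, not trec_eval itself. Leaving an image out of the qrels has the same effect as leaving it out of the pool: if a run retrieves it, it counts as non-relevant.

```python
from typing import Optional


def to_qrels(judgments: dict[tuple[str, str], Optional[int]]) -> list[str]:
    """Format (topic, image) -> relevance as TREC qrels lines.

    Images whose judgment is None (unjudged) are skipped, so an evaluator
    treats them like any image outside the pool, i.e. as non-relevant.
    """
    lines = []
    for (topic, image), rel in sorted(judgments.items()):
        if rel is None:
            continue
        lines.append(f"{topic} 0 {image} {rel}")
    return lines


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Non-interpolated AP: sum of precision at each relevant image retrieved,
    divided by the total number of relevant images for the topic."""
    if not relevant:
        return 0.0  # hypothetical choice for topics with no relevant images
    hits, total = 0, 0.0
    for rank, image in enumerate(ranking, start=1):
        if image in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run: dict[str, list[str]],
                           qrels: dict[str, set[str]]) -> float:
    """Mean AP over every topic in the qrels (like trec_eval's -c option);
    a topic the run did not answer scores 0."""
    aps = [average_precision(run.get(t, []), rel) for t, rel in qrels.items()]
    return sum(aps) / len(aps)
```

For example, a ranking ["a", "b", "c", "d"] with relevant images {"a", "c"} has AP = (1/1 + 2/3) / 2 = 5/6.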

Worksheet           Contents
Duplicates          Runs containing duplicate images for a topic, with the number of entries deleted from each run
Judgments           Summary of relevance judgments
All Results         Results of all runs from trec_eval
Summary Results     Summary of results with run categories added
Summary Sorted      Summary results sorted by MAP
Run Type Results    Summary results sorted by topic-system category, then by MAP
Best Only           Best results for each group within a given category (to be used for statistical analysis and other aggregation)
Topic Results       Results by topic
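Two of the worksheet operations can be sketched in a few lines: removing duplicate images from a run's ranking while counting the deletions (as in the Duplicates worksheet), and keeping only the highest-MAP run per group and category, sorted by MAP (as in Best Only). Keeping the first, highest-ranked occurrence of a duplicate is an assumption, and the RunSummary record and its category labels are hypothetical illustrations rather than the spreadsheet's actual columns.

```python
from dataclasses import dataclass


def drop_duplicates(ranking: list[str]) -> tuple[list[str], int]:
    """Keep the first (highest-ranked) occurrence of each image (an assumption)
    and return the cleaned ranking plus the number of entries removed."""
    seen: set[str] = set()
    kept: list[str] = []
    for image in ranking:
        if image not in seen:
            seen.add(image)
            kept.append(image)
    return kept, len(ranking) - len(kept)


@dataclass
class RunSummary:
    # Hypothetical summary record; field names are illustrative only.
    run: str
    group: str
    category: str
    map_score: float


def best_only(summaries: list[RunSummary]) -> list[RunSummary]:
    """Highest-MAP run for each (group, category) pair, sorted by MAP descending."""
    best: dict[tuple[str, str], RunSummary] = {}
    for s in summaries:
        key = (s.group, s.category)
        if key not in best or s.map_score > best[key].map_score:
            best[key] = s
    return sorted(best.values(), key=lambda s: s.map_score, reverse=True)


summaries = [
    RunSummary("r1", "g1", "visual", 0.10),
    RunSummary("r2", "g1", "visual", 0.20),
    RunSummary("r3", "g2", "mixed", 0.15),
]
```

Reducing each group to one run per category keeps groups that submitted many runs from dominating any aggregate statistics.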

More analysis of this data will appear after the CLEF Workshop.