ImageCLEFmed 2008 Medical Image Retrieval Protocol
Last updated - 27 May 2008, William Hersh
This page contains the details for the ImageCLEF 2008
medical image retrieval task. Thanks to the efforts of Dr. Charles
Kahn of the Medical College of Wisconsin, we have been able to
procure a new image collection for 2008. This collection consists of
images from two radiology journals,
along with their captions, article titles, and links to PubMed and the
full text. We have been given permission by the Radiological Society of
North America (RSNA) to use all of the images in the Goldminer
system from their two journals, Radiology and RadioGraphics, as
mounted by HighWire Press.
The new collection has a total of 67,115 JPEG images.
In addition to the images, we are also distributing an XML file with
the metadata for these images, which consists of the following fields:
- figureID - Goldminer image identifier
- figureURL - URL of the page that has the low-resolution version of the
image, the caption, and a link to the high-resolution version
- caption - text of figure caption
- title - title of article
- pmid - PubMed ID of the article (to obtain the MEDLINE record, including
MeSH terms)
- articleURL - URL of full-text of article
- imageURL - URL of the large version of the image (the image
contained in the database)
- imageLocalName - the name of the image, both in the images
directory and on the journal website
Here is an example of the metadata:
<Record>
<figureID>27979</figureID>
<figureURL>http://radiology.rsnajnls.org/cgi/content/full/210/1/11/F1</figureURL>
<caption>"Illustration of a neonate at autopsy whose demise was attributed to thymic death. The caption drew attention to the enormous size of the thymus, which is actually normal in appearance. (Reprinted, with permission, from reference 6.)"</caption>
<title>The right place at the wrong time: historical perspective of the relation of the thymus gland and pediatric radiology</title>
<pmid>9885579</pmid>
<articleURL>http://radiology.rsnajnls.org/cgi/content/full/210/1/11</articleURL>
<imageURL>http://radiology.rsnajnls.org/content/vol210/issue1/images/large/r99ja45g1x.jpeg</imageURL>
<imageLocalName>r99ja45g1x.jpeg</imageLocalName>
</Record>
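Reading these records can be sketched in Python with the standard library. This is a minimal sketch, not part of the official distribution: it assumes the records in ImageCLEF2008.xml are `<Record>` elements with the fields listed above, and the `<Records>` root element in the sample is a hypothetical wrapper used only for illustration. The function names are also illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Field names as listed in the metadata description above.
FIELDS = ("figureID", "figureURL", "caption", "title", "pmid",
          "articleURL", "imageURL", "imageLocalName")


def parse_record(record):
    """Turn one <Record> element into a dict; missing fields become ''."""
    return {name: (record.findtext(name) or "").strip() for name in FIELDS}


def iter_records(source):
    """Stream records from a file name or a binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            yield parse_record(elem)
            elem.clear()  # free memory, since the full metadata file is large


# Assumption: <Records> is a hypothetical root element for this sample only.
SAMPLE = b"""<Records>
<Record>
<figureID>27979</figureID>
<pmid>9885579</pmid>
<imageLocalName>r99ja45g1x.jpeg</imageLocalName>
</Record>
</Records>"""

records = list(iter_records(io.BytesIO(SAMPLE)))
for rec in records:
    print(rec["figureID"], rec["pmid"], rec["imageLocalName"])
```

Streaming with `iterparse` avoids loading the whole metadata file into memory at once; for the real file, pass its path instead of the in-memory sample.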
The images are now available for distribution to research
groups. Users registered for CLEF should have the password that enables
them to access the images at:
http://ir.ohsu.edu/iclefmed08/
Images
The 67,115 images are provided at three levels of resolution:
- Full size - available for download as a single very large (15G)
file, highResImages.tar.gz, or as eight 2G files (see README for
details)
- 512 KB - available for download as a single 2.6G file,
resizeImages512.tar.gz
- 256 KB - available for download as a single 868M file,
resizeImages256.tar.gz
There are three files of figure captions:
- ImageCLEF2008.xml - 52M file of all metadata
- texts.tar.gz - individual files of captions, 15M in size
- rawCaptions_dist.csv - expanded text for some captions, 41M in
size
Topics
There are 30 topics for ImageCLEFmed
2008, derived from the topics of 2005-2007, with 10 topics in each of
three categories:
- Visual - topics where a visual system alone can reach good
performance
- Mixed - topics with a visual influence but where text can strongly
improve results
- Textual - topics where the visual influence is limited because the
relevant results vary widely in visual appearance
The topic files are available in several formats:
- Text-only in a single XML file (8.2K), RetrievalTopics2008.xml
- With index images pasted into a Word file (2.8M),
Iclefmed2008_topics.doc
- All of the images in a single .zip archive (4M),
iclefmed2008_images.zip
- Mapping of topic numbers between consolidated (2005-2007)
collection and this collection (37K), mappingOldNewTopics.xls
Submitting runs
The format for submitting results will be based on the trec_eval
program as follows:
1 1 3156 1 0.567162 GE_gift
1 1 9331 2 0.441542 GE_gift
where:
- The first column contains the topic number, in our case from 1 to 30
and not in the format 1.1, 1.2, etc.
- The second column is always 1
- The third column is the image identifier, which includes the
database and the image number as shown but not the extension of the
image file
- The fourth column is the ranking for the topic
- The fifth column is the score assigned by the system
- The sixth column is the identifier for the run and should be the
same in the entire file
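The six columns described above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: `format_run` is a hypothetical helper, not an official tool, and the six-decimal score formatting is only chosen to match the example lines.

```python
def format_run(topic, scored_images, run_id):
    """Return run-file lines for one topic.

    scored_images is a list of (image_id, score) pairs; image_id must not
    include a file extension. Results are sorted so that rank 1 has the
    highest score.
    """
    ranked = sorted(scored_images, key=lambda pair: pair[1], reverse=True)
    lines = []
    for rank, (image_id, score) in enumerate(ranked, start=1):
        # Columns: topic, constant 1, image id, rank, score, run id
        lines.append(f"{topic} 1 {image_id} {rank} {score:.6f} {run_id}")
    return lines


example = format_run(1, [("9331", 0.441542), ("3156", 0.567162)], "GE_gift")
print("\n".join(example))
```

Sorting inside the helper keeps rank and score consistent even if the retrieval system returns results in arbitrary order.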
Several key points for submitted runs are:
- The image identifiers should not
include the file extension (e.g., .jpg).
- Each run must be submitted in a single file. Files should
be pure text files and not
be zipped or otherwise compressed.
- Within each topic, scores must be in descending order as the rank
increases (rank 1 has the highest score) for trec_eval to work properly.
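Before submitting, a run file can be checked against these points with a short script. The following is a minimal sketch, not an official validator: `check_run` and the list of image extensions are assumptions for illustration, and it assumes the lines for each topic appear in rank order.

```python
import os

# Assumption: extensions that would indicate a file name instead of an id.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff"}


def check_run(lines):
    """Return a list of problems found in the lines of a run file."""
    problems = []
    run_ids = set()
    previous = {}  # topic -> (rank, score) of the last line seen
    for n, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) != 6:
            problems.append(f"line {n}: expected 6 columns, found {len(parts)}")
            continue
        topic, constant, image_id, rank, score, run_id = parts
        if not topic.isdigit() or not 1 <= int(topic) <= 30:
            problems.append(f"line {n}: topic must be an integer from 1 to 30")
        if constant != "1":
            problems.append(f"line {n}: second column must be 1")
        if os.path.splitext(image_id)[1].lower() in IMAGE_EXTENSIONS:
            problems.append(f"line {n}: image identifier has a file extension")
        try:
            rank_value, score_value = int(rank), float(score)
        except ValueError:
            problems.append(f"line {n}: rank or score is not a number")
            continue
        last = previous.get(topic)
        if last is not None:
            if rank_value <= last[0]:
                problems.append(f"line {n}: rank does not increase")
            if score_value > last[1]:
                problems.append(f"line {n}: score increases with rank")
        previous[topic] = (rank_value, score_value)
        run_ids.add(run_id)
    if len(run_ids) > 1:
        problems.append("run identifier is not the same in the entire file")
    return problems


good = ["1 1 3156 1 0.567162 GE_gift", "1 1 9331 2 0.441542 GE_gift"]
bad = ["1 1 3156.jpg 1 0.4 runA", "1 1 9331 2 0.5 runB"]
print(check_run(good))
print(check_run(bad))
```

An empty list means none of the checked problems were found; it does not guarantee that trec_eval or the organisers will accept the file.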
Organisers
- Henning Müller, University and Hospitals of Geneva,
Geneva, Switzerland
Email: henning.mueller@sim.hcuge.ch
Web: http://www.sim.hcuge.ch/medgift/02_Henning_EN.htm
- William Hersh, Oregon Health & Science University (OHSU),
Portland, OR, USA
Email: hersh@ohsu.edu
Web: http://www.billhersh.info/
- Jayashree Kalpathy-Cramer,
Oregon Health & Science University (OHSU),
Portland, OR, USA
Email: kalpathy@ohsu.edu