Hasn't this problem been solved?
Why do you want to read text in images on the chance that it's spam? Shouldn't you just assume that unsolicited images in e-mails without appropriate text are spam?
From [link|http://aspn.activestate.com/ASPN/Mail/Message/news-announce/1525459|2003]:
VANCOUVER, BC -- ActiveState Corp., the leader in enterprise email filtering software, has released new PureMessage technology to reliably catch a new and dangerous form of spam -- image spam. Increasing in frequency by over 25% since November, image spam isn't only a nuisance, it is a threat to every email box's security. Using PureMessage, organizations are assured protection against productivity loss, network downtime and vulnerability of informational assets associated with unsolicited or malicious email.
Image spam is an unsolicited commercial email that presents its message to individuals through visual images. This is accomplished by creating links within the body of the email message to images located on the Internet. When an individual previews or opens an image spam message, behind the scenes, the image is captured from the Internet and presented in the email body.
Because the spam message is contained almost entirely within an image, traditional spam filtering techniques relying on email text analysis are ineffective. Image spam is dangerous because the messages include unique identifiers within the image links that are able to track when a recipient has opened or previewed an image spam message. When the image is viewed, the spammer knows the email address is valid, guaranteeing future spam messages from the spamming community as the address is resold.
Sophos bought (this part of?) ActiveState's antispam business. They sell a [link|http://www.sophos.com/products/sb/pmsbe/|PureMessage Small Business Edition] product.
I think the suggestions already mentioned, filtering the images before they reach the user, make the most sense. People generally don't send images to each other to communicate, so filtering on the presence of images is a better bet than trying to read the text inside them with OCR.
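To make the filtering idea concrete, here is a rough sketch in Python of the kind of check I have in mind: flag a message that carries an image (attached, or linked from the HTML body as the press release describes) but almost no readable text. The function name `looks_like_image_spam` and the 40-character threshold are my own placeholders, not anything taken from PureMessage, and a real filter would need whitelisting and proper charset handling.

```python
import re
from email import message_from_string

# Hypothetical threshold: fewer visible characters than this counts as "no real text".
MIN_TEXT_CHARS = 40


def looks_like_image_spam(msg, min_text_chars=MIN_TEXT_CHARS):
    """Return True if msg contains an image but very little visible text."""
    has_image = False
    text_chars = 0

    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own

        ctype = part.get_content_type()
        if part.get_content_maintype() == "image":
            has_image = True  # image attached or embedded in the message
            continue

        if ctype not in ("text/plain", "text/html"):
            continue

        raw = part.get_payload(decode=True) or b""
        # Simplification: assume UTF-8 and replace anything undecodable.
        body = raw.decode("utf-8", errors="replace")

        if ctype == "text/html":
            if re.search(r"<img\b", body, re.IGNORECASE):
                has_image = True  # remote image linked from the HTML body
            body = re.sub(r"<[^>]+>", " ", body)  # crude tag stripping

        # Count visible characters after collapsing whitespace.
        text_chars += len(" ".join(body.split()))

    return has_image and text_chars < min_text_chars
```

You would parse the raw message with `message_from_string` (or `message_from_bytes`) and quarantine anything for which the function returns true. Note that it never looks at the image contents, which is the whole point.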
Or am I misunderstanding what you're trying to do?
Cheers,
Scott.