
Welcome to IWETHEY!

New GAH! old timers disease
OCR tools: haven't used them in years, anyone have advice? Back in the used-to-wuzzes we translated faxes into text docs for db inserts. I should be able to use similar technology for identifying spam tags in images.
thanx,
bill
"the reason people don't buy conspiracy theories is that they think conspiracy means everyone is on the same program. Thats not how it works. Everybody has a different program. They just all want the same guy dead. Socrates was a gadfly, but I bet he took time out to screw somebodies wife" Gus Vitelli

Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 49 years. meep
questions, help? [link|mailto:pappas@catholic.org|email pappas at catholic.org]
New Hasn't this problem been solved?
Why do you want to read text in images on the chance that it's spam? Shouldn't you just assume that unsolicited images in e-mails without appropriate text are spam?

From [link|http://aspn.activestate.com/ASPN/Mail/Message/news-announce/1525459|2003]:

VANCOUVER, BC -- ActiveState Corp., the leader in enterprise email filtering software, has released new PureMessage technology to reliably catch a new and dangerous form of spam -- image spam. Increasing in frequency by over 25% since November, image spam isn't only a nuisance, it is a threat to every email box's security. Using PureMessage, organizations are assured protection against productivity loss, network downtime and vulnerability of informational assets associated with unsolicited or malicious email.

Image spam is an unsolicited commercial email that presents its message to individuals through visual images. This is accomplished by creating links within the body of the email message to images located on the Internet. When an individual previews or opens an image spam message, behind the scenes, the image is captured from the Internet and presented in the email body.

Because the spam message is contained almost entirely within an image, traditional spam filtering techniques relying on email text analysis are ineffective. Image spam is dangerous because the messages include unique identifiers within the image links that are able to track when a recipient has opened or previewed an image spam message. When the image is viewed, the spammer knows the email address is valid, guaranteeing future spam messages from the spamming community as the address is resold.


Sophos bought (this part of?) ActiveState's antispam business. They sell a [link|http://www.sophos.com/products/sb/pmsbe/|PureMessage Small Business Edition] product.

I think that the suggestions that have been mentioned, filtering the images before they reach the user, make the most sense. People generally don't send images to each other to communicate, so filtering on the image itself is a better bet than trying to OCR the text inside it.
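
To make the filtering idea concrete, here is a rough sketch of a check that flags an HTML message carrying at least one image and almost no visible text. It uses only the Python standard library. The function name looks_like_image_spam and the 40-character threshold are made up for illustration, not taken from PureMessage or SpamAssassin, and a real filter would combine this with other signals.

```python
from email import message_from_string
from html.parser import HTMLParser


class ImageTextCounter(HTMLParser):
    """Counts <img> tags and visible text characters in an HTML body."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1

    def handle_data(self, data):
        # Whitespace-only runs between tags do not count as text.
        self.text_chars += len(data.strip())


def looks_like_image_spam(raw_message, min_text_chars=40):
    """Hypothetical heuristic: an HTML part with an image but almost no text.

    min_text_chars is an arbitrary illustrative threshold, not a tuned value.
    """
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        counter = ImageTextCounter()
        counter.feed(payload.decode(charset, errors="replace"))
        if counter.images > 0 and counter.text_chars < min_text_chars:
            return True
    return False
```

Note that this counts anything inside style or script tags as text, so a spammer can pad around it. It is a starting point for scoring, not a verdict on its own.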

Or am I misunderstanding what you're trying to do?

Cheers,
Scott.
New tell me how I determine what is unsolicited
by my several-million-plus users, who are individuals paying for my email services?
user feedback
no one likes viagra ads
porn site advertising
419 scams
phishing
pharming
but usually not financials, or I would have the legitimate email companies on my ass.
thanx,
bill
New Bayesian filters.
I think it's a solved problem. If you've got 10M users and they all get the same message (or one that's slightly different), it's a good bet that it's spam. ;-) Bayesian filters can be trained to reject such things.
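
For anyone who hasn't looked inside one, here is a toy sketch of how a Bayesian filter scores a message. It is a plain naive Bayes classifier over whitespace-split words with add-one smoothing. The class name NaiveBayesFilter and the training strings are invented for illustration, and this is not how SpamAssassin implements its Bayes rules.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-level naive Bayes spam scorer (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message under 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Return the estimated probability that text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        words = text.lower().split()
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this class was in training.
            score = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing so unseen words never give log(0).
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_score[label] = score
        # Turn the two log scores into a probability, clamped to avoid overflow.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


nb = NaiveBayesFilter()
nb.train("cheap viagra pills buy now", "spam")
nb.train("meeting notes attached for tomorrow", "ham")
print(nb.spam_probability("buy cheap viagra") > 0.5)
```

The catch for image spam is that there are few or no words to score, so a filter like this needs tokens from somewhere else, such as headers, link URLs, or image properties.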

SpamAssassin has various [link|http://systems.cs.uoregon.edu/Solaris/spamassassin.php|rules] regarding images.

I'm reasonably sure this problem has been solved. But I'm no expert. If it's continuing to concern you, check the archives of the [link|http://wiki.apache.org/spamassassin/MailingLists|SpamAssassin mailing list] or ask there. They should know.

HTH. Luck!

Cheers,
Scott.
New actually I hang out here
[link|http://www.maawg.org/home/|http://www.maawg.org/home/] and SpamAssassin is the engine that a lot of vendors use, but the slippage on images is increasing.
thanx,
bill
     text decoding from images - (boxley) - (9)
         Block HTML emails at the server. >;) -NT - (Steve Lowe)
         Bayesian filters - (Yendor) - (1)
             using them, not as effective as they used to be - (boxley)
         Hmm - (broomberg)
         GAH! old timers disease - (boxley) - (4)
             Hasn't this problem been solved? - (Another Scott) - (3)
                 tell me how I determine what is unsolicited - (boxley) - (2)
                     Bayesian filters. - (Another Scott) - (1)
                         actually I hang out here - (boxley)
