
Just saw this a day or so ago
[link|http://www.nuclearelephant.com/projects/dspam/|http://www.nuclearel...m/projects/dspam/]

These guys love to rant about how much better they are than SpamAssassin.

Crock'o'shit or worthwhile to investigate?
No snake-oil flags up yet.
The biggest plus over SA is that "grandma" can forward spam to an email address to help train the filter. But then, you can do that "manually" with SA + Exim.

In other words, try it out and tell us! ;)
Just did a brief search
Didn't find anything verifying or refuting their claim. In a /. discussion on it I also saw [link|http://crm114.sourceforge.net/|http://crm114.sourceforge.net/] recommended as being similar and easier to install. (But more false positives were claimed for that.)

I shot an email off to someone who would know. I'll tell you if I get a useful response.

Cheers,
Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
And I did get a useful response
Matt Sergeant (well known developer in several circles, whose day job involves spam filters) got back to me on it overnight.

There's been a lot of discussion of DSPAM on the SpamAssassin dev mailing list. The basic summary is that DSPAM is *just* a bayesian classifier (with a slight twist on the algorithm, and a slight twist on token storage). Bayes works *great*. Really really well on individual mail corpora. In developing bayes filters we can get results similar to his when filtering an individual's mail.
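For anyone wondering what "just a bayesian classifier" boils down to, here is a minimal sketch of a generic naive Bayes token filter. It is not DSPAM's or SpamAssassin's actual algorithm or storage scheme; the class name `TinyBayes`, the whitespace tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy naive Bayes spam filter (illustrative only, not any real product's code)."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"; tokens are whitespace-separated words.
        self.messages[label] += 1
        self.tokens[label].update(text.lower().split())

    def spam_probability(self, text):
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        spam_total = sum(self.tokens["spam"].values())
        ham_total = sum(self.tokens["ham"].values())
        # Start from the prior odds of spam vs. ham (add-one smoothed).
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in text.lower().split():
            # Add-one smoothing; the extra +1 keeps the denominator non-zero
            # even before any training has happened.
            p_spam = (self.tokens["spam"][tok] + 1) / (spam_total + vocab + 1)
            p_ham = (self.tokens["ham"][tok] + 1) / (ham_total + vocab + 1)
            log_odds += math.log(p_spam / p_ham)
        # Clamp to avoid overflow in exp() on very long messages.
        log_odds = max(-700.0, min(700.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


f = TinyBayes()
f.train("cheap pills buy now", "spam")
f.train("meeting notes attached", "ham")
print(f.spam_probability("cheap pills"))
print(f.spam_probability("meeting notes"))
```

The point of the toy is the shape of the work: every token in an incoming message needs a count lookup per user, which is why per-user training works so well and also why storage speed matters.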

We are, however, dubious of the performance claims (speed rather than effectiveness). You can actually calculate the raw I/O that needs to be done with bayes and he's claiming DSPAM is faster than disk speed. So unless he's caching every bayes DB for 125,000 users in memory I think there's something funny going on.
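The kind of back-of-envelope I/O calculation mentioned above might look like the sketch below. It assumes the crude model that every uncached token lookup costs one random disk read; the function name, its parameters, and the numbers in the usage line are hypothetical placeholders, not measurements of DSPAM or any real system.

```python
def max_messages_per_second(tokens_per_message, seconds_per_lookup, cache_hit_rate=0.0):
    """Upper bound on throughput if each uncached token lookup is one disk read.

    All three inputs are assumptions the caller must supply.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    uncached_lookups = tokens_per_message * (1.0 - cache_hit_rate)
    if uncached_lookups == 0:
        return float("inf")  # everything served from memory: disk is not the limit
    return 1.0 / (uncached_lookups * seconds_per_lookup)


# Placeholder inputs: 100 tokens per message, 10 ms per random read, no cache.
print(max_messages_per_second(100, 0.01))
```

Plug in your own token counts and disk latency; if a claimed throughput is far above the bound, either most of the token database is sitting in memory or the claim deserves a second look.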

In short, take his slagging off perl (he calls it "PERL") with a major pinch of salt.

As for CRM114, well I tried it and didn't get nearly the effectiveness that Bill claims, and got massive numbers of false positives (like 10%). It's also unbearably slow, though he claims to have sped it up now. Check out his language though (CRM114 is a language that just happens to contain a bayes-like classifier). He wrote that because he didn't like perl *cough* ;-)

The problem is that neither CRM114's nor DSPAM's effectiveness claims are based on 10-fold cross-validation (what SpamAssassin uses and what any classification system *must* use or it's just skewing the facts). They are both based on the developer's *own* email!!! Christ I can get effectiveness as good as they claim (on my own email) using just DNS blocklists!
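For reference, the evaluation being asked for (split the corpus into ten parts, train on nine, test on the held-out tenth, rotate, average) can be sketched roughly as follows. This is a generic illustration, not SpamAssassin's actual test harness; `k_fold_accuracy`, `train_fn`, and `classify_fn` are hypothetical names, and the majority-label "classifier" in the usage lines is only a stand-in.

```python
import random
from collections import Counter


def k_fold_accuracy(messages, train_fn, classify_fn, k=10, seed=0):
    """Average held-out accuracy over k folds.

    messages: list of (text, label) pairs.
    train_fn(list_of_pairs) -> model; classify_fn(model, text) -> label.
    """
    data = list(messages)
    if not data:
        raise ValueError("need at least one message")
    random.Random(seed).shuffle(data)  # fixed seed so runs are repeatable
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        if not held_out:
            continue  # fewer messages than folds
        # Train only on the other folds, never on the messages being scored.
        training = [m for j, fold in enumerate(folds) if j != i for m in fold]
        model = train_fn(training)
        correct = sum(classify_fn(model, text) == label for text, label in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Stand-in classifier: always predict the most common training label.
def train_majority(pairs):
    return Counter(label for _, label in pairs).most_common(1)[0][0]


def classify_majority(model, text):
    return model


corpus = [("msg %d" % i, "ham") for i in range(20)]
print(k_fold_accuracy(corpus, train_majority, classify_majority))
```

The held-out folds are what make the number honest, and the corpus matters just as much: a figure measured only on the developer's own mailbox says little about anyone else's mail.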

In short, take their claims with a pinch of salt, but try them out - they may fit what you need.

Cheers,
Ben
     One of my All Microsoft clients finally cracked . . - (Andrew Grygus) - (7)
         Assuming IMAP, for a moment... - (pwhysall) - (4)
             Just saw this a day or so ago - (broomberg) - (3)
                 No snake-oil flags up yet. - (FuManChu)
                 Just did a brief search - (ben_tilly) - (1)
                     And I did get a useful response - (ben_tilly)
         Re: One of my All Microsoft clients finally cracked . . - (deSitter) - (1)
             Yup, that's the one. - (Andrew Grygus)
