Daniel Lemire's blog


Scam Spam, the death of email, and Machine Learning

3 thoughts on “Scam Spam, the death of email, and Machine Learning”

  1. Nice post!
    I can only agree with that.
    I would reformulate it (in a weaker form): the power is not in the algorithms but in the features you use. It is very hard to determine which are the right features, or the right representation of an email to be used by a learning algorithm.
    But the question is: how can users help here?

    Here are a couple of random thoughts:
    One way could be for them to suggest high-level features (e.g. in the form of rules) that can then be combined.
    Maybe there should be a combination of examples, rules and features.
    Someone may say:
    [examples] this is a good email, this is a bad one
    [rules] when the sender is from @bla.com it is spam
    [features] how many times the term ‘viagra’ appears is a good feature

    Then you can imagine an algorithm that uses this to build its model. But ideally this model should remain understandable to the user (probably using something like rules again) so that he can modify it, or complement it….
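
A minimal sketch of what such a combined model might look like, assuming emails are plain dictionaries with `sender` and `body` keys. The names `HybridFilter`, `sender_domain_rule` and `term_count_feature` are hypothetical, and the perceptron update is only one simple choice of learner, not something the comment prescribes. User rules act as hard overrides, user features are named so the learned weights stay readable, and labeled examples set those weights.

```python
def sender_domain_rule(domain):
    """[rules] Build a rule that flags any email sent from the given domain."""
    def rule(email):
        return email["sender"].lower().endswith(domain)
    return rule


def term_count_feature(term):
    """[features] Build a feature that counts occurrences of a term in the body."""
    def feature(email):
        return email["body"].lower().count(term)
    return feature


class HybridFilter:
    def __init__(self, rules, features):
        self.rules = rules          # name -> predicate(email) -> bool
        self.features = features    # name -> function(email) -> number
        self.weights = {name: 0.0 for name in features}
        self.bias = 0.0

    def _score(self, email):
        return self.bias + sum(
            self.weights[name] * f(email) for name, f in self.features.items()
        )

    def fit(self, examples, epochs=10):
        """[examples] Perceptron updates over (email, is_spam) pairs."""
        for _ in range(epochs):
            for email, is_spam in examples:
                if (self._score(email) > 0) != is_spam:
                    step = 1.0 if is_spam else -1.0
                    self.bias += step
                    for name, f in self.features.items():
                        self.weights[name] += step * f(email)

    def is_spam(self, email):
        # User-written rules win outright; otherwise use the learned score.
        if any(rule(email) for rule in self.rules.values()):
            return True
        return self._score(email) > 0

    def explain(self):
        """Return the per-feature weights so the user can inspect or edit them."""
        return dict(self.weights)


good = {"sender": "alice@example.org", "body": "Lunch tomorrow?"}
bad = {"sender": "x@example.net", "body": "Cheap viagra, viagra now"}

spam_filter = HybridFilter(
    rules={"from_bla": sender_domain_rule("@bla.com")},
    features={"viagra_count": term_count_feature("viagra")},
)
spam_filter.fit([(good, False), (bad, True)])
```

Because the weights are keyed by the feature names the user supplied, `spam_filter.explain()` gives a model the user can read and adjust by hand, which is the property the comment asks for.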

  2. Uccai Siravas says:

    Those who are in academia have their e-mail addresses on websites and are easy targets for spammers. One way to avoid this is an e-mail system that requires an “electronic stamp”: to send an e-mail to user X, one must visit user X’s website and obtain a stamp, which involves reading and typing distorted patterns, something machines are not good at. An e-mail received without a stamp goes to a junk folder.
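
The stamp scheme described above could be sketched roughly as follows. This is an illustration under assumptions, not a description of any existing system: the class name `StampIssuer` is hypothetical, the distorted-pattern test is reduced to a `challenge_solved` flag, and signing stamps with an HMAC and making them single-use are design choices added here to make the sketch concrete.

```python
import hashlib
import hmac
import secrets


class StampIssuer:
    """Runs on the recipient's website and in front of the recipient's inbox."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # secret known only to the recipient
        self._used = set()                   # nonces of stamps already spent

    def _sign(self, sender, nonce):
        message = f"{sender}|{nonce}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def issue(self, sender, challenge_solved):
        """Hand out a stamp only if the sender passed the distorted-pattern test."""
        if not challenge_solved:
            return None
        nonce = secrets.token_hex(8)
        return f"{nonce}:{self._sign(sender, nonce)}"

    def folder_for(self, sender, stamp):
        """Decide where an incoming e-mail goes based on its stamp."""
        if not stamp or ":" not in stamp:
            return "junk"
        nonce, signature = stamp.split(":", 1)
        expected = self._sign(sender, nonce)
        valid = hmac.compare_digest(signature.encode(), expected.encode())
        if not valid or nonce in self._used:
            return "junk"
        self._used.add(nonce)  # each stamp is good for one e-mail
        return "inbox"


issuer = StampIssuer()
stamp = issuer.issue("carol@example.org", challenge_solved=True)
```

Binding the stamp to the sender address means a stamp copied by a third party is rejected, and tracking used nonces keeps one solved challenge from covering a whole mailing run.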

  3. These sorts of “spam fads” (spads?) happen every couple of months or so; I’ve noticed some of these spams leaking through my filters since late spring. I rarely see more than a couple of spams a day, despite my email addresses being all over the net (and let’s face it, once one spammer has your email address, they all have it). Each of my email addresses passes mail through two levels of spam filtering: the first either on a forwarding server (e.g., IEEE or ACM) or on the email host (Univ. of Washington), and the second in the built-in filter of Apple’s Mail program. When a new spad starts, there’s a spike in leakage, and then Mail learns the new spam’s characteristics and the leakage drops to a trickle.

    I don’t see it as the death of email; just the email version of fast forwarding through the commercials on a TiVo. Yes, I’d rather not have to do it.