Honeypots: better than CAPTCHAs?

[17 January 2008]

As noted earlier, the short period of time between starting a blog and encountering comment spam has now passed, for this blog. And while the volume of comment spam is currently very low by most standards, for once I’d like to get out in front of a problem.

So when not otherwise committed, I spent most of yesterday reading about various comment-spam countermeasures, starting with those recommended by the commenters on my earlier post. (More comments on that post, faster, than on any other post yet: clearly the topic hit a nerve.)

If you’re keeping score, I decided ultimately to install Spam Karma 2, in part because my colleague Dom Hazaël-Massieux uses it, so I hope I can lean on him for support.

But the most interesting idea I encountered was certainly the one mentioned here by Lars Marius Garshol (to whom thanks). The full exposition of the idea by Ned Batchelder is perfectly clear (and to be recommended), but the gist can be summarized thus:

  • Some comment spam comes from humans hired to create it.
  • Some spam comes from “playback bots” which learn the structure of a comment form once (with human assistance) and then post comments repeatedly, substituting link spam into selected fields.
  • Some comment spam comes from “form-filling bots”, which read the form and put more or less appropriate data into more or less the right fields, apparently guiding their behavior by field type and/or name.

For the first (human) kind of spam, there isn’t much you can do (says Batchelder). You can’t prevent it reliably. You can use rel="nofollow" in an attempt to discourage them, but Michael Hampton has argued in a spirited essay on rel="nofollow" that in fact nofollow doesn’t discourage spammers. By now that claim is more an empirical observation than a prediction. By making it harder to manipulate search engine rankings, rel="nofollow" makes spammers think it even more urgent (says Hampton) to get functional links into other places where people may click on them.

But I can nevertheless understand the inclination to use rel="nofollow": it’s not unreasonable to feel that if people are going to deface your site, you’d at least like to ensure their search engine ranking doesn’t benefit from the vandalism.

And of course, you can also always delete their comments manually when you see them.

For the playback bots, Batchelder uses a clever combination of hashing and a local secret to fight back: if you change the names of fields in the form, by hashing the original names together with a time stamp and possibly the requestor’s IP address, then (a) you can detect comments submitted a suspiciously long time after the comment form was downloaded, and (b) you can prevent the site-specific script from being deployed to an army of robots at different IP addresses.
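
To make that concrete, here is a minimal sketch of what such field-name hashing might look like in PHP. This is my own illustration, not Batchelder’s code; the secret, the field names, and the one-hour limit are all invented for the example.

    <?php
    // Sketch: derive per-request field names by hashing the real field
    // name together with a timestamp, the requester's IP address, and a
    // server-side secret. A recorded copy of the form can then be
    // detected when it is replayed too late or from a different address.

    define('FORM_SECRET', 'replace-with-a-local-secret');   // illustrative only

    function hashed_field_name($base, $timestamp, $ip) {
        return 'f_' . hash_hmac('sha1', "$base|$timestamp|$ip", FORM_SECRET);
    }

    // When emitting the form: compute the obscured name (used as the name
    // attribute of, say, the author input) and put the timestamp -- not
    // the secret -- into a hidden field named 'ts'.
    $now = time();
    $ip  = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
    $author_field = hashed_field_name('author', $now, $ip);

    // When the form comes back: recompute the expected name from the
    // submitted timestamp and the current IP. A stale timestamp or a
    // missing field marks the submission as suspect.
    function submission_is_suspect(array $post, $ip, $max_age = 3600) {
        $t = isset($post['ts']) ? (int) $post['ts'] : 0;
        if ($t <= 0 || time() - $t > $max_age) {
            return true;                       // form held open too long
        }
        return !isset($post[hashed_field_name('author', $t, $ip)]);
    }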

My colleague Liam Quin has pointed out that this risks some inconvenience to real readers. If someone starts to write a comment on a post, then breaks off to head for the airport, and finally finishes editing their comment and submitting it after reaching their hotel at the other end of a journey, then not only will several hours have passed, but their IP number will have changed. Liam and I both travel a lot, so it may be easy for us to overestimate the frequency with which that happens in the population at large, but it’s an issue. And users behind some proxy servers (including those at hotels) will frequently appear to shift their IP addresses in a quirky and capricious manner.

For form-filling bots, Batchelder uses invisible fields as ‘honeypots’. These aren’t hidden fields (which won’t deceive bots, because they know about them), but fields created in such a way that they are not visible to sighted human users. Since humans don’t see them, humans won’t fill them out, while a form-filling bot will see them and (in accordance with its nature) will fill them out. This gives the program which handles comment submissions a convenient test: if there’s new data in the honeypot field, the comment is pretty certain to be spam.
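
On the handling side, the test really can be that simple. Here is a minimal PHP sketch (mine, not Batchelder’s; the field name extra_url is invented for the example, and it matches the form markup sketched a little further on):

    <?php
    // Sketch: if the invisible field came back non-empty, a form-filling
    // bot almost certainly filled it in; refuse the submission.
    if (!empty($_POST['extra_url'])) {
        header('HTTP/1.1 403 Forbidden');
        exit('Comment rejected.');
    }
    // ...otherwise hand the comment on to the normal processing path...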

Batchelder proposes a wide variety of methods for making fields invisible: CSS rules like “display: none” or “font-size: 0”, or positioning the field absolutely and then carefully placing an opaque image (or something else opaque) over it. And we haven’t even gotten into Javascript yet.

For the sake of users with Javascript turned off and/or CSS-impaired browsers, the field will be labeled “Please leave this field blank; it’s here to catch spambots” or something similar.
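
Putting the pieces together, the honeypot portion of a comment form might look something like this. Again, this is only an illustrative sketch (the class name, the field name, and the reliance on display: none are my choices, not a quotation from Batchelder), but it shows the field, the explanatory label for the users who will actually see it, and the CSS that hides both from everyone else:

    <?php /* Sketch of the honeypot portion of a comment-form template. */ ?>
    <style>
      /* Sighted users with CSS never see the honeypot; browsers and
         readers without CSS get the field plus its explanatory label. */
      .leave-blank { display: none; }
    </style>

    <p class="leave-blank">
      <label for="extra_url">Please leave this field blank;
        it's here to catch spambots.</label>
      <input type="text" id="extra_url" name="extra_url" value="" />
    </p>

Tying the label to the field with the for attribute also matters for the screen-reader question discussed just below, since screen readers’ handling of display: none varies.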

In some ways, the invisible-honeypot idea seems to resemble the idea of CAPTCHAs. In both cases, the human + computer system requesting something from a server is asked to perform some unpredictable task which a bot acting alone will find difficult. In the case of CAPTCHAs, the task is typically character-recognition from an image, or answering a simple question in natural language. In the case of the honeypot, the task is calculating whether a reasonably correct browser’s implementation of Javascript and CSS will or will not render a given field in such a way that a human will perceive it. This problem may be soluble, in the general case or in many common cases, by a program acting alone, but by far the simplest way to perform it is to display the page in the usual way and let a human look to see whether the field is visible or not. That is, unlike a conventional CAPTCHA, a honeypot input field demands a task which the browser and human are going to be performing anyway.

The first question that came to my mind was “But wait. What about screen readers? Do typical screen readers do Javascript? CSS?”

My colleagues in the Web Accessibility Initiative tell me the answer is pretty much a firm “Sometimes.” Most screen readers (they tell me) do Javascript; behavior for constructs like CSS display: none apparently varies. (Everyone presumably agrees that a screen reader shouldn’t read material so marked, but some readers do; either their developers disagree or they haven’t yet gotten around to making the right thing happen.) If you use this technique, you do want to make sure the “Please leave empty” label is associated with the field in a way that will be clear to screen readers and the like. (Of course, this holds for all field labels, not just labels for invisible fields. See Techniques for WCAG 2.0 and Understanding WCAG 2.0 for more on this topic.)

The upshot appears to be:

  • For sighted or unsighted readers with Javascript and/or CSS processing supported by their software and turned on, a honeypot of this kind is unseen / unheard / unperceived (unless something goes wrong), and incurs no measurable cost to the human. The cost of the extra CSS or Javascript processing by the machine is probably measurable but negligible.
  • For human readers whose browsers and/or readers don’t do Javascript and/or CSS, the cost incurred by a honeypot of this kind is (a) some clutter on the page and (b) perhaps a moment of distraction while the reader wonders “But why put a field there if you want me to leave it blank?” or “But how can putting a data entry field here help to catch spambots?” For most users, I guess this cost is comparable to that of a CAPTCHA, possibly lower. For users excluded by a CAPTCHA (unsighted users asked to read an image, linguistically hesitant users asked to perform in a language not necessarily their own), the cost of a honeypot seems likely to be either a little lower than that of a CAPTCHA, or a lot lower.

I’m not an accessibility expert, and I haven’t thought about this for very long. But it sure looks like a great idea to me, superior to CAPTCHAs for many users, and no worse than CAPTCHAs (as far as I can now tell) for anyone.

If this blog used homebrew software, I’d surely apply these techniques for resisting comment spam. And I think I can figure out how to modify WordPress to use some of them, if I ever get the time. But I didn’t see any off-the-shelf plugins for WordPress that use them. (It’s possible that Bad Behavior uses these or similar techniques, but I haven’t been able to get a clear idea of what it does, and it has what looks like a misguided affinity for the idea of blacklists, on which I have given up. As Mark Pilgrim points out, when we fight link spam, we might as well try to learn from the experience of fighting spam in other media.)

Is there a catch? Am I missing something?

What’s not to like?

7 thoughts on “Honeypots: better than CAPTCHAs?”

  1. As the author of both that spirited essay on nofollow, and the Bad Behavior software, I have a few comments that might be useful.

    First, about Bad Behavior: At the time Bad Behavior was first developed, almost three years ago, there were no really good solutions for link spam. There still aren’t; the best we can do at this time is to manage it. Bad Behavior has two overriding design concerns: whenever possible, not to block legitimate users, and speed. Over its lifetime, several features have been removed, or disabled by default, because they had been shown to block legitimate users under some circumstances. I don’t find this acceptable.

    Because Bad Behavior screens all requests, not just POST requests, in order to filter out scrapers and harvesters, speed is of utmost importance. This is especially true in the expected case of a blog which suddenly hits the front page of Digg or Slashdot (my own personal test case). A server that would ordinarily handle such an event should continue to handle it with Bad Behavior installed. This means I have to let some things go.

    It used to use several realtime blackhole lists as data sources, but almost all of them turn out to be problematic for the purpose of distinguishing humans from spammers. Most of the blackhole lists have been removed at this point. It still uses some (but not all) of the Spamhaus SBL/XBL data, specifically the data which has generated no user complaints. Empirically, I can say that some blackhole data actually works. It should be noted that both for speed and to reduce the potential impact on humans, the blackhole lists are not consulted for HTTP GET requests, and anyone who is blocked is also presented a link to the blackhole list for self-removal.

    The most interesting blackhole data for blocking link spam promises to be Project Honey Pot’s http:BL project, which targets exactly this sort of spam using exactly the techniques you discuss in this post. The data, though, is presented much differently than other blackhole lists, so existing code can’t be used unmodified. I have one of these honeypots and I’m also testing http:BL before I add it to Bad Behavior. (Though it appears someone has already beaten me to it.)

    The majority of Bad Behavior, though, is sheer black magic. It’s the end result of compiling a database of countless thousands of HTTP requests from both legitimate users and spammers, comparing them, and where spammers can be easily distinguished, noting that distinction in a few lines of PHP code. This hasn’t always gone perfectly, as I noted before. I’ve had to disable or remove certain tests entirely because they blocked actual human beings, or in one case, a major search engine.

    All that said, the current version is so stable and meets the two design goals so well that I’ve hardly had to touch it in nearly a year. But it is not for everyone. I intentionally let some spammers through because I either can’t sufficiently distinguish them from humans, or because a test for them would be too slow. Because of Bad Behavior’s unique approach, it’s my goal that Bad Behavior’s false positive rate should be as close as possible to zero, no matter how many spammers get by.

    It’s for these and other reasons that I recommend using more than one anti-link spam solution. Once your comment spam load starts increasing, you may find yourself spending all your time sifting through the spam looking for the inevitable false positive comment that the other program misidentified as spam. (Unless, of course, you don’t care, but I do care.)

    P.S. Thanks for the reminder that I need to update Bad Behavior’s documentation. 🙂

  2. “you’d at least like to ensure their search engine ranking doesn’t benefit from the vandalism”

    Indeed, your site will suffer harm in the Google rankings if you leave links to bad guys in place: Google has publicized this fact. So nofollow is good policy in general.

    Disclaimer: I work for Google, but know little about search and nothing about SEO. And if I did know, I couldn’t tell you.

  3. Michael: thanks for your comments and the clarifications. I’m tempted to install Bad Behavior right away, but they say that in order to gauge properly the improvement brought by a change, it helps to have taken some measurements before making the change. So I’m going to see what running Spam Karma 2 by itself is like, for a while, before trying Bad Behavior.

    One thing struck me forcibly about the mentions of BB that I saw on the Web, and in particular the discussions of the recent bug that locked some blog owners out of their own blogs. All the really negative comments I saw were from outside observers, while all of the Bad Behavior users who blogged about it (at least, the ones I saw) said that they would continue to use BB because it was such a useful tool. When you have a bug that locks someone out of their own blog for a day (quick fix, by the way, bravo!) and they still love the software and won’t hear a word against it, well, that’s a remarkable level of user commitment.

  4. “Is there a catch? Am I missing something?”

    I’ve used a much simpler variant of this approach for over two years now without a hitch. Basically, I have two extra fields, asking the user “Is this spam” and “Is this not spam”, so you have to set one to false and the other to true. They default to true and false, however, so you have to change the default values to submit successfully. Three lines of JavaScript flip the values and hide the fields.

    The result is that only the first two types of user-agents you listed get through, but I’ve never seen the second type. And it works even for unsighted readers and people who have JavaScript turned off, although the latter group has to change the settings of two checkboxes to comment.

  5. “When you have a bug that locks someone out of their own blog for a day (quick fix, by the way, bravo!) and they still love the software and won’t hear a word against it, well, that’s a remarkable level of user commitment.”

    I eat my own dog food. When something happens, I’m often the first to know, and I have to fix it fast.

    As for user commitment, I recall reading a customer loyalty study which said that people who have a problem with a company that’s successfully resolved are likely to be more loyal than people who never have a problem with the same company. Not that I want people to have problems with the software, but I spend easily twice as much time answering users’ questions as actually working on the program! And that’s when there’s not a major bug to be squashed.

    Interestingly, in the week after that particular problem surfaced, I received over $1,000 in contributions from Bad Behavior users, compared with maybe $250 for the entire previous year. I either have very committed users, or my users need to be committed. 🙂