Captchas: making them simpler, and dialing down the angst against them
Note: This blog post is from 2006. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.
Most by now understand what captchas are. Some love 'em, some hate 'em. I want to dial down the rhetoric some with this perspective: as a blog owner fighting frequent spam in comments and trackbacks, captchas (in some form, not necessarily a graphic) have their place to keep out spambots, and they can indeed be simplified (even the graphic ones) with no loss of benefit. My bottom line: I don't use them as a double-key deadbolt lock to keep out intruders, I just use them as a screen door to keep out random pests.
If you use Peter Farrell's Lyla Captcha, which I use because it's embedded in Ray's BlogCFC, in the next entry I'll show a few quick changes you could make in the Lyla captcha.xml file to make them much easier to read, going from this [image: original captcha] to this [image: simplified captcha].
Before that, I just want to expand on those thoughts above on the general angst against captchas, and why I think it's ok to make them easier to read.
The Haters
I realize that some have gone to great lengths to decry captchas primarily because they are not "accessible" (to those using screenreaders), though audio ones help solve that.
Others simply hate them because they're too darned difficult to read. I've surely seen that, even in the ones created by default in Lyla (thus my next entry on addressing that).
Now, while most use a graphic that a user must read, it's not the only approach. As the previous link discusses, there are simpler alternatives, like asking the reader to add some numbers or answer a question (that only a human could reasonably do).
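To make the add-some-numbers idea concrete, here is a minimal sketch in Python (not the CFML that BlogCFC or Lyla Captcha use, and not how either one works internally). The function names make_challenge and check_answer are made up for illustration, and the sketch assumes the expected answer is kept server-side (say, in the session) rather than sent to the browser.

```python
import random


def make_challenge(rng=random):
    """Return (question_text, expected_answer) for a simple addition question.

    Hypothetical helper: the expected answer should be stored server-side,
    never embedded in the form itself.
    """
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    return f"What is {a} plus {b}?", str(a + b)


def check_answer(expected, submitted):
    """Compare the stored answer with what the commenter typed, ignoring stray spaces."""
    return submitted.strip() == expected


if __name__ == "__main__":
    question, expected = make_challenge()
    print(question)
    print("accepted" if check_answer(expected, input("> ")) else "rejected")
```

A question this simple would not stop anyone determined, but that fits the screen-door goal: it only has to turn away bots that blindly fill in every form they find.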
But the other complaint is that they give those who use them a false sense of security, because they can be easily broken, even the graphic ones.
But my Blog is Not a Bank
Here's the thing: my blog is not a bank. While the difficulty in breaking a captcha may be important to a bank or commercial site trying to use them for authentication, I just want to make it hard for an automated spambot to post crap in my blog comments and trackback forms. If you have any similar kind of input form on a publicly accessible site, you may suffer similar problems.
I really can't believe anyone would go to the lengths of scanning and breaking the captcha on my site (random as it is) to get a crap spam comment into my lil' ol' blog. And some of the comments are just nonsense; it's not like they're trying to drive traffic to another site or something--so the popularity of my (or your) site isn't the issue. It's just the annoyance factor (both to me as I get notified of comments and to readers who would have to sift through them if I didn't delete them as I do now).
Having made the case for why a simpler captcha may suffice for some purposes, in the next entry I'll show how to control the degree of difficulty in reading them for captchas built using Lyla Captcha.
As for spammers targeting popular blog software, I suppose it's possible. I don't know, though. They'd still need to scan and break into each individual blog post, so I'd think it tied more to the popularity of the blogger.
Speaking of which: I don't know if it's my lack of popularity or just that the concept of a simpler captcha works, but it's now been a couple weeks since I implemented the change and so far so good! :-) No blogspam even with the simpler captchas. Hope my readers/commenters appreciate that as much as I do. Cheers.
I was referring to one of the conversations on the UKCFUG mailing list. The look and feel of my blog has been very much neglected so I'll have to have a tinker and see how to add it. Thanks for the tip.
> As for spammers targeting popular blog software, I suppose it's possible. I don't know, though. They'd still need to scan and break into each individual blog post, so I'd think it tied more to the popularity of the blogger.
Making you a prime target ;-). Let's say, for example, 90% of the most-read CF blogs on fullasagoog use Ray Camden's BlogCFC - that would increase the incentive to try and crack the captchas. Now I'm really playing devil's advocate here and I am in no way condoning spamming, but many moons ago I wrote code to harvest data from websites.
I had a look out of interest at one of the OCR products that had been used to crack captchas and it didn't appear to have a command line interface, so I guess it would still be a challenge to have an automated bot pipe captchas to an OCR. But again, I also imagine there are OCR programs out there that do have a command line interface or other means which make them easier to integrate. And a quick search throws up at least one OCR webservice:
http://www.leadtools...
(of course there is no guarantee that the service can read the captcha).
So, putting the pieces together (and not trying to give anyone any ideas here), I think it would actually be less work than you think:
- find a list of CF-related blogs - perhaps from goog or mxna
- use httpUnit or something similar to traverse the blogs
- find the captcha (which, from what I can tell, always has the same name)
- hopefully send it to a third party component or webservice to get it cracked
- blog spam away :)
Finally I am not trying to trivialise or undermine the excellent work which Peter has done for Lyla.
Kola
Further, even if a single break-in did concern me, there would seem to be a ready solution to the problem you pose: their effort to break the captcha would have to be driven by analysis of an existing captcha on my or another BlogCFC site. Since the whole point of my next post was how to modify the captcha generation, all it would take is for any one BlogCFC user to tweak it just a bit to change the complexity rules, and that would seem to thwart such a global attempt.
Further, since it (my blog and BlogCFC) is CF-driven, one could easily code it to make some or all of the underlying parameters random, so that no scanning service could readily target it. Again, I'm not arguing with you here. Just sharing thoughts for the benefit of future readers. Thanks again for your feedback.
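As a rough illustration of randomizing the underlying parameters, here is a minimal sketch in Python rather than CFML. Everything in it is an assumption for illustration: the parameter names and ranges are hypothetical and are not Lyla Captcha's actual captcha.xml settings, and random_captcha_params is a made-up helper.

```python
import random

# Hypothetical rendering parameters and (low, high) ranges.
# These are NOT Lyla Captcha's real settings, just stand-ins for the idea.
PARAM_RANGES = {
    "font_size": (28, 40),
    "char_count": (4, 6),
    "background_lines": (0, 3),
    "rotation_degrees": (0, 15),
}


def random_captcha_params(rng=random):
    """Pick a fresh value for every parameter, each within its configured range."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


if __name__ == "__main__":
    # Each call yields a different combination, so two captchas rarely share a profile.
    print(random_captcha_params())
```

The point is only that the settings vary per image, so a tool tuned against one captcha sample has less to rely on; keeping the ranges mild keeps the result readable.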