
Captchas: making them simpler, and dialing down the angst against them

Note: This blog post is from 2006. Some content may be outdated--though not necessarily. Same with links and subsequent comments from myself or others. Corrections are welcome, in the comments. And I may revise the content as necessary.
Most by now understand what captchas are. Some love 'em, some hate 'em. I want to dial down the rhetoric some with this perspective: as a blog owner fighting frequent spam in comments and trackbacks, captchas (in some form, not necessarily a graphic) have their place to keep out spambots, and they can indeed be simplified (even the graphics ones) and at no loss of benefit. My bottom line: I don't use them as a double-key deadbolt lock to keep out intruders, I just use them as a screendoor to keep out random pests.

If you use Peter Farrell's Lyla Captcha, which I use because it's embedded in Ray's BlogCFC, in the next entry I'll show a few quick changes you could make in the Lyla captcha.xml file to make them much easier to read, going from this:
[image: hard captcha]
to this:
[image: simple captcha]

Before that, I just want to expand on those thoughts above on the general angst against captchas, and why I think it's ok to make them easier to read.

The Haters

I realize that some have gone to great lengths to decry captchas primarily because they are not "accessible" (to those using screenreaders), though audio ones help solve that.

Others simply hate them because they're too darned difficult to read. I've surely seen that, even in the ones created by default in Lyla (thus my next entry on addressing that).

Now, while most use a graphic that a user must read, that's not the only approach. As the previous link discusses, simpler alternatives include asking the reader to add some numbers or answer a question (one that only a human could reasonably answer).
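To make the "add some numbers" idea concrete, here is a minimal sketch in Python (not the CFML that BlogCFC or Lyla actually use). The function names `make_challenge` and `check_answer` are made up for illustration, and a real form would also need to keep the expected answer on the server (in the session, say) rather than in the page.

```python
import random


def make_challenge(rng=None):
    """Return (question, expected_answer) for a simple addition question."""
    rng = rng or random.Random()
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    # The answer is kept as a string so it can be compared to form input.
    return f"What is {a} plus {b}?", str(a + b)


def check_answer(expected, submitted):
    """Compare the visitor's reply to the stored answer, ignoring whitespace."""
    return submitted.strip() == expected


if __name__ == "__main__":
    question, expected = make_challenge()
    print(question)
    print(check_answer(expected, input("> ")))
```

It's a screendoor, not a deadbolt: a bot written for this one form could answer it, but a random spambot passing by generally won't.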

But the other complaint is that they give those who use them a false sense of security, because they can be easily broken, even the graphic ones.

But my Blog is Not a Bank

Here's the thing: my blog is not a bank. While the difficulty in breaking a captcha may be important to a bank or commercial site trying to use them for authentication, I just want to make it hard for an automated spambot to post crap in my blog comments and trackback forms. If you have any similar kind of input form on a publicly accessible site, you may suffer similar problems.

I really can't believe anyone would go to the lengths of scanning and breaking the captcha on my site (random as it is) to get a crap spam comment into my lil' ol' blog. And some of the comments are just nonsense; it's not like they're trying to drive traffic to another site or something--so the popularity of my (or your) site isn't the issue. It's just the annoyance factor (both to me as I get notified of comments and to readers who would have to sift through them if I didn't delete them as I do now).

Having made the case for why a simpler captcha may suffice for some purposes, in the next entry I'll show how to control the degree of difficulty in reading them for captchas built using Lyla Captcha.

Comments
While I did make the comment about the false sense of security it gives some people, I couldn't agree more Charlie. For something such as your blog they are a quick, cheap and reliable mechanism for keeping spammers out. I do wonder though, for popular blogging software, if spammers would make an effort to crack them.
# Posted By kola | 8/19/06 2:53 AM
Hi Kola, thanks for that. As for your having mentioned false security, I'll say I'd not seen that. Was it on your blog? I just tried to search and noticed that you have no search feature. I've used blog-city for years (for a couple other blogs) and I know you can easily add it, so you may want to consider it. I did search using google's site search, still to no avail (no reference to captchas). Maybe you meant in some email list.

As for spammers targeting popular blog software, I suppose it's possible. I don't know, though. They'd still need to scan and break into each individual blog post, so I'd think it tied more to the popularity of the blogger.

Speaking of which: I don't know if it's my lack of popularity or just that the concept of a simpler captcha works, but it's now been a couple weeks since I implemented the change and so far so good! :-) No blogspam even with the simpler captchas. Hope my readers/commenters appreciate that as much as I do. Cheers.
> Hi Kola, thanks for that. As for your having mentioned false security, I'll say I'd not seen that. Was it on your blog? I just tried to search and noticed that you have no search feature. I've used blog-city for years (for a couple other blogs) and I know you can easily add it, so you may want to consider it. I did search using google's site search, still to no avail (no reference to captchas). Maybe you meant in some email list.
>

I was referring to one of the conversations on the UKCFUG mailing list. The look and feel of my blog has been very much neglected so I'll have to have a tinker and see how to add it. Thanks for the tip.

> As for spammers targeting popular blog software, I suppose it's possible. I don't know, though. They'd still need to scan and break into each individual blog post, so I'd think it tied more to the popularity of the blogger.
>

Making you a prime target ;-). Let's say for example 90% of the most read cf blogs on fullasagoog use Ray Camden's BlogCFC - that would increase the incentive to try and crack the captchas. Now I'm really playing devil's advocate here and I am in no way condoning spamming, but many moons ago I wrote code to harvest data from websites.

I had a look out of interest at one of the OCR products that had been used to crack captchas and it didn't appear to have a command line interface, so I guess it would still be a challenge to have an automated bot pipe captchas to an OCR. But again, I also imagine there are OCR programs out there that do have a command line interface or other means which make them easier to integrate. And a quick search throws up at least one OCR web service:

http://www.leadtools...

(of course there is no guarantee that the service can read the captcha).
So putting the pieces together (and not trying to give anyone any ideas here), I think it would actually be less work than you think:

- find a list of cf related blogs - perhaps from goog or mxna
- use httpUnit or something similar to traverse the blogs
- find the captcha (which, from what I can tell, always has the same name)
- hopefully send it to a third party component or webservice to get it cracked
- blog spam away :)

Finally I am not trying to trivialise or undermine the excellent work which Peter has done for Lyla.

Kola
# Posted By kola | 8/22/06 3:59 AM
Fair enough (your playing devil's advocate). That's always valuable when it comes to security. Still, I guess my response would be: ok, I'm willing to accept that my simplified captcha may be more easily cracked. When it is, and I start getting blogspam, then I'd ratchet up the complexity of the captcha. I doubt it would happen, but if so, I can easily respond. Again, to me this is so different from the concerns of a bank where a single break-in is disastrous. For me it's more like discovering a break in my screendoor. "Oops, better fix that. Moths getting in."

Further, even if a single break-in did concern me, there would seem a ready solution to the problem you pose: their effort to break the captcha would have to be driven by analysis of an existing captcha on my or another BlogCFC site. Since the whole point of my next post was how to modify the captcha generation, all it would take is for any one BlogCFC user to tweak it just a bit to change the complexity rules and that would seem to thwart such a global attempt.

Further, since it (my blog and BlogCFC) is CF-driven, one could easily code it to make some or all of the underlying parameters random, so that no scanning service could readily target it. Again, I'm not arguing with you here. Just sharing thoughts for the benefit of future readers. Thanks again for your feedback.
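As a rough illustration of that randomizing idea, here is a minimal sketch in Python rather than CFML. The setting names in `RANGES` are hypothetical, chosen just for the example; they are not the actual options in Lyla's captcha.xml.

```python
import random

# Hypothetical knobs and their allowed (low, high) ranges.
# These names are assumptions for illustration, not real Lyla settings.
RANGES = {
    "length": (4, 6),
    "font_size": (24, 36),
    "noise_lines": (0, 3),
}


def random_settings(rng=None):
    """Pick a fresh value for each setting from within its allowed range."""
    rng = rng or random.Random()
    return {name: rng.randint(lo, hi) for name, (lo, hi) in RANGES.items()}


if __name__ == "__main__":
    # Each captcha (or each site) ends up with slightly different rules.
    print(random_settings())
```

The point is only that the rules vary from one captcha to the next, so something tuned against one sample has less to go on.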
Copyright ©2024 Charlie Arehart
BlogCFC was created by Raymond Camden. This blog is running version 5.005.