Simple Text-based CAPTCHA Implementation

Spambots are automated scripts that crawl the net looking for URLs that contain some kind of submission form, such as forums, guestbooks, or comment forms on popular blogs, and then automatically post whatever their launcher (the spammer) wants everybody to know. The posts usually carry commercial messages, offers, or plain site promotion. This annoying practice has been one of the biggest problems of the Internet since its early days.

There are several known ways to fight this kind of spambot. One is moderation, which lets the site’s moderators manually check and validate every submitted post. Moderation is effective at preventing spam but not very efficient, so a more popular method is CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart). As the word “Automated” in the name indicates, this AI-based test attempts to eliminate the manual validation that a moderated system requires, which makes it more efficient.

Different techniques have been developed to implement CAPTCHA. The most popular one challenges users to retype a text or word presented as a distorted image, on the assumption that such text is difficult for a computer to read but still recognizable to a human. Another presents users with a sound and challenges them to write down what they’ve heard. But my favorite CAPTCHA implementation is the old and simple text-based challenge. It works by asking users to answer a randomly selected question, like “What is the color of the sky at night?”, or a simple math question like “What is twenty divided by five?”. Personally, I prefer this kind of question-and-answer interaction to a system that asks me to copy down something it shows. It feels more “human”, and in my view it offers roughly the same level of security as the other methods.

How does it work?

To implement this text-based CAPTCHA, you’ll need a collection of questions stored in some sort of data store, whether it’s an RDBMS, a file, or a specially built web service. Generally the questions should be easy and simple enough that everybody knows the answer, unless you expect only a limited group of users to submit responses. In that case you can present questions that only the kind of users you’re expecting would know how to answer, which adds some security.

To validate users’ answers, you also need to enable sessions for your pages. The session tracks which question has been presented to the user, so that the submitted answer can be matched against the correct answer stored in the database. Depending on the question, there might be more than one correct answer.
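The session bookkeeping described above can be sketched roughly as follows. This is only an illustration, not the code from the demo files: the helper names and the `captcha_id` key are hypothetical, PHP 7.1 or later is assumed for the type declarations, and a plain array stands in for `$_SESSION` so the snippet runs on its own.

```php
<?php
// Hypothetical helpers; $session stands in for $_SESSION after session_start().

// Remember which question was shown so the answer can be checked on submit.
function captcha_remember_question(array &$session, int $question_id): void
{
    $session['captcha_id'] = $question_id;
}

// Return the pending question id and clear it, so each question is used once.
// Returns null when no question is pending.
function captcha_take_question(array &$session): ?int
{
    $id = $session['captcha_id'] ?? null;
    unset($session['captcha_id']);
    return $id === null ? null : (int) $id;
}

// Usage with a plain array; in a real page you would pass $_SESSION instead.
$session = [];
captcha_remember_question($session, 1);
$first  = captcha_take_question($session); // id of the question that was shown
$second = captcha_take_question($session); // nothing pending any more
```

Clearing the stored id once it has been read means a bot cannot keep resubmitting answers against the same question without requesting the form again.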

Consider the following question: “What is six multiplied by five?”. Users might type the answer as “30” or as “thirty”.

And since both answers are correct, you will most likely use a regular expression in the matching process.

PHP Implementation

Now I’ll show you how easy it is to implement this CAPTCHA system in PHP. We’ll use MySQL to store the questions. The SQL script to create the table looks like this:

-- Table structure for table `captchas`
CREATE TABLE IF NOT EXISTS `captchas` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `question` varchar(255) NOT NULL,
  `answer` varchar(255) NOT NULL,
  PRIMARY KEY (`id`)
) AUTO_INCREMENT=1;

-- Dumping data for table `captchas`
INSERT INTO `captchas` (`id`, `question`, `answer`) VALUES
(1, 'What is six multiplied by five?', '30:thirty');

Here we created a table named captchas with three columns, id, question, and answer, which are quite self-explanatory. The SQL script also inserts one question-answer pair as an example.

One thing to notice is how we use a colon (:) to separate all possible correct answers in the answer column. You can store an arbitrary number of correct answers this way, but it is best to keep the number to a minimum. The answers are later split on the colon during the matching process.

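// Match $answer against the accepted answers.
// $db_answer is the colon-separated answer string obtained from the database.
$answer_ok = false;
// explode() splits on a literal colon; no regular expression is needed here.
$arr_answers = explode(':', $db_answer);
foreach ($arr_answers as $check_against) {
  // preg_quote() escapes any regex metacharacters in the stored answer.
  if (preg_match('/\b' . preg_quote($check_against, '/') . '\b/i', $answer)) {
    $answer_ok = true;
    break;
  }
}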

This snippet matches the answer submitted by the user ($answer) against the collection of correct answers obtained from the database ($db_answer). It first splits the possible answers into an array ($arr_answers), then iterates over it and matches the submitted answer against each entry using whole-word matching. Once an entry matches, the iteration stops and the boolean flag $answer_ok is set to true, indicating that the user has passed CAPTCHA validation.

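With this whole-word matching, our example accepts answers like “30”, “the number 30”, or “it’s thirty”, and rejects answers like “300” or “thirtyfive”, where the expected answer is only part of a longer word. Note that punctuation next to the answer, as in “#30” or “number-thirty”, does not prevent a match, because \b treats punctuation as a word boundary. This small amount of flexibility gives the system a human feel that I like and that cannot be found in image- or sound-based CAPTCHA.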
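To try the matching rule on a few inputs, it can be wrapped in a small function. The sketch below is an illustration under stated assumptions rather than the code from the demo files: the function name captcha_answer_matches is hypothetical, and PHP 7 or later is assumed for the type declarations.

```php
<?php
// Hypothetical helper: true when $answer contains any accepted answer
// from the colon-separated $db_answer as a whole word (case-insensitive).
function captcha_answer_matches(string $db_answer, string $answer): bool
{
    foreach (explode(':', $db_answer) as $accepted) {
        $accepted = trim($accepted);
        if ($accepted === '') {
            continue; // an empty entry would otherwise match almost anything
        }
        $pattern = '/\b' . preg_quote($accepted, '/') . '\b/i';
        if (preg_match($pattern, $answer) === 1) {
            return true;
        }
    }
    return false;
}

// A few sample inputs checked against the example row '30:thirty'.
$results = [];
foreach (['the number 30', "it's Thirty", '300', 'thirtyfive'] as $input) {
    $results[$input] = captcha_answer_matches('30:thirty', $input);
}
```

Here the first two inputs match and the last two do not, because in “300” and “thirtyfive” the accepted answer is followed by another word character.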

The question presented to users is randomly selected from the database using the following snippet:

// Query string to pick a single question at random.
// $q_count is assumed to hold the number of rows in captchas, counted beforehand.
$query = 'SELECT * FROM captchas LIMIT ' . rand(0, $q_count - 1) . ', 1';
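For context, here is one way the count and the random pick could fit together. It is a sketch only, not the code from the demo files: the function name captcha_pick_question is hypothetical, it assumes a connected PDO instance configured to throw exceptions on errors, and it uses random_int(), which needs PHP 7 or later.

```php
<?php
// Hypothetical helper: returns one random row (id and question) from the
// captchas table, or null when the table is empty.
// Assumes $pdo uses PDO::ERRMODE_EXCEPTION, so query() never returns false.
function captcha_pick_question(PDO $pdo): ?array
{
    // Count the rows first so a valid offset can be chosen.
    $q_count = (int) $pdo->query('SELECT COUNT(*) FROM captchas')->fetchColumn();
    if ($q_count === 0) {
        return null;
    }

    // The offset is an integer generated here, so formatting it with %d is safe.
    $offset = random_int(0, $q_count - 1);
    // ORDER BY id keeps the row order stable between the count and the pick.
    $sql = sprintf('SELECT id, question FROM captchas ORDER BY id LIMIT %d, 1', $offset);

    $row = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}
```

Selecting only id and question keeps the stored answer out of the data that is passed to the page template; the answer can be looked up by id when the form is submitted.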

To explore the rest of the code, I’ve made downloadable demo files of this simple text-based CAPTCHA implementation, free for you to use and improve.

Further Considerations

Update history

Feb 07, 2008
Added download files for the ASP implementation of this text-based CAPTCHA.

Posted on 3rd Feb 08, in Programming
