firteendesign
"Computers are my best friends. When I help people find each other without knowing they used a computer, my best friends and I revel in our little secret!"    -Langel

 

 

Let's Battle Comment Spam with a PHP/MySQL DNSBL and no CAPTCHA

Wednesday, August 26th 2009 11:12am

While I'm logged in to my blog, I have an administrative delete button under every comment. But throwing a message in the trash is only effective against other humans. It is not the best strategy when attempting to thwart malicious, evil robots from the nethernets.

Recently, I wrote a post that caught some good traffic. But the spambots came in with the tide! Six to ten spam comments per hour were adding up real fast. "The delete icons, they does nothing!" I retorted under the onslaught of chronicle ruin. Surely I would not fall prey to displaying captcha code injection?!?!?!

Warning - This post is not a thorough, step-by-step tutorial.

What's wrong with a little CAPTCHA?

There are cute captchas, helping computers read books, and there are ajax fancy captchas with a medium security risk. They are hard to read and harder to please.

Headaches. Nausea. Vertigo. Dementia. WTF

The side effects of an anti-spam prescription known as —
Completely Automated Public Turing test to tell Computers and Humans Apart

Let's Build a DNSBL

We want to block all messages coming from certain IP addresses. Using a DNS Block List is a popular method and there is a mighty slew of them around. I tested some of the IPs in my database with a few of the public DNSBLs and only got about 90% accuracy. Plus, I couldn't find a simple API. So we'll manage this Block List business ourselves.

First off, make sure you are saving every commenter's IP with their comments. To detect the 'Real IP Address of Client' I lifted the following script from here, which was obviously ripped from somewhere else, and I even cleaned it up a bit.

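function GetRealIP() {
    // Caution: both proxy headers are supplied by the client and can be forged.
    if (!empty($_SERVER['HTTP_CLIENT_IP'])) {
        $ip = $_SERVER['HTTP_CLIENT_IP'];
    } elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        // Comma-separated list; the first entry is the original client.
        $ip = explode(',', $_SERVER['HTTP_X_FORWARDED_FOR']);
        $ip = trim($ip[0]);
    } else {
        $ip = $_SERVER['REMOTE_ADDR'];
    }
    return $ip;
}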

$_SERVER['REMOTE_ADDR'] is the traditional way of getting an IP from PHP. The other two come from request headers (Client-IP and X-Forwarded-For) that proxy servers add, so they depend on the proxy in front of the visitor rather than on Apache or IIS, and a client can forge them. HTTP_X_FORWARDED_FOR holds a comma-separated list of the client IP and the proxy servers. When I started to run this script, I did notice the number of IP variants decline. Now that's a start for optimizing spam maintenance!
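Since those proxy headers can contain anything a client cares to send, it seems wise to check that the value really looks like an IP before it goes anywhere near the database. Here is a minimal sketch using PHP's built-in filter_var(). The helper name ValidIPv4OrNull is my own invention for illustration, and it assumes IPv4 only, matching the VARCHAR(15) column used in this post.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns the trimmed address if it is a valid IPv4 string, otherwise null.
function ValidIPv4OrNull(string $candidate): ?string {
    $candidate = trim($candidate);
    // filter_var() returns the string on success and false on failure.
    $valid = filter_var($candidate, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4);
    return $valid === false ? null : $valid;
}
```

One way to use it: run the detected value through the helper and fall back to $_SERVER['REMOTE_ADDR'] whenever it returns null.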

Our DNS Block List needs a table...

CREATE TABLE `ip_blacklist` (
    `ip` VARCHAR(15) NOT NULL,              -- dotted-quad IPv4 only
    `threshold` TINYINT NOT NULL DEFAULT 1  -- times this IP was marked as spam
);

...or something like that, whatever works with your framework. ;)

My own, personal framework is RESTful. To mark comment #1208 as spam I link to firteendesign.com/blog/MarkCommentSpam/1208/. I wrote the following code into the blog controller —

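if ($ACT == 'MarkCommentSpam') {
    if ($Guy->IsLoggedIn()) {
        $comment = STACK::Fetch('blog_comment', $PARAM1);
        if ($comment) {
            // Look up the commenter's IP in the block list.
            // Note: the IP is interpolated into SQL, so validate it when saving.
            $ip = STACK::Find('ip_blacklist', "WHERE `ip` = '$comment->ip'");
            if ($ip) {
                // Already listed: one more strike.
                $ip->Update('threshold', $ip->threshold + 1);
            } else {
                // First offence: add the IP to the block list.
                $ip = new ip_blacklist();
                $ip->ip = $comment->ip;
                $ip->threshold = 1;
                $ip->Save();
            }
            if ($ip->threshold >= 3) {
                // Third strike: wipe every comment from this IP.
                uHAT::MultiDelete('blog_comment', "WHERE `ip` = '$comment->ip'");
            } else {
                $comment->Delete();
            }
        }
    }
    // Send me back to the page I clicked from.
    redir($_SERVER['HTTP_REFERER']);
}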

First it checks if I'm logged in and if the comment from the URL exists. Then it looks for an IP record in the block list and either creates one or increases the threshold of the existing one. A different set of actions is taken depending on the IP's current `threshold` value:

  1. IP is labeled SPAM, comment is deleted, and the IP is added to the blacklist
  2. comment deleted, blacklist threshold raised
  3. all comments from that IP are deleted, blacklist threshold raised
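The three steps above boil down to a tiny decision rule, which can be pulled out of the controller and tested on its own. This is only a sketch: the function name NextSpamAction, the action labels, and the $banAt parameter are made up for illustration and are not part of my framework.

```php
<?php
// Hypothetical pure function mirroring the three steps listed above.
// $current is the stored threshold, or null when the IP is not listed yet.
function NextSpamAction(?int $current, int $banAt = 3): array {
    // A new IP starts at 1; a known IP gets one more strike.
    $threshold = ($current === null) ? 1 : $current + 1;
    // At the ban limit, every comment from the IP goes; otherwise just this one.
    $action = ($threshold >= $banAt) ? 'delete_all_from_ip' : 'delete_comment';
    return ['threshold' => $threshold, 'action' => $action];
}
```

Keeping the rule separate from the database calls makes it easy to change the number of chances later without touching the rest of the controller.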

Wait, why am I bothering to give these netbot scum a second and a third chance? For human error. Marking a comment as spam can happen by accident. I don't want to accidentally perma-ban a nice commenter!

Now, add a mark spam button next to the trash icon and GET TO WAR!!

700 comments were destroyed in about 15 minutes of furious clicking, 67 IPs banished. Five hours passed and only one more spam got through because...

Blocking Future SPAM

Wherever your comment form validation and processing takes place, you need to run a check on the client's IP against your block list. Doing so is the whole reason for maintaining a block list! My code sends the offender back to the article without posting their message, without telling them why, and they do not collect $200.

if (uHAT::Count('ip_blacklist', "WHERE `ip` = '$comment->ip' AND `threshold` >= 3")) {
    redir($comment->URL);
}
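If you are not on my framework, the same check can be sketched with plain PDO and a prepared statement, which also keeps a forged IP string from being pasted straight into the SQL. The function name IsBlockedIP and the $banAt parameter are assumptions of mine, and $pdo is assumed to be a connection that can see the ip_blacklist table.

```php
<?php
// Hypothetical helper: $pdo is any PDO connection that can see ip_blacklist.
function IsBlockedIP(PDO $pdo, string $ip, int $banAt = 3): bool {
    // Placeholders keep the IP out of the SQL text itself.
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM ip_blacklist WHERE ip = ? AND threshold >= ?'
    );
    $stmt->execute([$ip, $banAt]);
    // Blocked when at least one row has reached the ban limit.
    return (int) $stmt->fetchColumn() > 0;
}
```

In the comment handler this would sit right before the save: if IsBlockedIP($pdo, $ip) is true, redirect back to the article and skip the insert.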

Improvements for the Future

Two obvious things come to mind.

  1. Show a page explaining the situation when a banned IP posts a comment, including an email address for registering a dispute.
  2. Filter the spam instead of deleting it.

If a human disputes your spam label, you could still recover what they had to say.

Or not.

:D/

posted by Langel
