Let's Battle Comment Spam with a PHP/MySQL DNSBL and no CAPTCHA
Wednesday, August 26th 2009 11:12am
While I'm logged in to my blog, I have an administrative delete button under every comment. But throwing a message in the trash is only effective against other humans. It is not the best strategy when attempting to thwart malicious, evil robots from the nethernets.
Recently, I wrote a post that caught some good traffic. But the spambots came in with the tide! Six to ten spam comments per hour were adding up real fast. "The delete icons, they does nothing!" I retorted under the onslaught of chronicle ruin. Surely I would not fall prey to displaying captcha code injection?!?!?!
Warning - This post is not a thorough, step-by-step tutorial.
What's wrong with a little CAPTCHA?
There are cute captchas, helping computers read books, and there are ajax fancy captchas with a medium security risk. They are hard to read and harder to please.

Headaches. Nausea. Vertigo. Dementia. WTF
The side effects of an anti-spam prescription known as —
Completely Automated Public Turing test to tell Computers and Humans Apart
Let's Build a DNSBL
We want to block all messages coming from certain IP addresses. Using a DNS Block List is a popular method and there is a mighty slew of them around. I tested some of the IPs in my database with a few of the public DNSBLs and only got about 90% accuracy. Plus, I couldn't find a simple API. So we'll manage this Block List business ourselves.
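For reference, public DNSBLs are usually queried over DNS itself: reverse the IPv4 octets, append the list's zone, and see whether an A record comes back. Here is a minimal sketch of that lookup. The function names and the zone `dnsbl.example.org` are placeholders of mine, not a real list, and this is not what the rest of the post uses.

```php
<?php
// Build the hostname a DNSBL expects for an IPv4 address:
// the octets in reverse order, followed by the list's zone.
function dnsblQueryName(string $ip, string $zone): ?string {
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null; // this sketch only handles IPv4
    }
    return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

// True when the zone answers with an A record for the address,
// which is how a DNSBL says "listed". Needs working DNS to run.
function isListed(string $ip, string $zone): bool {
    $name = dnsblQueryName($ip, $zone);
    return $name !== null && checkdnsrr($name, 'A');
}
```

Calling `dnsblQueryName('192.0.2.1', 'dnsbl.example.org')` builds the name `1.2.0.192.dnsbl.example.org`. Each list documents its own zone and what its answers mean.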
First off, make sure you are saving every commentator's IP with their comments. To detect the 'Real IP Address of Client' I lifted the following script from here which was obviously ripped from somewhere else and I even cleaned it up a bit.
function GetRealIP() {
    if (!empty($_SERVER['HTTP_CLIENT_IP']))
        $ip = $_SERVER['HTTP_CLIENT_IP'];
    elseif (!empty($_SERVER['HTTP_X_FORWARDED_FOR'])) {
        // the first entry in the list is the client
        $ip = explode(',',$_SERVER['HTTP_X_FORWARDED_FOR']);
        $ip = trim($ip[0]);
    }
    else
        $ip = $_SERVER['REMOTE_ADDR'];
    return $ip;
}
$_SERVER['REMOTE_ADDR'] is the traditional way of getting an IP from PHP. The other two come from request headers (Client-IP and X-Forwarded-For) that proxy servers add, so they can reveal an IP behind a proxy. PHP fills them in from whatever the request carries, which means they should work on IIS too, but also that a client can forge them. HTTP_X_FORWARDED_FOR holds a comma-separated list of the client IP followed by the proxy servers. When I started to run this script, I did notice the number of IP variants decline. Now that's a start for optimizing spam maintenance!
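Since those header values end up inside SQL strings later in this post, it seems worth checking that they really look like IP addresses first. Below is a sketch of a stricter variant. The name `clientIp` and the idea of passing in a `$_SERVER`-style array are my own choices for illustration, not part of the original script.

```php
<?php
// Pick a client IP from a $_SERVER-style array, falling back to
// REMOTE_ADDR whenever a header value is not a well-formed IP.
function clientIp(array $server): string {
    $candidates = [];
    if (!empty($server['HTTP_CLIENT_IP'])) {
        $candidates[] = $server['HTTP_CLIENT_IP'];
    }
    if (!empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The left-most entry is the original client.
        $parts = explode(',', $server['HTTP_X_FORWARDED_FOR']);
        $candidates[] = $parts[0];
    }
    foreach ($candidates as $candidate) {
        $candidate = trim($candidate);
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            return $candidate;
        }
    }
    return $server['REMOTE_ADDR'] ?? '';
}
```

In a request you would call it as `clientIp($_SERVER)`. A forged header can still name a valid IP that is not the sender's, so this only guards against garbage values, not against lying.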
Our DNS Block List needs a table...
CREATE TABLE `ip_blacklist` (
    `ip` VARCHAR( 15 ) NOT NULL ,
    `threshold` TINYINT NOT NULL DEFAULT '1'
);
...or something like that, whatever works with your framework. ;)
My own, personal framework is RESTful. To mark comment #1208 as spam I link to firteendesign.com/blog/MarkCommentSpam/1208/. I wrote the following code into the blog controller —
if ($ACT=='MarkCommentSpam') {
    // only the logged-in admin may mark spam
    if ($Guy->IsLoggedIn()) {
        $comment = STACK::Fetch('blog_comment',$PARAM1);
        if ($comment) {
            // find the block list record for this comment's IP
            $ip = STACK::Find('ip_blacklist',"WHERE `ip` = '$comment->ip'");
            if ($ip)
                $ip->Update('threshold',$ip->threshold+1);
            else {
                $ip = new ip_blacklist();
                $ip->ip = $comment->ip;
                $ip->threshold = 1;
                $ip->Save();
            }
            // third strike wipes everything from that IP
            if ($ip->threshold>=3)
                uHAT::MultiDelete('blog_comment',"WHERE `ip` = '$comment->ip'");
            else
                $comment->Delete();
        }
    }
    // back to the page I came from
    redir($_SERVER['HTTP_REFERER']);
}
First it checks if I'm logged in and if the comment from the URL exists. Then it looks for an IP record in the block list, creating one if none exists or raising the threshold of an existing one. A different set of actions is taken depending on the IP's current `threshold` value:
- threshold 1: IP is labeled SPAM, comment is deleted, and the IP is added to the blacklist
- threshold 2: comment deleted, blacklist threshold raised
- threshold 3 or more: all comments from that IP are deleted, blacklist threshold raised
Wait, why am I bothering to give these netbot scum a second and a third chance? For human error. Marking a comment as spam can happen by accident. I don't want to accidentally perma-ban a nice commentator!
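The three-strike rule can also be sketched on its own, away from my framework. This version keeps the block list in a plain array (IP => threshold) just to show the decision logic. The function names and the action strings are made up for illustration.

```php
<?php
// Given the threshold an IP has after being marked once more,
// say what should happen to its comments.
function spamAction(int $threshold, int $banAt = 3): string {
    return $threshold >= $banAt ? 'delete_all_from_ip' : 'delete_comment';
}

// Apply one "mark as spam" click to an in-memory block list
// (ip => threshold) and return the resulting action.
function markSpam(array &$blacklist, string $ip): string {
    $blacklist[$ip] = ($blacklist[$ip] ?? 0) + 1;
    return spamAction($blacklist[$ip]);
}
```

The first two clicks on comments from the same IP only remove those comments. The third returns `delete_all_from_ip`, which matches the point where the controller above wipes everything from that address.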
Now, add a mark spam button next to the trash icon and GET TO WAR!!

700 comments were destroyed in about 15 minutes of furious clicking, 67 IPs banished. Five hours passed and only one more spam got through because...
Blocking Future SPAM
Wherever your comment form validation and processing takes place, you need to run a check on the client's IP against your block list. Doing so is the whole reason for maintaining a block list! My code sends the offender back to the article without posting their message, without telling them why, and they do not collect $200.
if (uHAT::Count('ip_blacklist',"WHERE `ip` = '$comment->ip' AND `threshold` >= 3"))
    redir($comment->URL);
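If you are not on my framework, the same check might look like this with PDO and a prepared statement, which keeps the IP value out of the SQL string. It is only a sketch: it assumes a PDO connection and the `ip_blacklist` table from earlier, and `isBlocked` is a name I made up.

```php
<?php
// True when the IP has reached the ban threshold in ip_blacklist.
function isBlocked(PDO $db, string $ip, int $banAt = 3): bool {
    $stmt = $db->prepare(
        'SELECT COUNT(*) FROM ip_blacklist WHERE ip = ? AND threshold >= ?'
    );
    $stmt->bindValue(1, $ip, PDO::PARAM_STR);
    $stmt->bindValue(2, $banAt, PDO::PARAM_INT);
    $stmt->execute();
    return (int) $stmt->fetchColumn() > 0;
}
```

In the comment handler you would call `isBlocked($db, $ip)` before saving anything and send the visitor back to the article when it returns true.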
Improvements for the Future
Two obvious things come to mind.
- Show a page explaining the situation when a banned IP posts a comment, including an email address for registering a dispute.
- Filter the spam instead of deleting it.
If a human disputes your spam label, you could still recover what they had to say.
Or not.
:D/
posted by Langel