The Spam Battle Continues...
Tuesday, September 8th 2009 2:52pm
I took my first whack at spam. Blocking IPs was a good start, but I'm still spending too much time deleting these comments. Spammers seem to have no shortage of available IP addresses, nor do they run out of cute things to write:
"We are Dyslexia of Borg. Fusistance is retile. Your ass will be laminated."
Now I'm going to target the URLs
The whole point of comment spam is to get hotlinks all over the web. It's a vain attempt to increase the search rank of annoying and/or malicious websites. So let's kick 'em in the family jewels.
First, I renamed my `ip_blacklist` MySQL table to `blacklist_ip` and created a similar `blacklist_url` table.
CREATE TABLE `blacklist_url` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `url` VARCHAR(255) NOT NULL,
    `threshold` TINYINT NOT NULL DEFAULT '1',
    PRIMARY KEY (`id`)
) ENGINE = MYISAM;
I'm still giving URLs a chance with the `threshold` column, so a single report doesn't have to block a URL outright.
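To make the threshold idea concrete, here is a minimal sketch of how the counting could work. It is not the code running on this blog: a plain PHP array stands in for the rows of `blacklist_url`, the names `flagUrl`, `isUrlBlocked` and `BLOCK_LIMIT` are hypothetical, and the cutoff of 3 is only assumed from the number mentioned further down in this post.

```php
<?php
// Assumed cutoff; the post only mentions the threshold building up to 3.
const BLOCK_LIMIT = 3;

// Record one more spam report against a URL and return its new threshold.
// $blacklist is a hypothetical stand-in for the table: url => threshold.
function flagUrl(array &$blacklist, string $url): int {
    if (!isset($blacklist[$url])) {
        $blacklist[$url] = 1;   // mirrors the column's DEFAULT '1'
    } else {
        $blacklist[$url]++;
    }
    return $blacklist[$url];
}

// A URL counts as blocked once its threshold reaches the limit.
function isUrlBlocked(array $blacklist, string $url): bool {
    return isset($blacklist[$url]) && $blacklist[$url] >= BLOCK_LIMIT;
}

$blacklist = [];
flagUrl($blacklist, 'http://spam.example/page');
flagUrl($blacklist, 'http://spam.example/page');
var_dump(isUrlBlocked($blacklist, 'http://spam.example/page')); // bool(false)
flagUrl($blacklist, 'http://spam.example/page');
var_dump(isUrlBlocked($blacklist, 'http://spam.example/page')); // bool(true)
```

With a real table, the same two steps would presumably be an `UPDATE` that increments `threshold` and a `SELECT` that compares it against the limit.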
I put together the following function to create an array of all URLs found in a block of text:
function ExtractURLarray($text) {
    $a = array();
    // Match anything that starts with http://, https://, ftp:// or www.
    preg_match_all('/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i', $text, $a);
    // $a[0] holds the full matches; drop duplicates and reindex from 0
    return array_values(array_unique($a[0]));
}
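Here is a quick way to see what the pattern picks up. The snippet below is a sketch: it repeats the extraction step under a hypothetical name (`extractUrls`) so it runs on its own, and the sample text is made up. Note that trailing punctuation such as a comma or period is left out of the match, because the last character class in the pattern does not allow it.

```php
<?php
// Standalone copy of the extraction step under a hypothetical name,
// so this snippet runs without the rest of the blog engine.
function extractUrls(string $text): array {
    $pattern = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    preg_match_all($pattern, $text, $matches);
    // Full matches live in $matches[0]; drop repeats and reindex
    return array_values(array_unique($matches[0]));
}

$sample = 'Visit http://spam.example/buy?id=1, or www.spam.example/buy. '
        . 'Again: http://spam.example/buy?id=1';
print_r(extractUrls($sample));
// Array
// (
//     [0] => http://spam.example/buy?id=1
//     [1] => www.spam.example/buy
// )
```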
The comment's attached web address is an optional form field, so it isn't necessarily in the comment's post text:
$urlList = ExtractURLarray($comment->text);
// The website field is optional; add it only when it was filled in
if ($comment->website != '') {
    $urlList[] = $comment->website;
}
Get Them at the Domain Level
Absolute URLs from spam typically point to a forum post or user account on a victimized website, or a single page on a malicious one. Some spam posts have many URLs, often pointing to multiple pages on the same domain. If the domain appears malicious, then it's better to block it entirely rather than letting the threshold build up to 3 on an absolute URL.
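The following code trims a URL down to its domain, keeping the protocol prefix (http/ftp) when there is one:
function RipDomain($url) {
    // Optional protocol prefix, then everything up to the first path slash
    if (preg_match('#^(?:[a-z][a-z0-9+.-]*://)?[^/]+#i', $url, $m))
        return $m[0];
    else
        return $url;
}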
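Since one spam comment often carries several URLs on the same domain, it can help to collapse the list before deciding what to block. The sketch below is an alternative take, not the function above: it uses PHP's built-in `parse_url()` and returns bare host names without the protocol prefix. The helper name `uniqueHosts` and the sample URLs are hypothetical, and assuming `http://` for bare `www.` matches is my own shortcut.

```php
<?php
// Hypothetical helper: reduce a list of URLs to the unique hosts they point at.
function uniqueHosts(array $urls): array {
    $hosts = [];
    foreach ($urls as $url) {
        // parse_url() only reports a host when a scheme is present,
        // so assume http:// for bare "www." matches.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
            $url = 'http://' . $url;
        }
        $host = parse_url($url, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $hosts[strtolower($host)] = true;   // array keys de-duplicate for us
        }
    }
    return array_keys($hosts);
}

$urls = [
    'http://forum.example/user/profile/95.page',
    'http://forum.example/user/profile/96.page',
    'www.other.example/index.php/member/10832/',
];
print_r(uniqueHosts($urls));
// Array
// (
//     [0] => forum.example
//     [1] => www.other.example
// )
```

Two profile pages on the same forum collapse into one host, which is the unit I'd rather make a blocking decision on.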
Make Yourself Some Options
In the previous spam battle post I added a SPAM button at the bottom of the comments. What I've done is add an extra step before the IP is added to the block list. Now I can decide how to handle each URL in the offending comment.

This new level of spam triangulation should continue to turn the tide.
posted by Langel