firteendesign
"Computers are my best friends. When I help people find each other without knowing they used a computer, my best friends and I revel in our little secret!"    -Langel

The Spam Battle Continues...

Tuesday, September 8th 2009 2:52pm

I took my first whack at spam in the previous post. Blocking IPs was a good start, but I'm still spending too much time deleting these comments. Spammers seem to have no shortage of IP addresses, nor do they run out of cute things to write:

"We are Dyslexia of Borg. Fusistance is retile. Your ass will be laminated."

Now I'm going to target the URLs

The whole point of comment spam is to plant links all over the web. It's a vain attempt to increase the search rank of annoying and/or malicious websites. So let's kick 'em in the family jewels.

First, I renamed my `ip_blacklist` MySQL table to `blacklist_ip` and created a similar `blacklist_url` table.

CREATE TABLE `blacklist_url` (
    `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `url` VARCHAR(255) NOT NULL,
    `threshold` TINYINT NOT NULL DEFAULT '1',
    PRIMARY KEY (`id`)
) ENGINE = MYISAM;

I'm still giving the URLs a chance with the `threshold` column.
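To make the threshold idea concrete, here is a minimal in-memory sketch. It assumes every report of a URL bumps its count (a new entry starts at 1, like the column's default) and that a URL counts as blocked once it reaches 3, the number that comes up in the domain section. The names `ReportURL`, `IsBlocked` and `BLOCK_AT` are hypothetical, and a plain array stands in for the `blacklist_url` table. It uses PHP 7+ syntax.

```php
<?php
// Hypothetical stand-in for the blacklist_url table: url => threshold count.
// Assumption: a URL is treated as blocked once its count reaches BLOCK_AT.
const BLOCK_AT = 3;

// Record one spam report for $url and return its new count.
// The first report stores a count of 1 (mirroring the column's DEFAULT '1').
function ReportURL(array &$blacklist, string $url): int {
    $blacklist[$url] = ($blacklist[$url] ?? 0) + 1;
    return $blacklist[$url];
}

// A URL is only blocked after it has been reported BLOCK_AT times.
function IsBlocked(array $blacklist, string $url): bool {
    return ($blacklist[$url] ?? 0) >= BLOCK_AT;
}

$blacklist = array();
ReportURL($blacklist, 'http://spam.example/page');
ReportURL($blacklist, 'http://spam.example/page');
var_dump(IsBlocked($blacklist, 'http://spam.example/page')); // bool(false)
ReportURL($blacklist, 'http://spam.example/page');
var_dump(IsBlocked($blacklist, 'http://spam.example/page')); // bool(true)
```

Two strikes are forgiven, and the third one sticks.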

I put together the following function to build an array of all the URLs found in a block of text:

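function ExtractURLarray($text) {
    $matches = array();
    // Match http://, https:// and ftp:// links as well as bare "www." links;
    // trailing punctuation such as "." or "," is left off the match.
    preg_match_all('/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i', $text, $matches);
    // $matches[0] holds the full matches; drop duplicates and reindex from 0.
    return array_values(array_unique($matches[0]));
}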

The commenter's website is a separate, optional form field, so it won't necessarily appear in the comment text. I add it to the list as well:

$urlList = ExtractURLarray($comment->text);
if ($comment->website != '') {
    $urlList[] = $comment->website;
}
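One wrinkle: `ExtractURLarray` also returns bare `www.` links, and the website field may repeat a link that's already in the text. Here is a sketch of a hypothetical `CollectCommentURLs` helper that gathers both sources, gives scheme-less links an `http://` prefix (an assumption: they're treated as plain http) and drops duplicates. It reuses the same regex.

```php
<?php
// Same pattern as ExtractURLarray: http(s)/ftp links plus bare "www." links.
const URL_PATTERN = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

// Hypothetical helper: every URL tied to a comment (body text plus the
// optional website field), normalized and de-duplicated.
function CollectCommentURLs(string $text, string $website): array {
    preg_match_all(URL_PATTERN, $text, $m);
    $urls = $m[0];
    if ($website !== '') {
        $urls[] = $website;
    }
    foreach ($urls as $i => $url) {
        // Assumption: a link with no "scheme://" prefix is plain http.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
            $urls[$i] = 'http://' . $url;
        }
    }
    // Keep the first occurrence of each URL and reindex from 0.
    return array_values(array_unique($urls));
}

print_r(CollectCommentURLs(
    'Visit http://spam.biz/a and www.spam.biz/b, also http://spam.biz/a again',
    'http://spam.biz/a'
));
```

With a scheme on every entry, each URL is in the form the domain-stripping step below expects.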

Get Them at the Domain Level

Full URLs in spam typically point to a forum post or user account on a victimized website, or to a single page on a malicious one. Some spam posts contain many URLs, often pointing to multiple pages on the same domain. If the domain itself looks malicious, it's better to block it entirely than to wait for the threshold on each individual URL to build up to 3.

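The following function cuts a URL down to its domain. It works best with the protocol prefix (http://, https://, ftp://) intact, because it starts looking for the first slash at character 8, just past `https://`: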

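function RipDomain($url) {
    // Look for the first "/" from position 8 on, so the slashes in
    // "http://", "https://" or "ftp://" are skipped.
    // Guard the offset: PHP 8 throws a ValueError if it is past the end of the string.
    $slash = (strlen($url) > 8) ? strpos($url, '/', 8) : false;
    if ($slash !== false)
        return substr($url, 0, $slash);
    else
        return $url;
}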
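To spot the "many pages on one domain" pattern, the URLs from a comment can be grouped and counted per domain. This sketch swaps in PHP's built-in `parse_url()` for the `strpos()` approach and lowercases the result, so `SPAM.example` and `spam.example` land together. `RipDomainParsed` and `CountByDomain` are hypothetical names. Like `RipDomain`, it needs the protocol prefix; without one, `parse_url()` finds no host and the URL comes back unchanged.

```php
<?php
// parse_url()-based take on RipDomain: returns "scheme://host" in lowercase,
// or the input unchanged when no scheme/host can be found.
function RipDomainParsed(string $url): string {
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url;
    }
    return strtolower($parts['scheme'] . '://' . $parts['host']);
}

// Hypothetical helper: how many of a comment's URLs point at each domain,
// busiest domain first, so repeat offenders stand out.
function CountByDomain(array $urls): array {
    $counts = array();
    foreach ($urls as $url) {
        $domain = RipDomainParsed($url);
        $counts[$domain] = ($counts[$domain] ?? 0) + 1;
    }
    arsort($counts);
    return $counts;
}

print_r(CountByDomain(array(
    'http://spam.example/forum/post1',
    'http://SPAM.example/forum/post2',
    'https://innocent.example/',
)));
```

A domain with several hits in a single comment is a good candidate for a domain-level block.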

Make Yourself Some Options

In the previous spam battle post I added a SPAM button at the bottom of the comments. Now there's an extra step before the IP is added to the block list: I can decide how to handle each URL in the offending comment.
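The decision step might look something like this sketch. It assumes three hypothetical choices per URL: `'url'` blacklists that exact address, `'domain'` blacklists the whole site, and `'skip'` leaves it alone. The function name `PlanBlacklist` and the choice labels are illustrative, not the post's actual code; it only returns the entries that would go into `blacklist_url`.

```php
<?php
// Hypothetical triage step: $decisions maps each URL from the flagged comment
// to 'url', 'domain' or 'skip'. Returns the de-duplicated blacklist entries.
function PlanBlacklist(array $decisions): array {
    $entries = array();
    foreach ($decisions as $url => $action) {
        if ($action === 'url') {
            $entries[] = $url;
        } elseif ($action === 'domain') {
            $scheme = parse_url($url, PHP_URL_SCHEME);
            $host = parse_url($url, PHP_URL_HOST);
            // Fall back to the full URL if no scheme/host can be parsed.
            $entries[] = ($scheme && $host) ? strtolower("$scheme://$host") : $url;
        }
        // 'skip' (or anything else) keeps the URL off the list.
    }
    return array_values(array_unique($entries));
}

print_r(PlanBlacklist(array(
    'http://spam.example/forum/1'   => 'domain',
    'http://spam.example/forum/2'   => 'domain',
    'http://victim.example/user/42' => 'url',
    'http://news.example/story'     => 'skip',
)));
```

The exact-URL option matters for spam planted on victimized sites: blocking the whole domain there would also catch that site's legitimate users.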

This new level of spam triangulation should continue to turn the tide.

posted by Langel
