WebDigity Gangsta
« Reply #11 on: Mar 25, 2006, 07:33:56 PM »
I have a spider trap too; I simply needed to speed up the work a little bit.
This is what my .htaccess looks like:
Code:
RewriteEngine on
RewriteRule ^robots\.txt$ robots.php [L,NC]
Then I have a script called robots.php:
Code:
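<?php
// This script stands in for robots.txt, so send its output as plain text.
header('Content-Type: text/plain');

$server   = "localhost"; // MySQL hostname
$username = "";          // MySQL username
$password = "";          // MySQL password
$dbname   = "";          // MySQL db name

$db = mysql_connect($server, $username, $password) or die(mysql_error());
mysql_select_db($dbname) or die(mysql_error());

// The user agent is supplied by the client, so escape it before building the query.
$ua = mysql_real_escape_string($_SERVER['HTTP_USER_AGENT']);

// Log the visitor's IP, user agent and visit time.
mysql_query("INSERT INTO spiderips VALUES(INET_ATON('". $_SERVER['REMOTE_ADDR'] ."'),'". $ua ."',". time() .")");

mysql_close();
?>
User-agent: *
Disallow: /norobots/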
Every time a spider visits my robots.txt I catch the spider. Then I have this code on some other pages (not all of my pages):
Code:
<?php
$server   = "localhost"; // MySQL hostname
$username = "";          // MySQL username
$password = "";          // MySQL password
$dbname   = "";          // MySQL db name

$db = mysql_connect($server, $username, $password) or die(mysql_error());
mysql_select_db($dbname) or die(mysql_error());

// The user agent is supplied by the client, so escape it before building the queries.
$ua = mysql_real_escape_string($_SERVER['HTTP_USER_AGENT']);

// Is this visitor already known as a spider, either by IP or by user agent?
$spider_info = mysql_query("SELECT * FROM spiderips WHERE ip = INET_ATON('". $_SERVER['REMOTE_ADDR'] ."') OR useragent = '". $ua ."'");
if (mysql_num_rows($spider_info)) {
    // Try to store it as a new IP; if the insert fails (for example because
    // the IP is already in the table), refresh the existing row instead.
    if (!mysql_query("INSERT INTO spiderips VALUES(INET_ATON('". $_SERVER['REMOTE_ADDR'] ."'),'". $ua ."',". time() .")")) {
        mysql_query("UPDATE spiderips SET useragent = '". $ua ."', lastvisit = '". time() ."' WHERE ip = INET_ATON('". $_SERVER['REMOTE_ADDR'] ."')");
    }
}
mysql_close();
?>
This way I collect the spiders' user agents whenever one of them visits my robots.txt, and then I pick up more IP addresses by matching later visitors against the user agents already in the database. I also record the last time a spider visited one of my sites.
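The scripts above never show the spiderips table itself. Here is a minimal sketch of a schema that would fit the queries. The column names ip, useragent and lastvisit and their order come from the queries; the column types and the primary key on ip are assumptions, not something the post confirms. A unique key on ip is what would make the INSERT fail for an IP that is already stored, so that the UPDATE branch runs.

Code:
```sql
-- Hypothetical schema: the types and the primary key are assumptions.
CREATE TABLE spiderips (
    ip        INT UNSIGNED NOT NULL,            -- value returned by INET_ATON()
    useragent VARCHAR(255) NOT NULL DEFAULT '', -- raw User-Agent header
    lastvisit INT UNSIGNED NOT NULL DEFAULT 0,  -- Unix timestamp from PHP time()
    PRIMARY KEY (ip)
);

-- With such a key, MySQL 4.1 and later can also do the insert-or-update
-- in one statement (sample values shown, not taken from the post):
INSERT INTO spiderips
VALUES (INET_ATON('192.0.2.10'), 'ExampleBot/1.0', UNIX_TIMESTAMP())
ON DUPLICATE KEY UPDATE
    useragent = VALUES(useragent),
    lastvisit = VALUES(lastvisit);
```

Without a unique key on ip, every INSERT would succeed, and the table would collect one row per visit instead of one row per IP.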
I am a metal monkey!
Administrator Community Supporter?
Jedai Sword Master
« Reply #12 on: Mar 25, 2006, 09:36:50 PM »
That's very clever. Congrats, and thanks for sharing.
BTW, you can make this work better if you use the LOW_PRIORITY modifier in your inserts and updates, so that the writes wait until no other clients are reading from the table.
Instead of this:
Code:
$spider_info = mysql_query("SELECT * FROM spiderips WHERE ip = INET_ATON('". $_SERVER['REMOTE_ADDR'] ."') OR useragent = '". $_SERVER['HTTP_USER_AGENT'] ."'");
if (mysql_num_rows($spider_info)) {
    if (!mysql_query("INSERT INTO spiderips VALUES(INET_ATON('". $_SERVER['REMOTE_ADDR'] ."'),'". $_SERVER['HTTP_USER_AGENT'] ."',". time() .")")) {
        mysql_query("UPDATE spiderips SET useragent = '". $_SERVER['HTTP_USER_AGENT'] ."', lastvisit = '". time() ."' WHERE ip = INET_ATON('". $_SERVER['REMOTE_ADDR'] ."')");
    }
}