I love Pokemon
Gender:
Posts: 14
84 credits Members referred : 0
« Reply #1 on: Mar 28, 2006, 04:16:20 PM »
It would be nice to be able to ban specific user agents, either to block bad robots or by keyword. For example, being able to block any domain with *casino* in it would be extremely useful.
I do not think this is possible using the current script but I would think it is possible to add a couple of extra functions to setup.php along with a small menu that would create links to connect to a different update file.
For example, the script currently uses HTTP_REFERER; to block bad bots we would need to use HTTP_USER_AGENT, and the syntax for the directives would be slightly different.
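To illustrate how the two kinds of directive differ, here is a minimal sketch using mod_rewrite in htaccess. It assumes mod_rewrite is enabled and that htaccess overrides are allowed. The bot name ExampleBadBot is made up for the example, and this is not necessarily the exact output the script writes.

```apache
# Sketch only: assumes mod_rewrite is available; "ExampleBadBot" is a hypothetical name.
RewriteEngine On

# Referer check: matches any referring URL containing "casino", ignoring case.
RewriteCond %{HTTP_REFERER} casino [NC,OR]

# User agent check: matches a user agent string that starts with the bot name.
RewriteCond %{HTTP_USER_AGENT} ^ExampleBadBot [NC]

# If either condition matched, answer with 403 Forbidden.
RewriteRule ^ - [F]
```

The only real difference between the two conditions is the server variable being tested, so in principle one list format could feed both. Referer patterns are usually unanchored keywords, while user agent patterns are often anchored to the start of the string.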
I am already working on this functionality for aStatSpam, which I have converted to a phpNuke content management system module, and I have so far verified over 2000 bad referers. My current list can be found at www.code-authors.com/update.php
I am a metal monkey!
Administrator Community Supporter?
Jedai Sword Master
Gender:
Posts: 7942
40601 credits Members referred : 3
« Reply #2 on: Mar 28, 2006, 04:25:02 PM »
As for the first thing you mention, that is already possible.
Just put the word casino in the custom domains array and you are done.
Quote
I do not think this is possible using the current script but I would think it is possible to add a couple of extra functions to setup.php along with a small menu that would create links to connect to a different update file.
That's good advice. I will look into it when I start coding the script again.
About the user agent, that's an easy one; I will add it in the future. Until then you can use this
I love Pokemon
« Reply #7 on: Mar 29, 2006, 08:51:37 AM »
Yes, that is something I had thought about, but I found that page load time was quicker with the directives in htaccess than without them, because of the multiple page loads per user agent. Before blocking them my average page generation time was over 3 seconds; with the directives in place it is just over 1 second. I currently have over 2000 blocks on referers and several hundred blocks by IP (including some whole country ranges), as well as the user agent blocks. By far the biggest improvement came from banning 'Slurp', which can consume massive amounts of resources even if you use the robots.txt Crawl-delay directive (Crawl-delay is specific to Yahoo's Slurp / Inktomi search bots).
I know I am getting a little off topic now but this information *may* be useful to someone.
I am a metal monkey!
« Reply #8 on: Mar 29, 2006, 02:22:48 PM »
That's very interesting.
As for Slurp, I don't think it is a good idea to ban it, as Yahoo can bring some traffic to your site.
You can add this to your robots.txt to make Slurp treat your server better: