9, January 2009

Bots that slow your server down, any thing you can do? - webmaster forum

 
Webdigity webmaster forums
Author Topic: Bots that slow your server down, any thing you can do?  (Read 1324 times)
Supreme Overlord
***
Gender: Male
Posts: 149
910 credits
Members referred : 0


www.centos.org


« on: Feb 26, 2006, 01:05:46 PM »

I know getting your sites indexed by search engines is a good thing. But is there anything you can do when a ton of bots are crawling your different websites and it starts slowing the server down?
I am a metal monkey!
Administrator
Community Supporter ?
Jedai Sword Master
*****
Gender: Male
Posts: 8362
43159 credits
Members referred : 3



« Reply #1 on: Feb 26, 2006, 01:09:22 PM »

I am not sure there is much you can do about this, except maybe using some kind of caching.
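
To make the caching idea a bit more concrete, here is a rough sketch in Python. It assumes your pages are built by some expensive function and simply reuses the result for a while, so repeated bot hits do not rebuild the page every time. The names PageCache and render_page and the 300 second lifetime are made up for illustration, not something from a real setup.

```python
import time


class PageCache:
    """Tiny in-memory cache that reuses rendered pages for a fixed time."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds   # hypothetical lifetime of an entry
        self.clock = clock               # injectable so the cache can be tested
        self._store = {}                 # path -> (expiry time, rendered body)

    def get(self, path, render):
        """Return the cached body for path, calling render(path) on a miss."""
        now = self.clock()
        hit = self._store.get(path)
        if hit is not None and hit[0] > now:
            return hit[1]
        body = render(path)
        self._store[path] = (now + self.ttl_seconds, body)
        return body
```

You would call cache.get(path, render_page) wherever the page is currently built directly. A real site would more likely use a reverse proxy or an existing cache layer, so treat this only as an outline of the idea.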

Trial and Error my two best teachers
Join us @ facebook or twitter

Last blog : Monetizing Old Posts
Global Moderator
Internet Junkie
*****
Gender: Male
Posts: 1523
6847 credits
Members referred : 8


Gimme all your cookies!!!


« Reply #2 on: Feb 26, 2006, 03:38:52 PM »

Maybe try editing your robots.txt file to limit the bots to certain areas. There is also a meta tag you can put in the head of your HTML that asks bots to revisit only after a certain period. I think the first option would work better than the second, though...
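
For anyone who wants to check what a robots.txt rule set actually permits, here is a small sketch using Python's standard urllib.robotparser. The rules, the paths (/search/, /cgi-bin/) and the bot name ExampleBot are all hypothetical, and of course this only reflects what a well-behaved bot would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all bots out of two expensive areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a rule-abiding bot may fetch the given path."""
    return parser.can_fetch(agent, path)
```

With these rules, allowed("/index.html") is permitted while anything under /search/ is not.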


Last blog : Site of the Month - August 2007
I am a metal monkey!
Administrator

« Reply #3 on: Feb 26, 2006, 09:18:39 PM »

Yes, the revisit-after meta tag. I use it, but many bots do not follow this rule.

Global Moderator
Internet Junkie

« Reply #4 on: Feb 27, 2006, 02:07:52 AM »

I thought they wouldn't...

I am sure that the bots will slow down eventually.


I wish I was an Oscar winner
**
Posts: 90
560 credits
Members referred : 0


« Reply #5 on: Mar 01, 2006, 04:08:19 AM »

Try robots.txt, and if that doesn't work, just ban their IPs
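
If you end up banning IPs in your own application rather than at the firewall, the check could look roughly like this sketch. The block list is hypothetical (the ranges below are reserved for documentation), and is_banned is a made-up helper name.

```python
import ipaddress

# Hypothetical block list; these ranges are reserved for documentation.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_banned(client_ip):
    """Return True if client_ip falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

Blocking at the web server or firewall level is usually cheaper than doing it in application code, so this is only meant to show the matching logic.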
I love Pokemon
*
Gender: Male
Posts: 14
84 credits
Members referred : 0



« Reply #6 on: Mar 01, 2006, 09:17:06 AM »

It very much depends on what you call a 'bot' - many are simply image or script harvesters which serve no useful purpose at all in terms of ranking, PR, etc.
The worst bot I know of for using up server resources is the Inktomi/Slurp bot used by Yahoo.
Things can be so bad with this bot that it has its own directive you can use in robots.txt, called Crawl-delay.
I am a metal monkey!
Administrator

« Reply #7 on: Mar 01, 2006, 09:55:13 AM »

Thanks for sharing that Guardian. I was aware of this directive.

Here is how it works:

Code:
User-agent: Slurp
Crawl-delay: 10
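
As a quick sanity check of a rule like the one above, Python's standard urllib.robotparser can read the directive back. This is just a sketch: ExampleBot is a hypothetical second bot, and the value is commonly read as seconds between requests, though each crawler decides how to honour it.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

slurp_delay = parser.crawl_delay("Slurp")        # delay requested from Slurp
other_delay = parser.crawl_delay("ExampleBot")   # no rule applies to this bot
```

Here slurp_delay comes back as 10 and other_delay as None, since the rule is scoped to Slurp only.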

I love Pokemon

« Reply #8 on: Mar 01, 2006, 10:34:48 AM »

Mine is set to 60, as I regularly get between 100 and 200 of them constantly throughout the day.
Now, here is a scary thought...
Many servers are configured incorrectly and allow the traversal of non-existent directories. For example, try www.mysite.com/index.php/index.php on a site.
It will work on many sites.
Yahoo will actually try to spider non-existent URLs to check that its bot is working. Where the server configuration allows traversal of non-existent URLs, the header response to the bot will be a 200 (instead of a 404), so the non-existent URL will actually get indexed!
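
To illustrate the behaviour you actually want, here is a minimal sketch of a Python WSGI app that only answers 200 for paths it knows exactly and sends a 404 for everything else, including URLs with extra path segments. KNOWN_PATHS, app and status_for are hypothetical names, and a real site would look pages up in its own routing or filesystem.

```python
# Hypothetical set of URLs that really exist on the site.
KNOWN_PATHS = {"/", "/index.php", "/about.php"}


def app(environ, start_response):
    """WSGI app that answers 404 for any path it does not know exactly."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PATHS:
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"page content"]
    # Extra path segments such as /index.php/index.php end up here.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def status_for(path):
    """Call the app directly and return the status line it sends."""
    seen = []
    app({"PATH_INFO": path}, lambda status, headers: seen.append(status))
    return seen[0]
```

With this, status_for("/index.php") gives "200 OK" while status_for("/index.php/index.php") gives "404 Not Found", which is the header response a bot needs in order to drop the URL.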
Global Moderator
Internet Junkie

« Reply #9 on: Mar 01, 2006, 04:06:39 PM »

That could be a problem, especially if you have mod_rewrite pages that are not checked, because in some cases you may end up with any number of pages that have the exact same content, and they may think that you are spamming!
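
One rough way to spot this on your own site is to compare page bodies and group the URLs that return identical content. The sketch below assumes you have already fetched the pages into a dict; find_duplicates and the sample URLs are hypothetical, and it only catches byte-for-byte duplicates, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose bodies are byte-for-byte identical.

    pages maps URL -> body (bytes); returns a sorted list of sorted URL groups.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        by_digest[hashlib.sha256(body).hexdigest()].append(url)
    return sorted(sorted(urls) for urls in by_digest.values() if len(urls) > 1)
```

For example, if /index.php and /index.php/index.php both return the same body, they come back together in one group.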


I love Pokemon

« Reply #10 on: Mar 01, 2006, 06:08:55 PM »

Exactly.
There is no way to know which URL Slurp or other bots may choose to test a server for the correct header response, so you cannot exactly 'anticipate' it and set up a Redirect or RedirectMatch.

As the server actually produces a page, it has content and can therefore be indexed even though the URL does not actually exist.
The index.php/index.php is a classic example of a non-existent URL that I have seen indexed - a Google search gives some excellent examples.

I am a metal monkey!
Administrator

« Reply #11 on: Mar 01, 2006, 06:18:22 PM »

Quote
Google search gives some excellent examples..

650.000 indexed pages. That's a lot I guess.

I have seen that even with pages that return a 404 error code, Slurp keeps visiting them again and again. I made an error in a link about two months ago, and even now Slurp still visits that non-existent page.

Maybe this is a bug in the Slurp crawler or something.

Global Moderator
Internet Junkie

« Reply #12 on: Mar 01, 2006, 06:24:56 PM »

I have added a permanent redirect to my pages that are incorrectly indexed or moved, yet they have been visiting them for months still... don't know why they are wasting their time?


I love Pokemon

« Reply #13 on: Mar 01, 2006, 08:52:03 PM »

Quote
I have added a permanent redirect to my pages that are incorrectly indexed or moved, yet they have been visiting them for months still... don't know why they are wasting their time?

If you are redirecting, then the bot(s) see it as a valid page, as the header response is 200.
If you see a bot crawling an invalid URL when examining your server logs etc., you should redirect to a 404 error page so the bot receives the correct header response and will eventually drop the URL from the indexed cache.
Of course, sometimes it is preferable to redirect to a valid page, but eventually you end up in a situation where a bot may think you are spamming due to duplicate content - this is a nightmare lol
Cyberpunk Wannabe
*
Posts: 38
256 credits
Members referred : 0



« Reply #14 on: Mar 01, 2006, 09:01:05 PM »

I don't think that having duplicate content is a really big problem for the search engines.

If you have duplicate pages, they just don't get indexed.

MODs : I think this post should be in the SEO category.
I am a metal monkey!
Administrator

« Reply #15 on: Mar 01, 2006, 09:02:51 PM »

Topic moved. Thanks :)

As for the duplicate content, I think that you may be right but I can't be sure about this at this moment.

World Wide Whale
***
Gender: Female
Posts: 151
660 credits
Members referred : 0



« Reply #16 on: Mar 01, 2006, 09:22:48 PM »

You can get uh, reprimanded or whatever by Google if you have duplicate content.

Anyway, I don't really have a problem with bots, so I can't help. My server's big and fast enough to handle it.

~Crystal
Global Moderator
Internet Junkie

« Reply #17 on: Mar 02, 2006, 02:22:59 AM »

Quote
I have added a permanent redirect to my pages that are incorrectly indexed or moved, yet they have been visiting them for months still... don't know why they are wasting their time?
If you are redirecting, then the bot(s) see it as a valid page, as the header response is 200.
If you see a bot crawling an invalid URL when examining your server logs etc., you should redirect to a 404 error page so the bot receives the correct header response and will eventually drop the URL from the indexed cache.
Of course, sometimes it is preferable to redirect to a valid page, but eventually you end up in a situation where a bot may think you are spamming due to duplicate content - this is a nightmare lol

I use the following to show the Slurp bots that the move is permanent:
Code:
header("HTTP/1.1 301 Moved Permanently"); // follow with a Location header naming the new URL

but this is going a bit off topic...
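
For comparison, here is the same idea as a small Python WSGI sketch that sends the 301 status together with the Location header for moved pages, and a plain 404 for anything else. The MOVED map, the page names and the helper names are hypothetical, not taken from a real site.

```python
# Hypothetical map of old URLs to their new homes.
MOVED = {"/old-page.php": "/new-page.php"}


def redirect_app(environ, start_response):
    """WSGI app: 301 for moved pages, 404 for everything else."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # The Location header tells the bot where the page lives now.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def respond(path):
    """Call the app directly; return (status line, headers as a dict)."""
    seen = []
    redirect_app({"PATH_INFO": path}, lambda s, h: seen.append((s, dict(h))))
    return seen[0]
```

Checking respond("/old-page.php") shows the 301 status line plus the Location header, which is what a crawler needs to update its index; an unknown path gets a 404 instead.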


Developed by HumanWorks © 2005 - 2009 Webdigity webmaster community · sublime directory
Webdigity Webmaster Forums | Powered by SMF 1.0.12. © 2001-2005, Lewis Media. All Rights Reserved.