28, May 2012

Majestic-12 a massive crawler - webmaster forum

 
Webdigity webmaster forums
Topic: Majestic-12 a massive crawler (Read 3128 times)

Global Moderator
« on: Sep 11, 2006, 08:27:20 am »

Hello,

I just checked the stats for a website and found 50,000 pageviews generated within 6 weeks by this bot:
http://www.majestic12.co.uk/projects/dsearch/mj12bot.php

That's the same as the number of visitors to the website...
Thank you, Majestic-12...

I am a metal monkey!

Administrator
« Reply #1 on: Sep 11, 2006, 09:12:56 am »

Maybe you need this robots.txt directive:

Code:
User-Agent: MJ12bot
Crawl-Delay:   5
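For illustration only: this is how a crawler that follows robots.txt would read the rule above, using Python's standard-library `urllib.robotparser` (`crawl_delay()` needs Python 3.6+). The file is parsed from a string instead of being fetched, and `crawl_delay_for` is a made-up helper name. It shows what the rule says, not whether MJ12bot actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The rule from Reply #1, parsed from a string instead of fetched over HTTP.
ROBOTS_TXT = """\
User-Agent: MJ12bot
Crawl-Delay:   5
"""

def crawl_delay_for(agent: str, robots_txt: str = ROBOTS_TXT):
    """Hypothetical helper: the Crawl-delay (seconds) a compliant parser reads for `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as "checked"
    return parser.crawl_delay(agent)

if __name__ == "__main__":
    print(crawl_delay_for("MJ12bot"))    # the group for MJ12bot applies
    print(crawl_delay_for("Googlebot"))  # no group applies to this agent
```

Crawl-Delay was never part of the original robots.txt convention, so support for it varies from bot to bot.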

Trial and Error my two best teachers

Global Moderator
« Reply #2 on: Sep 11, 2006, 09:48:08 am »

I used this one:

Code:
User-Agent: MJ12bot
Disallow:   /
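A quick way to confirm what `Disallow: /` means for a compliant crawler, again with Python's `urllib.robotparser`. `allowed` is a hypothetical helper and `www.example.com` stands in for a real site; the check only describes bots that honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule from Reply #2: shut MJ12bot out of the whole site.
ROBOTS_TXT = """\
User-Agent: MJ12bot
Disallow:   /
"""

def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Hypothetical helper: would a robots.txt-compliant crawler named `agent` fetch `url`?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for path in ("/", "/index.html", "/shop/toys?page=2"):
        url = "http://www.example.com" + path  # placeholder domain
        print(path, allowed("MJ12bot", url), allowed("Googlebot", url))
```

Googlebot stays allowed here because the file only has a group for MJ12bot and no `User-agent: *` group.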

It's a strange world with these bots: we website owners have to ask a search engine to please spider us, but others steal hundreds of MBs of our bandwidth without asking...

Administrator
« Reply #3 on: Sep 11, 2006, 10:13:47 am »

Hehe that's true.

BTW, the Disallow: / won't help you much.

Maybe you should try Disallow: *

Global Moderator
« Reply #4 on: Sep 11, 2006, 10:17:07 am »

That's a nice one; I got this info from their website...

Global Moderator
« Reply #5 on: Nov 27, 2006, 11:05:40 am »

I used this robots.txt file: http://www.bergtoys.com/robots.txt

The result: 25,000 crawls since my last post here in September.

What's wrong with them?

Administrator
« Reply #6 on: Nov 27, 2006, 12:13:57 pm »

You mean what is wrong with your robots.txt?

Global Moderator
« Reply #7 on: Nov 27, 2006, 12:32:54 pm »

With my file, or with the bot?
(I changed the file back to using the slash today, to check whether that is the problem.)

Administrator
« Reply #8 on: Nov 27, 2006, 12:50:36 pm »

Use this tool to check your rules:

http://tool.motoricerca.info/robots-checker.phtml

Global Moderator
« Reply #9 on: Nov 27, 2006, 12:54:08 pm »

Then is this wrong? ;)

Quote:
Hehe that's true.

BTW the Disallow:   / wont help you much.

Maybe you should try Disallow:   *


Tool output:
The following block of code contains some errors. Please, remove all the reported errors and check again this robots.txt file.
Line 16    User-agent: ShopWiki
Line 17    Disallow: *
We advise you to start a file/directory name with a leading slash char (Example: /private.html).
Line 18    

The following block of code DISALLOWS the crawling of all files and directories to the following spiders/robots: MJ12bot
Line 19    User-agent: MJ12bot
Line 20    Disallow: /
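The checker's complaint about line 17 boils down to one rule: a Disallow path should begin with `/`. A rough local version of just that check could look like this; `lint_robots` is a hypothetical helper that covers only this single warning and is no substitute for the online tool. Line numbers refer to the small sample string, not to the real file.

```python
from typing import List, Tuple

def lint_robots(robots_txt: str) -> List[Tuple[int, str]]:
    """Hypothetical helper: flag Allow/Disallow values that do not start with '/'."""
    warnings = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # ignore comments and surrounding spaces
        if ":" not in line:
            continue  # blank or malformed lines are skipped
        field, value = (part.strip() for part in line.split(":", 1))
        # An empty Disallow is valid (it allows everything), so only non-empty values are checked.
        if field.lower() in ("allow", "disallow") and value and not value.startswith("/"):
            warnings.append((number, f"path {value!r} should start with '/'"))
    return warnings

SAMPLE = """\
User-agent: ShopWiki
Disallow: *

User-agent: MJ12bot
Disallow: /
"""

if __name__ == "__main__":
    for number, message in lint_robots(SAMPLE):
        print(f"line {number}: {message}")
```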

Administrator
« Reply #10 on: Nov 27, 2006, 01:04:34 pm »

Maybe the correct form would be /* ?

Global Moderator
« Reply #11 on: Nov 27, 2006, 01:07:55 pm »

I don't know; this is the information from the tool:


The following block of code DISALLOWS the crawling of all files and directories to the following spiders/robots: MJ12bot

Administrator
« Reply #12 on: Nov 27, 2006, 01:10:54 pm »

Oh right, I thought it was meant to be an error, but it is ok.
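Why `Disallow: /` is fine comes down to how the value is matched. In the original robots.txt convention a Disallow value is a plain prefix of the URL path: every path starts with `/`, while no path starts with `*` or `/*`. Some crawlers also support `*` and `$` wildcards (later written into RFC 9309). The sketch compares the three values discussed in this thread under both readings; `is_blocked` is a hypothetical helper, and it cannot tell you how MJ12bot itself treats `*`.

```python
import re

def _to_regex(pattern: str):
    """Translate a wildcard path pattern: '*' = any characters, trailing '$' = end of path."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

def is_blocked(disallow: str, path: str, wildcards: bool = False) -> bool:
    """Hypothetical helper: does a single Disallow value cover `path`?"""
    if not disallow:
        return False  # an empty Disallow blocks nothing
    if not wildcards:
        return path.startswith(disallow)  # original convention: literal prefix match
    return _to_regex(disallow).match(path) is not None  # match() anchors at the start

if __name__ == "__main__":
    for rule in ("/", "*", "/*"):
        print(rule, is_blocked(rule, "/index.html"), is_blocked(rule, "/index.html", wildcards=True))
```

With plain prefix matching only `/` blocks `/index.html`; with wildcard support all three values do, which is why `/` is the safe choice for shutting a bot out entirely.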

Developed by HumanWorks © 2005 - 2012 Webdigity webmaster community · sublime directory
Webdigity Webmaster Forums | Powered by SMF 1.0.12. © 2001-2005, Lewis Media. All Rights Reserved.