

How can I find out if a site's links are spiderable?

vbignacio
Tue 8 May 2007, 12:26 pm GMT +0300
Is there a site or a tool that can tell me whether a site has spiderable links or not?

Nikolas
Tue 8 May 2007, 12:28 pm GMT +0300
You don't need a tool for that. If a link does not have the rel="nofollow" attribute it is spiderable.
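
If you'd rather not eyeball the source, the rule above can be scripted. This is only a rough sketch using Python's standard html.parser; the class name LinkAudit, the helper audit_links and the sample markup are made up for illustration. It only sees links present in the raw HTML, so it says nothing about links generated by JavaScript or pages blocked in robots.txt.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collect <a href> links, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return  # anchors without an href are not links
        # rel is a space-separated token list, e.g. rel="external nofollow"
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    """Return (followed, nofollowed) href lists for an HTML string."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollowed


# Hypothetical sample markup, not taken from any real site.
sample = """
<a href="http://example.com/a">plain</a>
<a href="http://example.com/b" rel="nofollow">tagged</a>
<a href="http://example.com/c" rel="external NoFollow">mixed</a>
"""
followed, nofollowed = audit_links(sample)
print(followed)    # ['http://example.com/a']
print(nofollowed)  # ['http://example.com/b', 'http://example.com/c']
```

To use it on a real page, save the page source to a file and pass its contents to audit_links.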

ventureskills
Tue 8 May 2007, 12:34 pm GMT +0300
Unless it's generated by JavaScript, of course ;)

vbignacio
Tue 8 May 2007, 12:42 pm GMT +0300
Code:
If a link does not have the rel="nofollow" attribute it is spiderable.

So I will just check the source code of the site?

Nikolas
Tue 8 May 2007, 12:53 pm GMT +0300
Yeah, that's enough. If the page you want to check is for some kind of link exchange, make sure that it is linked from the rest of the site (I am getting a lot of link exchange requests from scammers who put their links on non-indexed pages :) )

vbignacio
Tue 8 May 2007, 12:58 pm GMT +0300
How would I know if the link is in a directory disallowed in the robots.txt file? Even though it is not indicated on the actual page, will the links there also not be spidered?

Nikolas
Tue 8 May 2007, 01:27 pm GMT +0300
If the page is linked from somewhere in the site you won't have any problem.

Otherwise you can check it with the site:www...... query in Google.

ventureskills
Tue 8 May 2007, 01:40 pm GMT +0300
Check www.mysite.com/robots.txt. However, if the page is indexed, then look for nofollows, JavaScript-generated links, and robots-nocontent, which is Yahoo's new on-page stopper.

See http://ventureskills.wordpress.com/2007/05/02/microformats-for-search-a-real-possibility/ for more info.

tritrain
Mon 17 March 2008, 02:31 am GMT +0200
Giorgos pointed out in a blog post that there are a variety of ways to limit the crawling of spiders. Here's the link to his post: http://geoland.org/2007/11/fake-links-nofollow-is-just-the-beginning

You'd have to look at their robots.txt and/or .htaccess. In addition, they could filter it using JavaScript so that it only looks like a link that would pass juice.

As for a handy way of seeing normal nofollow links, I'd suggest using SEOQuake and setting it to strike through the nofollow links. It's available now for IE and Firefox.

Otherwise use Search Status for Firefox. It will highlight the nofollow links (annoying to me).

yoonoo
Fri 13 June 2008, 11:55 am GMT +0300
Code:
If a link does not have the rel="nofollow" attribute it is spiderable.

So I will just check the source code of the site?

Also view the robots.txt of the site. Maybe they disallow the spider from crawling that page/folder.
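
For anyone who wants to test a specific URL against the robots.txt rules instead of reading them by hand, here is a small sketch with Python's standard urllib.robotparser. The helper name is_crawl_allowed, the sample rules and the www.mysite.com URLs are assumptions for illustration only. It parses a robots.txt you have already copied into a string, so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: the whole /links/ folder is closed to every spider.
sample_robots = """\
User-agent: *
Disallow: /links/
"""

print(is_crawl_allowed(sample_robots, "http://www.mysite.com/links/partners.html"))  # False
print(is_crawl_allowed(sample_robots, "http://www.mysite.com/about.html"))           # True
```

Against a live site you could instead call set_url() with the robots.txt address followed by read(), then can_fetch() as above. Keep in mind that this only covers robots.txt; nofollow attributes and JavaScript tricks still have to be checked on the page itself.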
