Google ignores Bing's robots.txt?
Today, while doing some keyword research in Google, I came across a Bing search listing ranking at #11 for a competitive term, which is odd because Bing's robots.txt blocks spiders from those pages!
but when I did a site:bing.com/search
So why is Google indexing these pages, and should Bing just drop the robots.txt file and give Google what it wants? Of course these pages are not truly indexed, as a few people on Twitter have pointed out to me, but 11,000+ pages, all from links pointing at those pages on sites that Google is allowed to spider?
DaveN

Aaranged 1441 days ago
http://www.seoskeptic.com/
I’ve seen Google “ignore” robots.txt exclusions on more than one occasion when the target is heavily linked (e.g., Bing). I have an entirely excluded subdomain that’s a PR5!
Manley 1441 days ago
http://twitter.com/LordManley
Robots.txt is not the part of the REP that excludes indexing.
In fact, excluding robots using robots.txt is the worst thing to do if you want to stop a page being indexed.
If they had used X-Robots-Tag headers or a meta robots tag then it would be a fair point, but since the only example with a snippet is the URL not excluded by robots.txt, it appears that Google absolutely IS obeying Bing’s REP.
JohnSly 1441 days ago
http://www.steerpointmarekting.com
You know the drill: if Google wants something, it gets it!
I do find it funny that it follows the robots.txt for the “DoubleClick ad server”: http://ad.doubleclick.net/robots.txt
Maybe it’s just a Bing thing!
Lee Colbran 1441 days ago
http://www.leecolbran.co.uk/
Are you still digging Bing, Dave?
Luke Eales 1441 days ago
http://www.twitter.com/lukeeales
Yep – Bing has used robots.txt to prevent Googlebot from reading the contents of their search result pages, hence there being no titles or descriptions in your screenshot. Bing should use meta noindex if they want to keep their search results out of the index.
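To make the noindex suggestion concrete, here is a minimal sketch of how a crawler might look for a noindex directive once it is allowed to fetch a page. The function name `has_noindex` and the sample header and HTML are made up for illustration; this is not how Googlebot is actually implemented.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers, html):
    """Return True if the response asks not to be indexed.

    headers: dict of HTTP response headers (hypothetical input shape).
    html: the page body as a string.
    """
    header_directives = []
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_directives += [d.strip().lower() for d in value.split(",")]

    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


# Illustrative inputs only.
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))
print(has_noindex({"X-Robots-Tag": "noindex"}, "<html></html>"))
print(has_noindex({}, "<html></html>"))
```

Either signal only works if the crawler is permitted to request the page in the first place, which is why it cannot be combined with a robots.txt block on the same URLs.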
auskeo 1441 days ago
http://www.aukseo.com
I agree with Manley that Google is obeying the rules, but in this case they should filter the pages out of the search results, as ranking another search engine’s result at number 11 for a key phrase isn’t a great result for a searcher on Google.
A similar thing happened with Wolfram Alpha results in Google – http://www.aukseo.com/wolfman-alpha-results-appearing-in-google-results-462/
phaithful 1441 days ago
http://protofilter.com
Google has stated in the past that they will include URLs in their index if others link to them, even if they are blocked by robots.txt.
As Manley pointed out, the fact that there’s just the URL and no description is typical of this behavior.
The only way to exclude these URLs from displaying in the index is to tell Google to NOINDEX. Of course, Google has to be able to access & read the page to get the header or meta instructions.
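As a rough sketch of that logic, the snippet below uses Python's standard `urllib.robotparser` to check a robots.txt rule and then models the outcome described in this thread. The robots.txt text, the `listing_outcome` function, and its return labels are assumptions made for illustration; they are not Bing's actual file or Google's actual algorithm.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only (not Bing's real file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def can_crawl(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def listing_outcome(robots_txt, user_agent, url, page_has_noindex, has_inbound_links):
    """Simplified model of the behaviour described in the comments."""
    if not can_crawl(robots_txt, user_agent, url):
        # The page is never fetched, so a noindex on it is never seen.
        # Links from crawlable pages can still surface the bare URL.
        return "url-only listing" if has_inbound_links else "not listed"
    if page_has_noindex:
        # The crawler could read the directive and honours it.
        return "not listed"
    return "full listing"


blocked = "http://www.bing.com/search?q=seo"
print(listing_outcome(ROBOTS_TXT, "Googlebot", blocked, True, True))
print(listing_outcome("", "Googlebot", blocked, True, True))
```

In this model the first call gives a URL-only listing, because the block hides the noindex, while the second (no robots.txt rules) lets the noindex take effect.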
Google ignores Bings robots.txt ? | seo cloak 1441 days ago
[...] more here: Google ignores Bings robots.txt ? This entry was posted on Friday, June 12th, 2009 at 9:01 am and is filed under davidnaylor. [...]
Lou_geek 1441 days ago
http://lou-geek.blogspot.com
Apparently even with and Google can still find your pages. The way to avoid them being indexed is using the new rel=canonical tag.
Manley 1441 days ago
http://twitter.com/LordManley
@auskeo – But then Google and Wolfram have a special relationship – Sergey Brin was an intern there.
@Lou_geek the canonical link element is not going to work as an REP. Yes, you can suggest the exclusion of a duplicate folder or subdomain, but it does not work across domains and there has to be a canonical page. Even if all the /search?q= pages had a canonical link element pointing back at /search, there is very little hope that the context of the pages linking in, or the content of those pages, will be similar enough for Google to consider the webmaster-included indicator accurate enough to apply it.
Google News and Blog posts as of June 13, 2009 | BGTip.co.uk 1440 days ago
[...] to Automatically Extract Excerpts From Articles – Some notes on cleaning up web page content Google ignores Bings robots.txt – davidnaylor.co.uk 06/12/2009 today while do some keyword research in Google I came across a bing [...]
David 1439 days ago
http://www.tomisimo.org/
Another possibility is that Bing just now noticed Google indexing all those pages and threw up the robots.txt.
Arturo 1438 days ago
This is what Google calls “an uncrawled reference”.
If the pages were also in DMOZ (which they obviously aren’t) then Google would even show a snippet based on that data, making a perfect search result, and nobody would know the difference.
Milan Kryl 1437 days ago
http://TypicalGooglebehaviour
You can’t stop Google from indexing pages with robots.txt alone – you have to remove all links to the page too. If you don’t, you can find some externally linked URLs in the Google SERP.
The user comes first for Google, so the user gets the page if they enter the right query.
Alex 1432 days ago
http://curepages.com
Very innovative! Thanks for sharing the tips, which will prove to be very helpful in future as well.
Thanks
SEO Best Practices: SEOmoz’s New Policies Based on Updated Correlation Data | Richteller.com - Instant and Fast Earning, Make Money Online Tips 1431 days ago
[...] search engine crawlers from visiting a web page but they do not keep them from being indexed (see DaveN’s recent post on this topic). They also create a black hole for link juice (as the engines cannot crawl these pages to see any [...]
Jaroslaw 887 days ago
http://www.facemetin.pl/
How to block Bing’s robots? In .htaccess?