Google ignores Bing’s robots.txt?
Today, while doing some keyword research in Google, I came across a Bing search listing ranking at #11 for a competitive term, which is odd because Bing blocks spiders from those pages!

but when I did a site:bing.com/search

So why is Google indexing these pages, and should Bing just drop the robots.txt file and give Google what it wants? Of course, these pages are not truly indexed, as a few people on Twitter have pointed out to me, but 11,000+ pages, all from links pointing at those pages that Google is allowed to spider?
DaveN
18 Comments
Aaranged - http://www.seoskeptic.com/
I’ve seen Google “ignore” robots.txt exclusions on more than one occasion when the target is heavily linked (e.g., Bing). I have an entirely excluded subdomain that’s a PR5!
Manley - http://twitter.com/LordManley
Robots.txt is not the REP which excludes indexing.
In fact, excluding robots using robots.txt is the worst thing to do if you want to stop a page being indexed.
If they had used an X-Robots-Tag header or a meta robots tag then it would be a fair point, but since the only example with a snippet is the URL not excluded by robots.txt, it appears that Google absolutely IS obeying Bing’s REP.
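To make the crawl-versus-index distinction concrete, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules and the `example.com` URLs are hypothetical and are not a copy of Bing's real robots.txt. The point is that robots.txt only answers "may this robot fetch this URL?" and says nothing about whether the URL may be listed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Bing's actual robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/search?q=test", "http://example.com/about"):
        # A False here means "do not crawl", not "do not list this URL".
        print(url, crawl_allowed(EXAMPLE_ROBOTS_TXT, "Googlebot", url))
```

A robot that obeys these rules never downloads `/search?q=test`, so it cannot read a title or description from it. That fits the URL-only listings discussed in this thread.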
JohnSly - http://www.steerpointmarekting.com
You know the drill: if Google wants something, it gets it!
I do find it funny that it follows the robots.txt for the “doubleclick ad server”: http://ad.doubleclick.net/robots.txt
Maybe it’s just a Bing thing!
Lee Colbran - http://www.leecolbran.co.uk/
Are you still digging Bing Dave?
Luke Eales - http://www.twitter.com/lukeeales
Yep – Bing has used Robots.txt to prevent Googlebot from reading the contents of their search result pages, hence there being no titles or descriptions in your screenshot. Bing should use meta noindex if they want to keep their search results out of the index.
auskeo - http://www.aukseo.com
I agree with Manley that Google is obeying the rules, but in this case they should filter the pages out of the search results, as ranking another search engine’s result at number 11 for a key phrase isn’t a great result for a searcher on Google.
A similar thing happened with Wolfram Alpha results in Google: http://www.aukseo.com/wolfman-alpha-results-appearing-in-google-results-462/
Google ignores Bings robots.txt | feed hat blog - pingback
[…] Read more here: Google ignores Bings robots.txt […]
phaithful - http://protofilter.com
Google has stated in the past that they will include URLs in their index if others link to them, even when those URLs are blocked by robots.txt.
As Manley pointed out, the fact that there’s just the URL and no description is typical of this behavior.
The only way to exclude these URLs from displaying in the index is to tell Google to NOINDEX. Of course, Google has to be able to access & read the page to get the header or meta instructions.
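As a rough illustration of that last point, the Python sketch below looks for a noindex instruction in the two places mentioned: the `X-Robots-Tag` response header and a meta robots tag. The function name `has_noindex` and the sample inputs are made up for this example, and the parsing is deliberately simplified (it ignores user-agent-specific header values, for instance). Both inputs only exist once the page has been fetched, which is why a robots.txt block keeps a crawler from ever seeing them.

```python
from html.parser import HTMLParser


class _MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: dict, html: str) -> bool:
    """Hypothetical helper: True if the fetched response asks not to be indexed.

    Simplified sketch: only handles plain comma-separated directives.
    """
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = _MetaRobotsParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(has_noindex({}, page))
    print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}, ""))
```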
Google ignores Bings robots.txt ? | seo cloak - pingback
[…] more here: Google ignores Bings robots.txt ? This entry was posted on Friday, June 12th, 2009 at 9:01 am and is filed under davidnaylor. […]
Lou_geek - http://lou-geek.blogspot.com
Apparently even with and Google can still find your pages. The way to avoid them being indexed is using the new rel=canonical tag.
Manley - http://twitter.com/LordManley
@auskeo – But then Google and Wolfram have a special relationship – Sergey Brin was an intern there.
@Lou_geek, the canonical link element is not going to work as an REP. Yes, you can suggest the exclusion of a duplicate folder or subdomain, but it does not work across domains and there has to be a canonical page. Even if all the /search?q= pages had a canonical link element pointing back at /search, there is very little hope that the context of the pages linking in, or the content of those pages, will be similar enough for Google to consider the webmaster-included indicator accurate enough to apply it.
Google News and Blog posts as of June 13, 2009 | BGTip.co.uk - pingback
[…] to Automatically Extract Excerpts From Articles – Some notes on cleaning up web page content Google ignores Bings robots.txt – davidnaylor.co.uk 06/12/2009 today while do some keyword research in Google I came across a bing […]
David - http://www.tomisimo.org/
Another possibility is that Bing only just noticed Google indexing all those pages and threw up the robots.txt.
Arturo
This is what Google calls “an uncrawled reference” 🙂
If the pages were also in DMOZ (which they obviously aren’t), Google would even show a snippet based on that data, making a perfect search result, and nobody would know the difference 😉
Milan Kryl - http://TypicalGooglebehaviour
You can’t stop Google indexing pages with robots.txt alone; you have to remove all links to the page too. If you don’t, you can still find some externally linked pages in the Google SERP.
The user comes first for Google, so the user gets the page if they enter the right query 🙂
Alex - http://curepages.com
Very innovative! thanks for sharing the tips which will prove to be very helpful in future as well.
Thanks
SEO Best Practices: SEOmoz’s New Policies Based on Updated Correlation Data | Richteller.com - Instant and Fast Earning, Make Money Online Tips - pingback
[…] search engine crawlers from visiting a web page but they do not keep them from being indexed (see DaveN’s recent post on this topic). They also create a black hole for link juice (as the engines cannot crawl these pages to see any […]
Jaroslaw - http://www.facemetin.pl/
How do I block Bing’s robots? In .htaccess?