Another day, another Google privacy violation
Remember the new sitestats section of Google Sitemaps? In a couple of minutes we’ve found quite a dodgy exploit which sometimes allows you to see the stats of your less web-savvy competitors.
Google requires you to verify that a site is yours by placing a file with a random filename in the root of your site. However, if you (badly) employ custom 404 messages on your server, you may have inadvertently instructed it to declare every URL within your domain as found.
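If you want to know whether your own server is one of these, a rough self-check is to request a filename that cannot possibly exist and look at the status code that comes back. The sketch below is only that, a sketch: the function names, the `/probe-` prefix and the choice of a HEAD request are our own assumptions, and it says nothing about how Google itself performs the check.

```python
import http.client
import secrets


def random_probe_path() -> str:
    """Return a path that should not exist on any real site (hypothetical naming)."""
    return f"/probe-{secrets.token_hex(16)}.html"


def fetch_status(host: str, path: str) -> int:
    """Fetch only the status code. http.client does not follow redirects."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def answers_found_for_anything(host: str, fetch=fetch_status) -> bool:
    """True if a made-up URL does NOT come back as 404 or 410.

    Redirects count as suspicious here too, since a 302 to a custom
    error page is one of the setups described in this post.
    """
    status = fetch(host, random_probe_path())
    return status not in (404, 410)
```

Calling `answers_found_for_anything("www.example.com")` against your own host gives a quick yes or no. The `fetch` parameter is there so the check can be tried with a stand-in function instead of a live request.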
It all depends on the actual server headers found and the way Google interprets them. From our little foray, we’ve concluded:
Not Found:
404 (obviously)
301 and 302 Moved Permanently/Temporarily
Found:
200 – all of
All other 302s (When redirecting to, say, /404.html)
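For anyone who prefers it spelled out, here is one possible reading of the list above as a small lookup. It is our interpretation of what we observed, not Google's actual logic: the function name is made up, and the idea that the reason phrase is what separates the two kinds of redirect is an assumption drawn from the list and from the eBay case mentioned further down.

```python
# Reason phrases that, per the list above, were treated as "not found".
STANDARD_REDIRECT_REASONS = {
    301: "Moved Permanently",
    302: "Moved Temporarily",
}


def observed_outcome(status: int, reason: str) -> str:
    """Map a status line to the outcome we observed (hypothetical model)."""
    if status == 404:
        return "not found"
    if status == 200:
        return "found"
    if status in STANDARD_REDIRECT_REASONS:
        # Assumption: only the exact standard phrase counted as "not found".
        if reason.strip() == STANDARD_REDIRECT_REASONS[status]:
            return "not found"
        return "found"
    # Anything else was not covered by our foray.
    return "unknown"
```

Under this reading, `observed_outcome(302, "Moved Temporarily")` gives "not found", while a 302 with any other reason phrase gives "found".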
Check out the screenshot of sites we own 
And some stats from uk.php.net
So who is at fault for all of this?
Well, we reckon mostly Google, for not properly thinking through the whole verification process. All the sites we managed to “0wn” would clearly be 404s to a well-thought-out system. However, webmasters are also partly to blame for bad server setups: for example, we got eBay.com because they had misspelled “Permanently” in their 301 header. There are also lots of spammy directories out there which return 200 OK for pretty much anything ending in .html.
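To give an idea of what a more careful check might look like, here is a minimal sketch under our own assumptions. It accepts the verification file only if it is served directly with a 200, and only if a random control URL on the same site really does come back as missing. The `Response` type, the function name and the per-account token inside the file are all hypothetical; as far as we can tell the real scheme relies on the filename alone.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Minimal stand-in for an HTTP response fetched without following redirects."""
    status: int
    body: str = ""


def is_verified(verification: Response, control: Response, expected_token: str) -> bool:
    """Decide ownership from two responses.

    verification: response for the owner's verification file
    control: response for a random path that cannot exist on the site
    """
    # The verification file must be served directly, never via a redirect.
    if verification.status != 200:
        return False
    # A server that also "finds" a made-up path proves nothing.
    if control.status not in (404, 410):
        return False
    # Assumed extra step: the file carries a token tied to the account.
    return expected_token in verification.body
```

With this, a site that answers 200 for everything fails on the control request, and a site that redirects every unknown URL to an error page fails on the first check.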
But perhaps the real question should be: How much do you want to trust Google with your data when they get caught making mistakes such as this? This kind of data generally isn’t too sensitive, but imagine if we put a competitor’s site in there. At the very least we’d be able to know exactly what keywords to target.
So go to Google Sitemaps, add aol.com to your account and see what happens. I bet you get their stats.
Post Script:
A couple of things we’ve found since playing with this:
- 23% of the Alexa Top 100 sites are susceptible to this problem
- Other big sites include Orkut, Infoseek Japan, Match.com, Business.com and Whitehouse.gov
- SEW ranks for singingfish
- Most sites we added were using a 301 or 302 to another file. We noticed that if your 301 or 302 crossed to another domain, the site was added
- MSN servers return 200 OK but also “STATUS_CODE: NotFound”, which Google fell over on (“temporary problem”)
- Monster.com should be susceptible but Google’s servers couldn’t resolve it.
And one final little titbit from the Sitemaps FAQ:
8. What is being done to protect my privacy?
We use the verification process to keep unauthorized users from seeing detailed statistics about your site. Only you can see these details, and only once we verify you own the site. We don’t use the verification file we ask you to create for any purpose other than to make sure you can upload files to the site.
DaveN