I wrote a post yesterday about the dangers of allowing (some) custom shortened URLs. Really though, I only talked about one particular name on one particular shortener, so I’ll try to make this a little more general. Incidentally, bit.ly appear to have fixed their robots.txt as of a couple of hours ago – they obviously thought it was a problem!
There were a couple of reasons I was pretty sure about Google following the redirect on robots.txt – the main one being Google’s own Webmaster Tools! If you believe their own tool, Google were quite happily following the redirect and trying to parse a blog as a robots.txt. Depending on the random chance I mentioned yesterday, this sometimes came up with the “other” robots.txt.
This brings me neatly onto the next potential problem – verifying your URL shortener in Google Webmaster Tools or Yahoo Site Explorer.
One of the methods Google Webmaster Tools uses to decide whether you have control over a particular site is to ask you to place an empty file with a specified name on your site. What happens if I try to add bit.ly to my Webmaster Tools account?
It asks me to create http://bit.ly/google4516834ef16f6fa21.html (or thereabouts)
Now, bit.ly won’t let me create a custom name with a dot in it, but until this morning they would silently strip dots out of the requested URL.
So, one could just make http://bit.ly/google4516834ef16f6fa21html instead (note the lack of a dot). When Google came along to verify the existence of the file, they would request it with the dot in there; bit.ly would silently drop the dot, redirect to somewhere – anywhere, it doesn’t matter – and the account would be marked as verified. Considering there are some fairly odd-looking removal requests in there, I don’t think I’m the first person to think of it.
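To make the collision concrete, here is a minimal sketch in Python. It is not bit.ly's actual code: the `links` table, the function names and the destination URL are all hypothetical, and it simply assumes that dots are stripped from the requested path before the lookup.

```python
# Hypothetical in-memory table of custom short names -> destinations.
links = {}

def strip_dots(name):
    # Assumed behaviour: dots are silently removed rather than rejected.
    return name.replace(".", "")

def create(name, destination):
    # The real service refused names containing a dot; mirror that here.
    if "." in name:
        raise ValueError("dots are not allowed in custom names")
    links[name] = destination

def resolve(path):
    # The lookup strips dots, so "name.html" and "namehtml" are the same key.
    return links.get(strip_dots(path))

create("google4516834ef16f6fa21html", "http://example.com/anywhere")

# The verifier asks for the dotted file name and still gets a redirect target.
print(resolve("google4516834ef16f6fa21.html"))
```

The point is only that the name the verifier requests and the name the user is allowed to register end up as the same key.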
When checking for the specially named file, Google also look for a file they are not expecting to find. If that second request doesn’t return an error code, they reject the verification. This is due to many misconfigured sites (including URL shorteners!) that fail to return a 404 error code for paths that do not exist. This happens to stop the procedure here from working on several other shorteners that I tried, but luckily for the purposes of this article, Yahoo Site Explorer isn’t so picky.
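That two-request check can be sketched roughly like this. I'm guessing at the details: the bogus file name, the function names and the two pretend sites are all made up for illustration, and each "site" is just a function from a path to a final status code.

```python
def looks_verified(status_of, token_path, bogus_path="/no_such_file_5d1c.html"):
    """Accept only if the token file exists AND a bogus path gives an error."""
    token_found = status_of(token_path) == 200
    bogus_rejected = status_of(bogus_path) >= 400
    return token_found and bogus_rejected

def catch_all_shortener(path):
    # Misconfigured: every path, known or not, ends up as a 200.
    return 200

def well_behaved_site(path):
    # Returns 404 for anything it does not actually serve.
    known = {"/google4516834ef16f6fa21.html"}
    return 200 if path in known else 404

token = "/google4516834ef16f6fa21.html"
print(looks_verified(catch_all_shortener, token))  # False
print(looks_verified(well_behaved_site, token))    # True
```

A shortener that answers every path happily fails the second request, which is why the trick stops there; one that returns a proper 404 for unknown names passes it.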
Considering bit.ly seem to have fixed their problems, let’s pick on a different target. How about tr.im? They’re another big player for several reasons, the biggest of which is that, now that they’re open source, there are dozens of clones out there. This works on all of them that I tried.
To verify a URL for Site Explorer, you can create a file in a similar manner to the Google one above.
For tr.im, Yahoo asked me to create: http://tr.im/y_key_a41e58c3dc1e462a.html (or again, close enough!)
They don’t allow dots in their short names either, nor is the box big enough to enter the whole string. Unfortunately, they will ignore any extension added to the URL, so creating a short name of y_key_a41e58c3dc1e462a is all that is required for the requested link to work.
Rather than relying on your 404 reporting to be working, Yahoo require that a particular string appears in the content of the file they request; that way they can be sure it’s not an error page and you have control over the content. In my case this looked like fc140e122451a103. As you will probably have guessed by now, they will follow a redirect before looking for that string. Make the destination URL for your short link point to a page containing that value (like, say, this one!) and Yahoo will verify the site as your own.
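Here is a rough sketch of why following the redirect matters, using a two-entry pretend web instead of real HTTP requests. The `WEB` table, the example.org destination and the `follow_redirects` flag are my own inventions for illustration; I have no idea how Yahoo's checker is really written.

```python
# Hypothetical miniature web: URL -> (status, redirect target or page body).
WEB = {
    # What the shortener serves once it has ignored the ".html" extension.
    "http://tr.im/y_key_a41e58c3dc1e462a.html": (301, "http://example.org/post"),
    # A page the attacker controls (or merely found) that contains the string.
    "http://example.org/post": (200, "a blog post mentioning fc140e122451a103"),
}

def fetch(url, follow_redirects, max_hops=5):
    """Return (status, body) from the toy WEB table."""
    for _ in range(max_hops):
        status, payload = WEB.get(url, (404, ""))
        if status in (301, 302) and follow_redirects:
            url = payload  # hop to the redirect target and try again
            continue
        # An unfollowed redirect or an error has no usable body.
        return status, (payload if status == 200 else "")
    return 0, ""  # gave up: too many redirects

def verify(url, token, follow_redirects):
    status, body = fetch(url, follow_redirects)
    return status == 200 and token in body

url = "http://tr.im/y_key_a41e58c3dc1e462a.html"
print(verify(url, "fc140e122451a103", follow_redirects=True))   # True
print(verify(url, "fc140e122451a103", follow_redirects=False))  # False
```

With redirects followed, the string is found on somebody else's page and the shortener is "verified"; with them refused, the checker only ever sees the 301 and the verification fails.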
No, your URL shortener is not going to get hacked, the world is not going to end, and these particular issues are easy to avoid. Having said that, please think carefully about how your own URLs can be used against you.
Is there any good reason for Google and Yahoo to follow redirects when verifying ownership of a site? Removing this would seem to be a good idea to me. I suspect following redirects on robots.txt is a more complicated issue but perhaps some restrictions could be considered there. Search engines currently have an uneasy relationship with URL shorteners and I think that will only continue.
Thanks for all the comments.