Dangers of Custom Shortened URLs

by James Slater
Bronco - Digital Marketing Agency

If you’re not careful, custom shortened URLs can be dangerous – just like any other user-provided content.

As an example, I’m going to pick on bit.ly. It isn’t the only site with this problem, but since being chosen as the default URL shortener on Twitter it is probably the highest-profile one.

http://bit.ly/robots.txt

Where do you think that link goes? I’d expect it to go to bit.ly’s robots.txt file, defining any parts of their site they don’t want to be crawled or banning some crawlers altogether. Instead, it redirects to a post on someone’s blog, and in a weird coincidence the author links back to one of our sites in that post.

All the owner of that blog would have to do is change his post to look like a normal robots.txt file, and he could happily ban Google (or Yahoo, or whoever) from crawling any page on bit.ly. You could probably also cause a bit of upset by adding a Sitemap entry that pointed to your own site. I’m not sure whether “Noindex:” works in robots.txt, but if it does, that could also be used for mischief.
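To make that concrete, here is a minimal sketch using Python’s standard urllib.robotparser to show how a crawler that honours robots.txt would read such a file. The file contents and the attacker.example Sitemap URL are made up for illustration; they are not what the blog actually serves, and real crawlers may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents the blog owner could serve in place of his post.
FAKE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /

Sitemap: http://attacker.example/sitemap.xml
"""


def parse_robots(text):
    """Parse robots.txt text the way a rule-following crawler might."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(FAKE_ROBOTS_TXT)
# Googlebot is told to stay away from every path on the host...
print(parser.can_fetch("Googlebot", "http://bit.ly/anything"))     # False
# ...while every other crawler is still allowed in.
print(parser.can_fetch("SomeOtherBot", "http://bit.ly/anything"))  # True
```

The point is only that whoever controls the redirect target controls what the crawler thinks bit.ly’s rules are.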

How does it even work?

You may think that the custom name ‘robots.txt’ shouldn’t have been allowed, and indeed bit.ly does not allow ‘.’ to appear in custom names. However, it will happily strip out any dots in the link you request and ignore them, so bit.ly/robots.txt is equivalent to bit.ly/robotstxt. Interestingly, this also shows up a bug somewhere in bit.ly. If you click on both of those links, they should take you to the same blog post…
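The dot-stripping behaviour can be sketched in a couple of lines. This is a guess at the effect, not bit.ly’s actual code; the function name strip_dots is my own.

```python
def strip_dots(alias):
    """Remove every '.' from a requested alias (assumed behaviour)."""
    return alias.replace(".", "")


print(strip_dots("robots.txt"))                             # robotstxt
print(strip_dots("robots.txt") == strip_dots("robotstxt"))  # True
```

So a user who registers the perfectly legal name ‘robotstxt’ ends up owning the path /robots.txt as well.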

Bit.ly’s Bug

Sometimes clicking those two links will not take you to the same place.

By comparing the information on the two – robots.txt and robotstxt – you can see that they are actually stored separately in their database.

[Images: bit.ly information for robots.txt and robotstxt]

You may also notice where the first one is supposed to redirect to. I’m clearly not the first one to think of it! As I said before, bit.ly removes dots from the URL before redirecting you, but it seems this doesn’t always happen. Comparing the two sets of data, you could hazard a guess that around a third of the time a request for the bit.ly robots.txt redirects you to the first URL (1), while the rest of the time it sends you to the second (2).

What on earth is going on? My guess is that they are doing something like load balancing over at bit.ly and one of their servers isn’t removing dots in the same manner as the others. That server redirects to the database entry that still contains the dot, which I can only assume was added before they put a check in to prevent it. Their load balancer hits that particular internal server for some percentage of requests. Mind you, that’s just a guess; I’d be interested to hear from bit.ly about it.
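Here is a toy simulation of that guess. Everything about the setup is an assumption on my part: three backends, strict round-robin balancing, and exactly one backend that skips the dot-stripping step. The two redirect targets are the Location values seen in the responses in this post.

```python
from collections import Counter
from itertools import cycle

# The two separately stored records and where each one redirects.
LINKS = {
    "robots.txt": "http://pentabarf.net/bit.ly-robots.txt",
    "robotstxt": "http://petercoughlin.com/robotstxt-wordpress-plugin/",
}


def lookup_raw(alias):
    """Hypothetical odd-one-out backend: looks up the alias as requested."""
    return LINKS[alias]


def lookup_stripped(alias):
    """Hypothetical normal backend: removes dots before the lookup."""
    return LINKS[alias.replace(".", "")]


def simulate(requests, backends):
    """Send /robots.txt to the backends in round-robin order, tally targets."""
    tally = Counter()
    pool = cycle(backends)
    for _ in range(requests):
        backend = next(pool)
        tally[backend("robots.txt")] += 1
    return tally


tally = simulate(300, [lookup_raw, lookup_stripped, lookup_stripped])
for target, hits in tally.items():
    print(hits, target)
# 100 http://pentabarf.net/bit.ly-robots.txt
# 200 http://petercoughlin.com/robotstxt-wordpress-plugin/
```

One misbehaving server out of three would give the roughly one-third split, but plenty of other setups could produce the same numbers.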

Example (1):

GET /robots.txt
Host: bit.ly

HTTP/1.1 301 Moved Permanently
Server: nginx/0.7.42
Date: Thu, 24 Sep 2009 09:21:47 GMT
Content-Type: text/html;charset=utf-8
Connection: keep-alive
Content-Length: 131
Location: http://pentabarf.net/bit.ly-robots.txt
Allow: GET, HEAD, POST

This resource has permanently moved to http://pentabarf.net/bit.ly-robots.txt.

Example (2):

GET /robots.txt
Host: bit.ly

HTTP/1.1 301 Moved
Server: nginx/0.7.42
Date: Thu, 24 Sep 2009 09:21:48 GMT
Content-Type: text/html; charset=utf-8
Connection: keep-alive
Location: http://petercoughlin.com/robotstxt-wordpress-plugin/
MIME-Version: 1.0
Content-Length: 314

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
<HEAD>
<TITLE>Moved</TITLE>
</HEAD>
<BODY>
<H2>Moved</H2>
<A HREF="http://petercoughlin.com/robotstxt-wordpress-plugin/">The requested URL has moved here.</A>
<P ALIGN=RIGHT><SMALL><I>AOLserver/4.5.1 on http://127.0.0.1:7200</I></SMALL></P>
</BODY>
</HTML>

This post mentions “dangers”, and I’ve only covered one problem with one site. Having said that, this post is long and boring enough already, so I think I’ll save the others until tomorrow! For now, the moral of the story is: if you’re running a URL shortener, be careful about letting people name their own links!
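As a rough idea of what “being careful” might look like, here is a sketch of a reserved-name check that compares names after the same dot-stripping the redirect step applies. The list of reserved paths and the function names are my own assumptions, not anything bit.ly has published.

```python
# Assumed examples of paths a shortener would want to keep for itself.
RESERVED_PATHS = ("robots.txt", "sitemap.xml", "favicon.ico")


def canonical(name):
    """Reduce a name to the form used for comparison: no dots, lower case."""
    return name.replace(".", "").lower()


RESERVED = {canonical(path) for path in RESERVED_PATHS}


def is_alias_allowed(alias):
    """Reject empty aliases and ones that collide with a reserved path."""
    key = canonical(alias)
    return bool(key) and key not in RESERVED


print(is_alias_allowed("robots.txt"))  # False
print(is_alias_allowed("robotstxt"))   # False
print(is_alias_allowed("my-link"))     # True
```

The important bit is that the check and the redirect lookup normalise names in the same way; otherwise you are back to two servers disagreeing about what a link means.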

James
