Like most things that happen here in the Bronco office, it all started with a SERP:

[Image: mi.o2.ie SERP]

What’s going on there then? A quick investigation of mi.o2.ie shows some badly formatted HTML and an error message about O2’s mobile internet service. Predictably, the domain is listed as belonging to O2 Ireland. So why have they got a scraped version of one of our blog posts? There’s nothing particularly remarkable about that post (sorry Dave) over the others – so what happens if you change the URL?

I’ll cut a long story short and just explain the results – http://mi.o2.ie/www.anywhere.com/somepage will work and return the contents of www.anywhere.com/somepage. It’s even helpful enough to alter redirects (try http://mi.o2.ie/www.google.com) and rewrite any links on the page to keep you within the mi.o2.ie structure. Search engine crawlers must love that.
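To make the URL pattern concrete, here is a minimal Python sketch of the mapping as observed above: the target host and path are simply appended to http://mi.o2.ie/. The function names `to_proxied` and `from_proxied` are made up for illustration, and query strings are deliberately left out because nothing above says how the proxy treats them.

```python
from typing import Optional
from urllib.parse import urlsplit

PROXY_HOST = "mi.o2.ie"  # the proxy host discussed in this post


def to_proxied(url: str) -> str:
    """Build the proxied form of a URL: http://mi.o2.ie/<host><path>.

    Hypothetical helper: it only mirrors the pattern described in the post.
    """
    # Accept both "www.anywhere.com/somepage" and "http://www.anywhere.com/somepage".
    parts = urlsplit(url if "://" in url else "http://" + url)
    return f"http://{PROXY_HOST}/{parts.netloc}{parts.path}"


def from_proxied(url: str) -> Optional[str]:
    """Recover the original URL from a proxied one, or None if it is not proxied."""
    parts = urlsplit(url)
    if parts.hostname != PROXY_HOST:
        return None
    target = parts.path.lstrip("/")  # "<host>/<path>" of the copied page
    if not target:
        return None
    # Assumption: the original was served over plain http.
    return f"http://{target}"


if __name__ == "__main__":
    print(to_proxied("www.anywhere.com/somepage"))
    print(from_proxied("http://mi.o2.ie/www.anywhere.com/somepage"))
```

Something like `from_proxied` could be handy when tidying up analytics or log data, where a proxied URL needs mapping back to the page it was copied from.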

Worse still: it’s a fairly effective anonymous proxy. It sends a Via header (seen by server-side scripts as HTTP_VIA) to the remote server containing “1.1 mi.o2.ie”, but includes no reference at all to your own IP address. Browse with this and no-one will know who you are. As long as you turn off images. And JavaScript. And maybe CSS.

Ok, so maybe it’s not that useful. I wouldn’t be surprised if Google get a fair few PageRank requests through from O2 Ireland though. :)
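If you want to spot these requests on your own server, the Via header is the obvious hook (CGI-style environments expose it as HTTP_VIA, which is the value quoted above). The following is a rough Python sketch, not a production filter: `via_hops` and `came_through_proxy` are hypothetical names, the headers are assumed to arrive as a plain dict, and the parsing is simplified (it ignores the possibility of commas inside Via comments).

```python
from typing import Dict, List, Tuple


def via_hops(via_header: str) -> List[Tuple[str, str]]:
    """Split a Via header value into (protocol_version, received_by) pairs."""
    hops = []
    for entry in via_header.split(","):
        fields = entry.split()
        # Each hop looks like "1.1 mi.o2.ie", optionally followed by a comment.
        if len(fields) >= 2:
            hops.append((fields[0], fields[1].lower()))
    return hops


def came_through_proxy(headers: Dict[str, str], proxy_host: str = "mi.o2.ie") -> bool:
    """Return True if any Via hop names the given proxy host."""
    # Header names are case-insensitive, so look the header up accordingly.
    via = next((v for k, v in headers.items() if k.lower() == "via"), "")
    return any(host == proxy_host for _, host in via_hops(via))


if __name__ == "__main__":
    print(came_through_proxy({"Via": "1.1 mi.o2.ie"}))
```

What you do with a match (block it, or just tag it so it can be filtered out of your stats) is up to you.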

At the time of writing, site:mi.o2.ie on Google returns “about 573,000” results. Ouch. Does anyone know how long this has been running for?

James

22 Comments

  • 1

    OpenDNS is also reporting those urls as Phishing Sites.

    Gavin

    8th March 2010 @ 17:15

  • 2

    “It started with a SERP” sounds like the worst song title in history. Good find though.

    Paul Carpenter | http://www.itsafamilything.co.uk

    8th March 2010 @ 21:05

  • 3

    O2 appear to have a long history of plain WRONG online apps to support mobile. When they first released the method of fetching MMS messages online, you could simply change the hash at the end of the URL and view anyone’s MMS (I bet you can imagine the content!).

    Wondering if this can be blocked server side?

    Dave | http://www.djb31st.co.uk

    9th March 2010 @ 06:23

  • 4

    Paul: I look forward to seeing it on your next album, then.

    James

    9th March 2010 @ 09:06

  • 5

    Nice proxy :)

    Even crawls itself
    http://mi.o2.ie/mi.o2.ie/mi.o2.ie/mi.o2.ie

    Wonder how much of davidnaylor.co.uk we get in there
    spam

    PaulH

    9th March 2010 @ 09:26

  • 6

    I’m having this issue too; they are already taking up our site’s SERP. What can we do to fight this spam?

    pancallok | http://mi.o2.ie/pancallok.blogspot.com

    10th March 2010 @ 05:19

  • 7

    I have seen a similar thing with o2 UK in recent weeks.

    In Google Analytics, I always set up a profile that records ‘visits that are NOT for the domain belonging to the real site’. That’s achieved with an ‘exclude’ custom filter.

    This profile therefore shows page views for search engine cache pages, and for scraper sites. It can be very revealing.

    I have seen a visitor recorded as having viewed csp.o2.co.uk/www.example.com/thispage.html in recent weeks. That user used a mobile device with the SymbianOS and Safari web browser.

    Unlike the Irish example, csp.o2.co.uk/www.somesite.com/somepage.html does not seem to be accessible from the outside world – from outside O2’s ecosystem that is.

    g1smd

    12th March 2010 @ 00:18

  • 8

    So, having discussed this on Twitter over the weekend, today http://twitter.com/O2forum briefly popped up and asked what the problem was.

    I have directed them here, and await their response.

    Incidentally, a request for http://mi.o2.ie/robots.txt returns a 302 redirect to http://mi.o2.ie/http://mi.o2.ie/Error.jsp?url=http%3A%2F%2Frobots.txt because they have no robots.txt file installed.

    No sign of any response from o2 UK though.

    g1smd

    15th March 2010 @ 17:57

  • 9

    Thanks for pointing me to that conversation, g1smd (if that is indeed your real name!)

    Interestingly the “site:mi.o2.ie” query mentioned above now seems to only return one result. I wonder if it’s been blocked somehow? I don’t see any changes that would have caused that.

    James

    16th March 2010 @ 09:03

  • 10

    Badly formatted? It’s valid WML 1.1!
    Cut them some slack man :)

    Justin Meighan | http://www.justinmeighan.com

    16th March 2010 @ 09:09

  • 11

    I care nothing for it being valid ‘anything’, because that is not the issue.

    The whole point is that O2 Ireland’s site created duplicate copies of hundreds of thousands of sites, each with URLs that appeared to be part of O2’s domain, and then also allowed that content to be indexed by Google. Additionally, those copied pages had their internal navigation altered to point to O2’s domain, not to the original site.

    Whilst they might have at least temporarily fixed the issue of Google indexing the copied content, they have not fixed the problem of these accesses messing with the site stats and analytics of the sites that have been copied – and since there are no robots.txt exclusions or firewall rules, the content could so easily still be indexed by other search engines and the problem be repeated all over again.

    A similar problem also exists with O2 UK, where accesses to the copies skew the site stats of the sites that have been copied or proxied. So far, the copies have not been indexed by search engines and I hope it stays that way.

    g1smd

    17th March 2010 @ 00:30

  • 12

    Have they no-one with a clue running their sites?

    Requesting http://mi.o2.ie/robots.txt redirects the visitor to http://www.robots.txt/ now.

    g1smd

    28th March 2010 @ 17:37

  • 13

    They are still tinkering.

    Requesting http://mi.o2.ie/robots.txt now redirects the visitor to http://robots.txt/ instead.

    Get someone with a clue for &$%£*@ sake!

    g1smd

    1st April 2010 @ 11:42

  • 14

    Now we’re back to requests for http://mi.o2.ie/robots.txt being redirected to http://www.robots.txt/ again.

    Pathetic!

    g1smd

    3rd April 2010 @ 18:49

  • 15

    Someone with half a clue appears to now be in charge.

    It’s vaguely better than someone with no clue at all.

    So, today, a request for http://mi.o2.ie/robots.txt redirects to…

    Wait for it.

    You’re gonna like this one.

    Actually, you’re not.

    Ta da…

    http://advancedsearch.virginmedia.com/subscribers/assist?url=www.robots.txt

    What the f*ck are they thinking?

    g1smd

    7th April 2010 @ 14:37

  • 16

    Now we’re back to requests for http://mi.o2.ie/robots.txt being redirected to http://www.robots.txt/ again.

    It’s a FARCE.

    g1smd

    8th April 2010 @ 16:23

  • 17

    Requests for http://mi.o2.ie/robots.txt are still incorrectly redirected to http://www.robots.txt/ as before.

    g1smd

    18th April 2010 @ 20:19

  • 18

    Still not fixed!

    g1smd

    12th May 2010 @ 13:51

  • 19

    Beyond a joke now!

    g1smd

    24th May 2010 @ 15:13

  • 20

    Another new month and O2 still haven’t fixed their website.

    g1smd

    2nd June 2010 @ 00:39

  • 21

    It is now way more than four months since O2 originally made this error.

    …and it’s still not fixed.

    g1smd

    14th July 2010 @ 09:20

  • 22

    Five months!

    Still not fixed.

    g1smd

    15th August 2010 @ 17:27
