
Why is mi.o2.ie Duplicating Our Content?


Like most things that happen here in the Bronco office, it all started with a SERP:

[Image: mi.o2.ie SERP]

What’s going on there then? A quick investigation of mi.o2.ie shows some badly formatted HTML and an error message about O2’s mobile internet service. Predictably, the domain is listed as belonging to O2 Ireland. So why have they got a scraped version of one of our blog posts? There’s nothing particularly remarkable about that post (sorry Dave) over the others – so what happens if you change the URL?

I’ll cut a long story short and just explain the results – http://mi.o2.ie/www.anywhere.com/somepage will work and return the contents of www.anywhere.com/somepage. It’s even helpful enough to alter redirects (try http://mi.o2.ie/www.google.com) and rewrite any links on the page to keep you within the mi.o2.ie structure. Search engine crawlers must love that.
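To make that mapping concrete, here is a minimal Python sketch that takes a mi.o2.ie URL and works out which page it mirrors. It assumes only what the examples above suggest: everything after the first slash is the target host and path, and the target is fetched over plain http. The function name original_url and the way it passes the query string through are purely illustrative, not anything O2 has documented.

```python
from typing import Optional
from urllib.parse import urlsplit

PROXY_HOST = "mi.o2.ie"  # the proxy hostname discussed in this post


def original_url(proxied: str) -> Optional[str]:
    """Return the URL a proxied address appears to mirror, or None."""
    parts = urlsplit(proxied)
    if parts.hostname != PROXY_HOST:
        return None  # not a proxy URL at all

    # Assumption: the path, minus its leading slash, is "<host>/<path>".
    target = parts.path.lstrip("/")
    if not target:
        return None  # bare http://mi.o2.ie/ has nothing to map back to

    # Assumption: the target is fetched over http and the query is kept as-is.
    url = "http://" + target
    if parts.query:
        url += "?" + parts.query
    return url


print(original_url("http://mi.o2.ie/www.anywhere.com/somepage"))
```

Something like this could be handy for turning a list of indexed mi.o2.ie URLs back into the pages they duplicate.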

Worse still: it’s a fairly effective anonymous proxy. It sends a Via header (seen server-side as HTTP_VIA) to the remote server containing “1.1 mi.o2.ie”, but nothing that refers to your own IP address. Browse with this and no-one will know who you are. As long as you turn off images. And JavaScript. And maybe CSS.
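Since the proxy announces itself in that header, a site owner could in principle spot these requests on the server. The sketch below is one hedged way to do it in Python: it takes a CGI/WSGI-style environ dictionary, where the Via header shows up as HTTP_VIA, and checks each hop for the proxy’s hostname. The function name came_through_proxy is made up for illustration, and what you do with a match (log it, block it, serve a noindex) is left open.

```python
def came_through_proxy(environ: dict, proxy_host: str = "mi.o2.ie") -> bool:
    """Return True if the request's Via header names the given proxy host."""
    via = environ.get("HTTP_VIA", "")

    # A Via header is a comma-separated list of hops, each shaped like
    # "<protocol-version> <received-by> [comment]", e.g. "1.1 mi.o2.ie".
    for hop in via.split(","):
        fields = hop.strip().split()
        if len(fields) < 2:
            continue
        # received-by may carry a port ("host:port"), so compare the host only.
        received_by = fields[1].lower().split(":")[0]
        if received_by == proxy_host:
            return True
    return False


print(came_through_proxy({"HTTP_VIA": "1.1 mi.o2.ie"}))
```

Bear in mind that this relies entirely on the proxy continuing to send the header as described above.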

Ok, so maybe it’s not that useful. I wouldn’t be surprised if Google get a fair few PageRank requests through from O2 Ireland though. :)

At the time of writing, site:mi.o2.ie on Google returns “about 573,000” results. Ouch. Does anyone know how long this has been running for?

25 Comments

  • Gavin 1199 days ago

    OpenDNS is also reporting those urls as Phishing Sites.

  • Paul Carpenter 1199 days ago

    http://www.itsafamilything.co.uk

    “It started with a SERP” sounds like the worst song title in history. Good find though.

  • Dave 1198 days ago

    http://www.djb31st.co.uk

    O2 appear to have a long history of plain WRONG online apps to support mobile. When they first released the method of fetching MMS messages online, you could simply change the hash at the end of the URL and view anyone’s MMS (I bet you can imagine the content!)

    Wondering if this can be blocked server side?

  • James 1198 days ago

    Paul: I look forward to seeing it on your next album, then.

  • PaulH 1198 days ago

    Nice proxy :)

    Even crawls itself
    http://mi.o2.ie/mi.o2.ie/mi.o2.ie/mi.o2.ie

    Wonder how much of davidnaylor.co.uk we get in there
    spam

  • pancallok 1197 days ago

    http://mi.o2.ie/pancallok.blogspot.com

    I’m having this issue too; they are already taking over our site’s SERP listings. What can we do to fight this spam?

  • g1smd 1196 days ago

    I have seen a similar thing with o2 UK in recent weeks.

    In Google Analytics, I always set up a profile that records ‘visits that are NOT for the domain belonging to the real site’. That’s achieved with an ‘exclude’ custom filter.

    This profile therefore shows page views for search engine cache pages, and for scraper sites. It can be very revealing.

    I have seen a visitor recorded as having viewed csp.o2.co.uk/www.example.com/thispage.html in recent weeks. That user used a mobile device with the SymbianOS and Safari web browser.

    Unlike the Irish example, csp.o2.co.uk/www.somesite.com/somepage.html does not seem to be accessible from the outside world – from outside O2’s ecosystem that is.

  • g1smd 1192 days ago

    So, having discussed this on Twitter over the weekend, today http://twitter.com/O2forum briefly popped up and asked what the problem was.

    I have directed them here, and await their response.

    Incidentally, a request for http://mi.o2.ie/robots.txt returns a 302 redirect to http://mi.o2.ie/http://mi.o2.ie/Error.jsp?url=http%3A%2F%2Frobots.txt because they have no robots.txt file installed.

    No sign of any response from o2 UK though.

  • James 1191 days ago

    Thanks for pointing me to that conversation, g1smd (if that is indeed your real name!)

    Interestingly the “site:mi.o2.ie” query mentioned above now seems to only return one result. I wonder if it’s been blocked somehow? I don’t see any changes that would have caused that.

  • Justin Meighan 1191 days ago

    http://www.justinmeighan.com

    Badly formatted? It’s valid WML 1.1!
    Cut them some slack man :)

  • g1smd 1191 days ago

    I care nothing for it being valid ‘anything’, because that is not the issue.

    The whole point is that O2 Ireland’s site created duplicate copies of hundreds of thousands of sites, each with URLs that appeared to be part of O2’s domain, and then also allowed that content to be indexed by Google. Additionally, those copied pages had their internal navigation altered to point to O2’s domain, not to the original site.

    Whilst they might have at least temporarily fixed the issue of Google indexing the copied content, they have not fixed the problem of these accesses messing with the site stats and analytics of the sites that have been copied – and since there are no robots.txt exclusions or firewall rules, the content could so easily still be indexed by other search engines and the problem be repeated all over again.

    A similar problem also exists with O2 UK, where accesses to the copies skew the site stats of the sites that have been copied or proxied. So far, the copies have not been indexed by search engines and I hope it stays that way.

  • g1smd 1179 days ago

    Have they no-one with a clue running their sites?

    Requesting http://mi.o2.ie/robots.txt redirects the visitor to http://www.robots.txt/ now.

  • g1smd 1175 days ago

    They are still tinkering.

    Requesting http://mi.o2.ie/robots.txt now redirects the visitor to http://robots.txt/ instead.

    Get someone with a clue for &$%£*@ sake!

  • g1smd 1173 days ago

    Now we’re back to requests for http://mi.o2.ie/robots.txt being redirected to http://www.robots.txt/ again.

    Pathetic!

  • g1smd 1169 days ago

    Someone with half a clue appears to now be in charge.

    It’s vaguely better than someone with no clue at all.

    So, today, a request for http://mi.o2.ie/robots.txt redirects to…

    Wait for it.

    You’re gonna like this one.

    Actually, you’re not.

    Ta da…

    http://advancedsearch.virginmedia.com/subscribers/assist?url=www.robots.txt

    What the f*ck are they thinking?

  • g1smd 1168 days ago

    Now we’re back to requests for http://mi.o2.ie/robots.txt being redirected to http://www.robots.txt/ again.

    It’s a FARCE.

  • g1smd 1158 days ago

    Requests for http://mi.o2.ie/robots.txt are still incorrectly redirected to http://www.robots.txt/ as before.

  • g1smd 1134 days ago

    Still not fixed!

  • g1smd 1122 days ago

    Beyond a joke now!

  • g1smd 1114 days ago

    Another new month and O2 still haven’t fixed their website.

  • g1smd 1071 days ago

    It is now way more than four months since O2 originally made this error.

    …and it’s still not fixed.

  • g1smd 1039 days ago

    Five months!

    Still not fixed.

  • g1smd 1012 days ago

    Six months!

  • g1smd 851 days ago

    Almost a year later and requests for http://mi.o2.ie/robots.txt redirect to http://robots.txt/ now.

    Clearly there’s no-one at o2 with anything even resembling the vaguest sort of clue.

  • g1smd 647 days ago

    So, 18 months after O2 were first informed of the problems, where are we now?

    No surprises. NOT FIXED.

    http://mi.o2.ie/robots.txt returns “404 Not Found” status and matching HTML error page.

