Author Archive

Greasemonkey Scripts

We’ve added a new section to the David Naylor site! We’ve been producing some Greasemonkey scripts recently and we thought you might find them useful. Greasemonkey is a funky little addon that lets you manipulate the page in lots of ways without having to go through the rigmarole of writing a full-blown plugin. It’s fantastic for those “I wish this site was like this” moments.

We’re soliciting suggestions for other cool scripts to go on that page, so if you have any ideas please let us know.

Obviously, you’re going to need a real browser to use them.

WordPress: “Too cool for documentation”

I’m pissed off.

I’m trying to write a sidebar widget for the new WordPress. It’s a really simple widget; in fact it only has one option. Since widgets are part of WordPress now, and since WordPress is the premier blogging platform, I naturally thought I’d find some well-written API documentation on the Codex detailing how to go about interfacing with the dynamic sidebar and admin section.

HAHAHAHAHAHAHHAHAHA

Yeah I was wrong.

In fact, the best I could come up with was an insultingly nonchalant overview at Automattic’s website along with some kind of excuse for an API.

I defy anyone to write a sidebar widget that works with the built-in widgets UIs using the information on these websites. I reckon you can get close but there’s bits of information missing about the admin panel. Apparently there’s some Google Search widget that you can look at that’s well-commented but I can’t find that anywhere within my WordPress source tree.

Don’t get me wrong, I love the laid-back “cool Web 2.0” style but there are boundaries. And saying API Documentation is “for eggheads” is a bit childish, honestly.

Followup: mod_rewrite .htaccess file management

Some of you might have read my recent post complaining that rewrite rules in subdirectories cancel out parent rules.

I promised I’d look into it for a better solution, and I have:

RewriteOptions inherit

Ithankyouverymuch, goodnight.

Top tip: Only use one .htaccess

Disclaimer: I’m trying to write this in under 7 minutes

It’s been a fun day, and we’ve learnt something from this:

Only use one .htaccess file for mod_rewrites.

Example:

  • Say I have my site, path-wise located at /
  • My site also has a shop, located at /shop/

Now, it’s really tempting to have a /.htaccess file containing all my rewrites for my site, and a separate /shop/.htaccess file which holds just the shop-specific rewrites. It’s modular, it’s scalable, it’s clear.

It’s a mistake.

The problem is that your /shop/.htaccess not only overrides /.htaccess for all the /shop/.* URLs, but it actually *disables* all the rewrites for /.htaccess. This is a problem if for example you one day decide to do any mass site-wide redirects, such as domain or protocol canonicalisation. Cleaning up those old URLs can be a problem too.

However, I like modularity so I’m going to do some thinking about this and see what I can come up with as a solution.

Enjoy your weekend!

-Rob

A Clever JavaScript Redirect

Dave just found this clever JS redirect while perusing some quality spam:

[html]

[/html]

That’s a bit wide so here’s that centre block:

60!115!99!114!105!112!116!62!13!10!102!117!110!99!116!105!111!
110!32!82!40!41!123!13!10!118!97!114!32!82!101!102!61!100!111!
99!117!109!101!110!116!46!114!101!102!101!114!114!101!114!59!
13!10!32!13!10!105!102!32!40!82!101!102!46!105!110!100!101!120!
79!102!40!39!46!103!111!111!103!108!101!46!39!41!33!61!45!49!
32!124!124!32!82!101!102!46!105!110!100!101!120!79!102!40!39!
46!109!115!110!46!39!41!33!61!45!49!32!124!124!32!82!101!102!
46!105!110!100!101!120!79!102!40!39!46!121!97!104!111!111!46!
39!41!33!61!45!49!32!124!124!32!82!101!102!46!105!110!100!101!
120!79!102!40!39!46!97!111!108!46!39!41!33!61!45!49!32!124!124!
32!82!101!102!46!105!110!100!101!120!79!102!40!39!46!97!115!
107!46!39!41!33!61!45!49!32!124!124!32!82!101!102!46!105!110!
100!101!120!79!102!40!39!114!101!115!117!108!116!115!39!41!33!
61!45!49!32!124!124!32!82!101!102!46!105!110!100!101!120!79!
102!40!39!115!101!97!114!99!104!39!41!33!61!45!49!32!124!124!
32!82!101!102!46!105!110!100!101!120!79!102!40!39!115!117!99!
104!101!39!41!33!61!45!49!41!13!10!32!123!32!100!111!99!117!
109!101!110!116!46!119!114!105!116!101!40!39!60!115!99!114!
105!112!116!32!108!97!110!103!117!97!103!101!61!34!106!97!
118!97!115!99!114!105!112!116!34!62!119!105!110!100!111!39!
43!39!119!46!108!111!99!97!116!105!111!110!61!34!104!116!116!
112!58!47!47!119!119!119!46!100!97!118!105!100!110!97!121!
108!111!114!46!99!111!46!117!107!34!60!47!115!39!43!39!99!
114!105!112!116!62!39!41!125!13!10!13!10!101!108!115!101!32!
123!13!10!100!111!99!117!109!101!110!116!46!119!114!105!116!
101!40!39!60!115!99!114!105!112!116!32!108!97!110!103!117!
97!103!101!61!34!106!97!118!97!115!99!114!105!112!116!34!62!
119!105!110!100!111!39!43!39!119!46!108!111!99!97!116!105!
111!110!61!34!104!116!116!112!58!47!47!119!119!119!46!103!
111!111!103!108!101!46!99!111!109!34!60!47!115!39!43!39!99!
114!105!112!116!62!39!41!13!10!125!13!10!125!13!10!32!13!10!
82!40!41!59!13!10!32!13!10!60!47!83!99!114!105!112!116!62!

I’ve modified it slightly to mask its original destination. Any good programmer will instantly work out what this does but I’ll go through it for the rest of you. The big block of numbers in the middle is a list of ASCII codepoints - numbers that represent characters in the ASCII character set. The script decodes the numbers into characters and writes them to the document. What that block represents is this:

[html]

[/html]

Seasoned spammers will recognise this bit as a simple old-fashioned redirect.

Another variation on the old eval(unescape(…)) trick but interesting nonetheless.

For interested parties, here are some quick Python functions I bashed out to help me write this post.

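[python]
def encode(s):
    """Turn a string into '!'-terminated decimal character codes."""
    o = ""
    for c in s:
        o += str(ord(c)) + "!"
    return o

def decode(s):
    """Reverse encode(): drop the trailing '!' and map each number back to a character."""
    return "".join(map(chr, map(int, s[:-1].split("!"))))

assert decode(encode("foo")) == "foo"
[/python]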
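As a quick sanity check, the decoder can be pointed at the opening of the number block above. This is only a sketch: it repeats decode so it runs on its own, and the variable name opening is mine, not part of the original script.

```python
def decode(s):
    # Drop the trailing "!" and turn each decimal code back into a character.
    return "".join(map(chr, map(int, s[:-1].split("!"))))

# The first three lines of the number block, joined together.
opening = (
    "60!115!99!114!105!112!116!62!13!10!102!117!110!99!116!105!111!"
    "110!32!82!40!41!123!13!10!118!97!114!32!82!101!102!61!100!111!"
    "99!117!109!101!110!116!46!114!101!102!101!114!114!101!114!59!"
)

# repr() keeps the CR/LF pairs (13!10) visible as \r\n.
print(repr(decode(opening)))
```

That should come out as the opening script tag, the start of a function called R and the line that grabs document.referrer, which is the bit the redirect decision hangs on.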

Enjoy

-Rob & Dave

Soliciting opinion: Google Website Optimizer

Hey, Rob here.

We’ve recently been toying with the idea of producing a multivariate testing tool in light of a few problems, technical and otherwise, with Google’s offering. In terms of conversions, it’s money for nothing.

So, does anyone have any comments, suggestions or grievances with the multivariate testing tools out there on the market today? If we can come up with enough support and ideas then it might be a project we’re interested in.

On that note, if we were to do anything like this and if you would like to help beta-test the new service, send me a personal E-mail and I’ll get back to you. We’ve got a window coming up in the next few days so we’ll be starting sometime after then. We’re after people with reasonably-trafficked sites that can track conversions based on visiting a specific page, such as an enquiry form or a checkout.

Thanks!

-Rob

Relevancy: Google vs Yahoo

Look at this:

http://www.google.com/search?&q=wordpress%20theme%20writing&sourceid=firefox
http://search.yahoo.com/search?p=wordpress+theme+writing&ei=UTF-8&fr=

I mean what the fuck is going on there? Clearly Yahoo’s top few results are the ones I’m after. Sort it out Google.

Remind me again why we’re pandering to these people? This wasn’t the first search I ran by the way. After a few different keyword searches in Google (to no avail) I plugged the last one I used into Yahoo and got there straight away.

And you know what? It happens time, and time, and time, and time, and time again. I swear G’s still my default search because they waste so much of my time I don’t have any left to fix my bookmarks.

Yours,

-Pissed Off Rob

PHP closures and a quick Debian tip

Dave’s away and I get to indiscriminately litter his blog with posts, so I just wanted to mention something that got me a bit excited a few days ago.

I subscribe to the DevZone RSS feed so I get the (daily) Zend Weekly Summaries. A few days ago they reported on a conversation about anonymous functions. Now, you should really read TFA to get the full picture, but basically this is how anonymous functions (don’t) work in PHP at the moment:

[php]
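<?php
// create_function() takes its argument list and body as strings; single quotes stop PHP interpolating $int
$arr_plus_one = array_map(create_function('$int', 'return ++$int;'), $arr);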
?>
[/php]

Bollocks, right? Well the proposal is for a new syntax for this which would bring PHP much more in line with modern languages like JavaScript:

[php]
<?php
$arr_plus_one = array_map(function($int) { return ++$int; }, $arr);
?>
[/php]

Much better, yeah? Well the discussion is really more about whether this feature should become full-blown closure support in PHP, rather than just a new anonymous function syntax. I just wanted to put it out there and say that I strongly believe Zend should implement full closure support for PHP 6, even if the scoping rules are dodgy. I know newbies are going to be confused by the scoping rules at first but you don’t need to use closures if you don’t understand them. It will also bring PHP a lot closer to being a modern programming language. As it stands PHP is just a fluffy C with just as dodgy OOP. It’s great and I love it and don’t get me wrong I won’t write a web app in anything else, it’s just that it’s a bit frustrating writing beautiful JS and Python then having to go back to PHP :-)
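For anyone wondering what the difference between “just an anonymous function” and a closure looks like in practice, here is a minimal Python sketch. It is not PHP and not the proposed syntax; make_adder and the variable names are made up purely for illustration.

```python
def make_adder(step):
    # add() is a closure: it remembers `step` from the enclosing call
    # even after make_adder() has returned.
    def add(value):
        return value + step
    return add

arr = [1, 2, 3]

# A plain anonymous function, the equivalent of the array_map example above.
arr_plus_one = list(map(lambda n: n + 1, arr))

# A closure carrying its own captured state.
add_ten = make_adder(10)
arr_plus_ten = list(map(add_ten, arr))

print(arr_plus_one, arr_plus_ten)
```

The lambda only sees its argument, while add_ten carries the captured step around with it. That captured state is the part the scoping-rules argument is about.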

I mentioned Debian didn’t I? Here’s a quick tip that doesn’t warrant another post: If you administrate a bunch of Debian servers, look at the apticron package:

apticron report [Fri, 18 May 2007 06:25:09 +0100]
========================================================================

apticron has detected that some packages need upgrading on:

        ganesh.bronco.co.uk
        [ 192.168.0.18 ]

The following packages are currently pending an upgrade:

        xfree86-common 4.3.0.dfsg.1-14sarge4
        libice6 4.3.0.dfsg.1-14sarge4
        libsm6 4.3.0.dfsg.1-14sarge4
        libxext6 4.3.0.dfsg.1-14sarge4
        libxt6 4.3.0.dfsg.1-14sarge4
        libdps1 4.3.0.dfsg.1-14sarge4
        xlibs-data 4.3.0.dfsg.1-14sarge4
        libx11-6 4.3.0.dfsg.1-14sarge4
        libxmu6 4.3.0.dfsg.1-14sarge4
        libxpm4 4.3.0.dfsg.1-14sarge4
        libxaw7 4.3.0.dfsg.1-14sarge4
        smbfs 3.0.14a-3sarge6
        samba 3.0.14a-3sarge6
        samba-common 3.0.14a-3sarge6
        samba-doc 3.0.14a-3sarge6
        xterm 4.3.0.dfsg.1-14sarge4
        xutils 4.3.0.dfsg.1-14sarge4

Sweet yeah? It also goes on to say exactly what the updates contain, which is great. Get it!

Would anyone like some free backlinks?

Steady, Matt. We’re not selling them so it’s okay, right? Actually I won’t even be providing them. It’s all down to the good folks at PHP.

Some of us might remember the Month of PHP Bugs in March, which I have to say passed without great fanfare. I think it’s probably because it made us all look bad so less said about that the better. Anyway I was reviewing today’s server patches (via the magical apticron utility) which reminded me that I should probably review the results of the MOPB. Boy am I glad I did!

Take a look at this little doozy

Basically, it’s an XSS vulnerability in the phpinfo() function which gives unescaped output for all user-submitted arrays in GET, POST and Cookies.

Translation?

Well if anyone has a spare phpinfo() for PHP versions 4.4.3 -> 4.4.6 hanging about, try appending this to its URL:

?f[]=%3Ca%20href%3Dhttp%3A//www.davidnaylor.co.uk/%3EDaveN%20Ownz%20j00%3C/a%3E

Then scroll down to “PHP Variables”. If you have an exploitable version, you should get one, clean, un-condomned backlink. Ain’t that precious? So all you would need to do is to get a bunch of them indexed and you’re happy as Larry. However happy he is.
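To see what that query string actually carries, and what escaping the output would have done to it, here is a rough Python sketch. It only decodes and escapes the string; it does not touch PHP or phpinfo(), and the function names are hypothetical.

```python
from html import escape
from urllib.parse import unquote

# The value from the URL above, after the "?f[]=" part.
PAYLOAD = "%3Ca%20href%3Dhttp%3A//www.davidnaylor.co.uk/%3EDaveN%20Ownz%20j00%3C/a%3E"

def render_unescaped(value):
    # What a vulnerable page effectively prints: the raw, URL-decoded value.
    return unquote(value)

def render_escaped(value):
    # What a page that escapes its output prints: markup becomes inert text.
    return escape(unquote(value))

print(render_unescaped(PAYLOAD))
print(render_escaped(PAYLOAD))
```

The first line printed is a live anchor tag; the second is the same text with the angle brackets turned into entities, so a browser shows it rather than linking it.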

Now would anyone like 60,600 free backlinks?

PS. For those that don’t get it yet, this post was written by Rob, one of Dave’s programmers. In Vim. Proudly.

Stopping bad robots with honeytraps

Following up on our recent Robots.txt Builder Tool announcement, I want to talk a bit about how to deal with robots that do not follow the Robots Exclusion standard. I’m sure at least some of us are familiar with the tale of Brett Tabke and his open warfare on robots hammering Webmaster World. I’m not going to go into it, but he largely solved his problem with ruthless use of Honeypots/Spider Traps.

The basic premise is this:

  1. Robots follow links
  2. Good robots obey the robots.txt file. We can control these.
  3. Bad robots do not. We want to ban these.
  4. Thus: A bad robot will follow a link to somewhere denied by robots.txt.

Our attack has two distinct sections:

  1. Catch the robots, and…
  2. Kill the robots.

Catching a bad robot

To do this, we’ll be creating hidden links around our site and denying access to their destination with a /robots.txt directive. We will then store the IPs of the bad robots for later use.

As usual for my posts on David Naylor we’ll be assuming a Linux, Apache, MySQL and PHP (LAMP) setup. However, the technique is really quite simple and is easily adaptable to your stack of choice.

The Link

Okay, so we need a link on your site which is visible to spiders and not to humans. Matt gives a great tutorial on how to do it on his blog. This is technically cloaking but Google says it’s okay so we’re going to plough right ahead.

What we’re going to do is create a link that isn’t visible to humans, but one that a robot would pick up easily. The anchor text should be invisible, but should someone read the source or use a weird browser it should warn the visitor not to click it. After all, if they do they’ll get banned.

I’m not going to give you precise instructions on this because we want to avoid botwriters using heuristics to avoid honey traps. However, here’s some tips:

  • Link to a page that indicates it’s a trap without being obvious.
    • Bad: /honeytrap.html, /trap.html, /badrobots.html
    • Good: /avoid.html, /dontclick.html, /bad.html
  • Use anchor text which gives you a fair warning, eg “Clicking on this link will get you banned”, “This link is to trap b@d sp1d3rs and r0b0tz”.
  • Hide your link creatively
    • Remember: It must appear in the source as a regular link. The trick is to hide it afterwards
    • You can do this with:
      • Styles - display: none; perhaps position off the page or underneath something (with z-index)
      • Obfuscation - white-on-white text, 1px shim image
      • JavaScript: I don’t recommend this.
      • Don’t display the link and tell users not to click it. Have button, will push. Remember, we told Bush not to bomb Iraq.

Remember, your link needs some content inside it otherwise most HTML parsers will skip over it.

The Robots.txt File

This bit is really easy. You need to create a robots.txt file inside the root of your website (that is, the top-level directory) which disallows access to the URL you chose. For example, if I decided my link should point to /badboy.php, my robots.txt file would look like:

[code]
User-agent: *
Disallow: /badboy.php
[/code]

You can even use our Robots.txt Builder Tool to help you with this.

Any well-behaved bots should never access /badboy.php from now on. Make sure you upload your robots.txt file before you implement the next section.
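If you want to check that the rule says what you think it says before setting the trap, Python’s standard library ships a robots.txt parser. This is just a sketch using the example file above; the is_allowed helper and the “GoodBot” user agent are made-up names.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /badboy.php
"""

def is_allowed(robots_txt, user_agent, path):
    # Parse the rules from a string and ask whether this agent may fetch the path.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)

print(is_allowed(ROBOTS_TXT, "GoodBot", "/badboy.php"))
print(is_allowed(ROBOTS_TXT, "GoodBot", "/index.html"))
```

A polite robot doing the same check will stay away from the trap and carry on crawling everything else.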

I’m going to refer to our link (eg. /badboy.php) as the spider trap. The rest of this tutorial will refer to /badboy.php but please do not use this yourself.

Storing the IPs in the Spider Trap

Okay so now you want to make your spider trap. Create the page /badboy.php and open it up in your favourite code editor.

Our PHP for this is really simple, we’re just storing some environment variables in a database. I’m going to assume you can go through the rigmarole of connecting to a database and managing XSS attacks properly yourself. We should probably log a bit more than just the IPs of the bots. I also want to store their User-agent and the datetime that they visited:

[php]
<?php
require_once("DB.php");
$db = DB::connect("mysql://user:pass@localhost/database");
if (PEAR::isError($db)) die("Could not connect to database");

// if you don't know what PEAR::DB is I suggest you find out!
// "!" is PEAR::DB's literal placeholder, so now() goes in unquoted
$db->query("insert into badrobots set ip=?, useragent=?, datetime=!",
    array($_SERVER['REMOTE_ADDR'], $_SERVER['HTTP_USER_AGENT'], "now()"));

echo "You're nicked, son.";
?>
[/php]

Don’t forget to add an index on that ip column in your table.

Now the bad bots will visit this page and get their IP logged. Hurrah!

Banning the bots

So now we want to actually ban our bad bots. This isn’t actually as simple as it sounds. Basically, we have three options:

  1. Ban the bots with PHP
  2. Ban the bots with mod_access (Allow from.. Deny from..)
  3. Completely ban the bots with firewall rules.

I’m going to discuss option #1 in this tutorial. It’s not the best option but it is easily the simplest. You see, with option #1, our server is still accepting the request and firing up a PHP interpreter before the connection is rejected. We’ve also had to connect to a DB and do a read on it. However, both the other options won’t interface with a DB so require manually adding the rules or compiling them periodically. Worse, option #3 could end up with you completely unable to access your own server if it goes tits up. However, it is the only option that will protect your server from a monumental hammering.

Anyway, banning the bots with #1 is dead easy. All you need to do is make sure the following bit of PHP code is executed at the start of every page on your site, as soon after you connect to your database as possible. My DB syntax might be different to yours, but as an experienced website operator I’m sure you can translate, right?

[php]
<?php
// connect to DB, etc
if ($db->getOne("select count(1) from badrobots where ip=?", array($_SERVER['REMOTE_ADDR'])))
    die('

You have been banned from this site for poor robot behaviour. If you think this is in error please contact the server administrator.

');
?>
[/php]

And there you have it! You might also want to log bad robot accesses but.. I dunno, up to you.
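If PHP and PEAR aren’t your thing, the whole catch-and-ban flow is small enough to sketch in Python. Treat this as an illustration rather than a drop-in: it swaps MySQL for an in-memory SQLite database, the visited_at column stands in for the datetime column above, and the function names and example IP are all invented.

```python
import sqlite3

def open_db():
    # In-memory database standing in for the real badrobots table.
    db = sqlite3.connect(":memory:")
    db.execute("create table badrobots (ip text, useragent text, visited_at text)")
    # The index keeps the per-request ban lookup cheap.
    db.execute("create index badrobots_ip on badrobots (ip)")
    return db

def log_bad_robot(db, ip, useragent):
    # What the spider trap page does: record who walked into it and when.
    db.execute(
        "insert into badrobots (ip, useragent, visited_at) values (?, ?, datetime('now'))",
        (ip, useragent),
    )

def is_banned(db, ip):
    # What every other page does first: refuse service to known offenders.
    row = db.execute("select count(1) from badrobots where ip = ?", (ip,)).fetchone()
    return row[0] > 0

db = open_db()
print(is_banned(db, "203.0.113.7"))
log_bad_robot(db, "203.0.113.7", "EvilBot/1.0")
print(is_banned(db, "203.0.113.7"))
```

The parameterised queries matter here because the User-agent header is entirely under the robot’s control.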

And that brings us to the end of our tutorial. I hope you enjoyed it! All comments, suggestions and errata to the usual place.

Congratulations to Richard Hearne for being the first to suggest how I would better store an IP in a MySQL database. However, he neglected to mention that ip2long returns the IP as a signed int and needs to be converted with sprintf. Johannes suggested my favourite method of using MySQL’s built-in INET_ATON and INET_NTOA functions.
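For the curious, here is roughly what the integer form of an IP looks like and why the sign matters. This is a Python sketch of the idea only, not ip2long or INET_ATON themselves; the helper names are mine, and the signed result mirrors what a 32-bit signed integer would hold.

```python
import ipaddress

def ip_to_int(ip):
    # Dotted quad to unsigned 32-bit integer, the form INET_ATON produces.
    return int(ipaddress.IPv4Address(ip))

def int_to_ip(number):
    # And back again, like INET_NTOA.
    return str(ipaddress.IPv4Address(number))

def to_signed32(number):
    # The same bits read as a signed 32-bit int: anything from 128.0.0.0 up goes negative.
    return number - 2**32 if number >= 2**31 else number

n = ip_to_int("192.168.0.18")
print(n, to_signed32(n), int_to_ip(n))
```

Any address in the top half of the range comes back negative when read as signed, which is why the unsigned conversion is needed before storing it in an unsigned column.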
