Archive for the ‘Programming’ Category

The Robots.txt Builder - a new tool

So Dave was at the Robots.txt Summit at SES New York 2007 and it reminded him of a time when a client of his had got himself deindexed by accidentally denying all robots when trying to stop them getting at his RSS Feed! When he got back he got me to do a tool to prevent this stuff from happening in the future, so without further ado DaveN is proud to announce:

The Robots.txt Builder Tool

All comments, suggestions and bug reports (especially bug reports) are more than welcome. I’d be lying if I said we’d extensively tested this but we’ve given it a fair old runaround in IE and FF and we think it’s pretty solid.

It’s aimed at the lower end of the userbase. We figured that most people who want to do more advanced or fine-grained robots.txt tuning will probably be able to figure out how to do it themselves; we’re really aiming for the “Mom ‘n’ Pop” market with this one. Let’s hope we don’t have any more incidents like the opening anecdote!

Features:

  • Block search engines by type (more types welcome!)
  • Supports allowing robots via the standards-defined empty “Disallow:” directive
  • Warning when you block your entire site
  • Site structure import. This lets you import your site structure from Yahoo Site Explorer.
  • Easy-to-use point ‘n’ click interface (I hope)
  • More JavaScript than you can shake a stick at
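
To give a flavour of the “warning when you block your entire site” check, here is a minimal sketch in Python. It is not the Builder’s actual code (that is all JavaScript); the function name blocks_entire_site and the three sample robots.txt strings are made up for illustration, and the parsing is left to the standard library’s urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt, user_agent="*"):
    """Return True if this robots.txt text keeps user_agent away from the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# Hypothetical files, for illustration only.
# The mistake from the opening anecdote: every robot denied everything.
accidental = "User-agent: *\nDisallow: /\n"
# What was probably meant: keep robots out of the feed only (assumed path).
intended = "User-agent: *\nDisallow: /feed/\n"
# The standards-defined way of allowing everything: an empty Disallow.
allow_all = "User-agent: *\nDisallow:\n"

for name, text in [("accidental", accidental), ("intended", intended), ("allow_all", allow_all)]:
    if blocks_entire_site(text):
        print(name + ": WARNING, this blocks your entire site")
    else:
        print(name + ": looks OK")
```

Only the first file should trigger the warning; the other two leave the site root crawlable.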

Big love out to Yahoo! for their excellent Site Explorer and TreeView YUI widget. You rock, guys.

Enjoy!

PS. Anyone who’s interested, check out the number of CSS and JS includes in the <head>. Oh my god!

Ubuntu Feisty Fawn released

Everyone at DaveN Industries would like to congratulate the Ubuntu team on the successful launch of their new Feisty Fawn release. The Ubuntu download can be found in the usual place.

Ubuntu Linux has always been a shining light amongst Linux distributions. One of our developers uses it full-time on his desktop and we have many Linux servers running Ubuntu. Well... our main servers run Debian, but when we want more cutting-edge software we use Ubuntu. Adding software is so easy as well - just run a single command and everything is handled for you - and you thought installing software on Windows was easy. Soon it will get easier as well with Linspire’s CNR technology (Click ‘n’ Run).

We think Ubuntu is so great at Linux web hosting, we’re planning on releasing a dedicated server control panel for Ubuntu root-access servers. We really think Linux hosting is the future but we know that running your own Linux machines is a big pain.

We want to fix that.

(Without becoming cPanel)

Stay tuned!

This post written in Vim. Proudly.

Why JavaScript is my favourite language

I tell everyone, JavaScript is my all-time favourite programming language and they look at me as if I have some terminal illness.

“No!” I tell them, honestly! Well, not really. ECMAScript is my favourite language, which is the reference language syntax for JavaScript (Mozilla) and JScript (Microshit).

You see, the big problem with JS that web developers run into is not that JavaScript sucks, but that the implementation sucks. As a language, ECMAScript is bloody awesome. Specifically, Microsoft’s implementation is fucking awful but Mozilla is not without its problems. Macrodobe’s is very good (Flash) and there are a couple of others that are alright.

So why do I like JavaScript so much? you ask - well, look at this snippet. I’m building a table from a dataset (in one statement!):

[javascript]table = Builder.node("table", {"class":"kd_information"}, [
  Builder.node("caption", "Text information"),
  Builder.node("tbody", [
    Builder.node("tr", [
      Builder.node("th", "Title:"),
      Builder.node("td", data.title)
    ]),
    Builder.node("tr", [
      Builder.node("th", "There was lots more boring stuff here that I have snipped:"),
      Builder.node("td", data.tagdata.headings)
    ]),
    Builder.node("tr", [
      Builder.node("th", "Linked external domains:"),
      Builder.node("td", [
        Builder.node("ul", function() {
          l = []
          data.extdomains.each(function(domain) {
            if (domain)
              l.push(Builder.node("li", [
                Builder.node("a", {"href":"http://"+domain+"/"}, domain)
              ]))
          })
          return l
        }()) // Look at that little beauty
      ])
    ])
  ])
])[/javascript]

You’re looking at lines 15-24. I needed to make a ul/li list from a list in memory. But oh no, I’m in a statement, what do I do! Finish up, find the UL node then iterate my list appending LIs to it? FUCK THAT! Define an inline function that returns the list and immediately call it :-D Bloody awesome.

Update:

Here’s another little example for you. Lines 10-15:

[javascript]new Ajax.Request("/clickability.py/get_prmap", {
  method:"post",
  postBody:"uniqid="+escape(GWTBL.uniqid),
  onSuccess: function(t) {
    data = eval('('+t.responseText+')')
    data.each(function(info) {
      width = 400
      lw = Math.round(width * info.fraction)
      rw = width - lw
      tbody.appendChild(Builder.node("tr", [
        Builder.node("td", function(pr) {
          if (pr == null) return "Not checked"
          if (pr == -1) return "Not available"
          return "PR "+pr
        }(info.pagerank)), // Catch that funky syntax, wide-boi
        Builder.node("td", "Some more boring stuff")
      ]))
    })
  }
})[/javascript]

-Rob

How to migrate Firefox from XP to Vista

So Dave (for better or worse) decided to try out the shiny new Vista discs we received the other day on his computer at work. As a Microsoft ActionPack subscriber we get all the cool new gear periodically in the post. Dave re-installs his computer quite frequently, so we’re well trained in the art of migration; however, Vista threw us a curveball or two. One of them was transferring Firefox.

Usually you just copy the Profiles directory from ~/Application Data/Mozilla/Firefox, but that didn’t seem to cut it with Vista. We managed to work around it though, so here’s what we did:

Disclaimer: This isn’t what’s recommended by Mozilla, it’s just what worked for us. YMMV.

  • Install Firefox on Vista, launch it once, then close it. Make sure Firefox on XP is closed also
  • XP: Back up C:\Documents and Settings\<username>\Application Data\Mozilla\Firefox\Profiles. It should contain only one folder, which has a garbage name.
  • Vista: Navigate to C:\Users\<username>\AppData\Local\Mozilla\Firefox\Profiles and then enter the only folder in there, which should have a name similar to the one inside Profiles that you backed up earlier.
  • Go into the weird folder inside your backup, hit ctrl+A (select all), then ctrl+C (copy)
  • Go inside the weird folder on Vista and paste (ctrl+V) all the stuff over the top. As you can see we are just migrating the contents of the weird folders from the backup to the new installation.
  • On Vista, go up 4 levels until you get to C:\Users\<username>\AppData. Now go in to the “Roaming” folder, then Mozilla, Firefox, Profiles to get to C:\Users\<username>\AppData\Roaming\Mozilla\Firefox\Profiles.
  • Perform the same operation to copy the contents of the weird folder in your backup into the weird folder on Vista

Now all you have to do is start Firefox and pray.

I’m not sure if this method isn’t overkill - I suspect you can skip copying stuff in to the Local\Mozilla\Firefox\Profiles directory and just deal with the Roaming.
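
If you would rather script the copy-and-paste steps than click through them, something like this Python sketch would do the same job. This is an assumption on my part, not what we actually ran: the function names only_subfolder and migrate_profile are made up, and it needs Python 3.8 or later for the dirs_exist_ok argument to shutil.copytree.

```python
import shutil
from pathlib import Path


def only_subfolder(profiles_dir):
    """Return the single weirdly named profile folder inside a Profiles directory."""
    folders = [p for p in Path(profiles_dir).iterdir() if p.is_dir()]
    if len(folders) != 1:
        raise RuntimeError(
            "Expected exactly one folder in %s, found %d" % (profiles_dir, len(folders))
        )
    return folders[0]


def migrate_profile(backup_profiles, vista_profiles):
    """Paste the contents of the backed-up weird folder over the top of the Vista one."""
    source = only_subfolder(backup_profiles)
    target = only_subfolder(vista_profiles)
    # dirs_exist_ok=True merges into the existing folder, overwriting files with the same name
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

You would call migrate_profile once with your backup and the Local\Mozilla\Firefox\Profiles directory, and once more with the Roaming one, with Firefox closed both times.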

If anyone has any more information or experience on this I would be happy to update this post!

PHP5 on 1AND1 HOSTING

If you are using PHP 5 scripts on a 1AND1 hosting package, they will be updating the current PHP version to cover a few security holes, and you may have to update your PHP code accordingly to avoid any issues.

This is only for those using PHP 5.

So on 7th and 8th December, their current version 5.1.6 will be updated to 5.2.0 on all 1and1 servers.

You can get details of the changes in this deployment here: http://www.php.net/UPDATE_5_2.txt

DaveN

How to Sneaky Redirect

OK, here is the external JavaScript file, Dave.JS. I have broken it across several lines to fix scrolling problems, but it’s really one line of code:

[code]
function replace(string,text,by) {var strLength = string.length, txtLength = text.length;
if ((strLength == 0) || (txtLength == 0)) return string;var i = string.indexOf(text);
if ((!i) && (text != string.substring(0,txtLength))) return string;
if (i == -1) return string;var newstr = string.substring(0,i) + by;
if (i+txtLength < strLength)newstr += replace(string.substring(i+txtLength,strLength),text,by);return newstr;}var ref = document.referrer;
var s1="parent.location.replace('http:";var s2="//Davidnaylor.";var s3="co.uk/')";eval(s1+s2+s3);[/code]

then you call

[code][/code]

Anyway Have Fun

DaveN

A quick and dirty proxy that (nearly) anyone can do

So we were looking for some hosting in the states for one of our sites, and we were frustrated that all the AdWords on Google.com were geotargeted to our UK IP address. So, I came up with a dirty little hack to get the pages we wanted via one of our US servers, without the need to mess around with my browser settings.

You need:

  1. A domain that you can add wildcard DNS entries to
  2. An Apache hosting account that will accept a wildcard ServerAlias in the VHost directive. Or IP-based hosting would do it.
  3. .htaccess support
  4. The Apache modules mod_proxy and our old favourite, mod_rewrite

Not too much to ask, I’d say. You can definitely get this on any Managed hosting deal, whether you have root or not.

All that you need to do is:

  1. Pick a domain to use. For this example we’ll be using “example.com”, and so as not to mush up an entire domain, we’ll use the sub “proxy.example.com”
  2. Add an appropriate DNS entry such that *.proxy.example.com points to your server
  3. Set up your server so that one of your VHosts picks up *.proxy.example.com
  4. Stick this in your .htaccess and smoke it:
    [code]RewriteEngine On
    RewriteBase /
    RewriteRule ^_magic_/(.*)\.proxy\.example\.com/(.*)$ http://$1/$2 [P]
    RewriteRule ^(.*)$ /_magic_/%{HTTP_HOST}/$1[/code]
    Note: Don’t forget to put your domain in to line 3

Now all you need to do is stick “.proxy.example.com” on the end of the domain bit of any URL you want to visit, eg:

  • http://www.google.com.proxy.example.com/search?hl=en&q=dedicated+servers
  • http://httpd.apache.org.proxy.example.com/docs/2.0/mod/mod_rewrite.html#rewriterule

This works a bit like the Coral CDN, where you place “.nyud.net:8080″ on the end of your host.

Warning: This is vulnerable to HTTP redirection (301, 302) - eg if you visit http://google.com.proxy.example.com/ you are redirected to http://www.google.com/ by Google’s host canonicalisation.

Anyone wondering what that script does? Well I’ll walk you through it line-by-line:

Lines 1, 2: Set up mod_rewrite
Line 4: This is where the rewriting starts (because line 3 is skipped initially). This line turns something like “http://foo.com.proxy.example.com/bar.html” into “http://foo.com.proxy.example.com/_magic_/foo.com.proxy.example.com/bar.html”. We do this because we want to rewrite on the Host: header, which mod_rewrite isn’t designed to do. To enable us to do this we artificially place the Host: header, represented as %{HTTP_HOST}, into the URL-path.
Line 3: This line matches the changes made by line 4, letting us pick out what we want from %{HTTP_HOST} and construct a URL to fetch. The [P] flag causes this request to be carried out by the proxy module.
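
If the two-pass trick is still a bit murky, here is a rough simulation of it in Python. It only mimics the string handling of lines 3 and 4; it is not how Apache does it internally, the function names are made up, and it ignores the query string (which Apache carries along by itself). Paths are written without a leading slash, which is how rules in a .htaccess see them.

```python
import re

# Line 3 of the .htaccess as a Python regex (same pattern, example domain from the post)
MAGIC_RULE = re.compile(r"^_magic_/(.*)\.proxy\.example\.com/(.*)$")


def apply_line_4(host, path):
    """Line 4: smuggle the Host: header into the URL-path."""
    return "_magic_/{}/{}".format(host, path)


def apply_line_3(path):
    """Line 3: pull the real host and path back out, or return None if there is no match."""
    match = MAGIC_RULE.match(path)
    if match is None:
        return None
    return "http://{}/{}".format(match.group(1), match.group(2))


def proxied_url(host, path):
    """First pass: line 3 does not match, so line 4 rewrites. Second pass: line 3 matches."""
    target = apply_line_3(path)
    if target is None:
        target = apply_line_3(apply_line_4(host, path))
    return target
```

Calling proxied_url("www.google.com.proxy.example.com", "search") gives back "http://www.google.com/search", which is the URL the proxy module is then asked to fetch.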

Easy, yeah? I don’t know why you didn’t think of it before.

;-)

Threaded data collection with Python, including examples

On today’s Internet 2.0 there are all sorts of data feeds available for consumption. From APIs to RSS feeds, it seems like nearly every site has a machine-readable output. There are many reasons why you’d want to collect this information, which I won’t go in to, so in this post I’m going to walk you through an application which consumes RSS feeds. I’ll be using the Python scripting language, and I’ll show you an evolution of the ways to go about the task:

Application introduction

Our application is going to work like this:

  • A database contains the list of RSS feeds. This is long - 1000+ records
  • Our application reads this list of feeds and processes them
  • The items from the feeds are stored in the database

Database manipulation and RSS feed parsing are outside the scope of this tutorial, so we’ll start off by defining some empty functions that handle all this:

[python]def get_feed_list():
    """ Returns a list of tuples: (id, feed_url) """
    pass

def get_feed_contents(feed_url):
    """ Gets feed over HTTP, returns RSS XML """
    pass

def parse_feed(feed_rss):
    """ Parses the feed and returns a list of items """
    pass

def store_feed_items(id, items):
    """ Takes a feed_id and a list of items and stores them in the DB """
    pass[/python]
We’re going to have all these in a module called “functions”, which can just be a file called functions.py in the same directory (for Python versions before 3.0).
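
To give an idea of where this is heading, here is a minimal sketch of a threaded collector built on those four functions. Treat it as an assumption rather than the code from the rest of this entry: it uses concurrent.futures from the standard library of current Python, the four functions are filled in with throwaway in-memory stand-ins so the example runs on its own, and process_feed, collect and FAKE_DB are names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Throwaway stand-ins for the "functions" module (hypothetical bodies, no real DB or HTTP)
FAKE_DB = {}


def get_feed_list():
    return [(1, "http://example.com/a.rss"), (2, "http://example.com/b.rss")]


def get_feed_contents(feed_url):
    return "<rss><item>%s</item></rss>" % feed_url


def parse_feed(feed_rss):
    return [feed_rss]  # pretend every feed contains exactly one item


def store_feed_items(id, items):
    FAKE_DB[id] = items


def process_feed(feed):
    """Fetch, parse and store one feed. This is the unit of work handed to each thread."""
    feed_id, feed_url = feed
    items = parse_feed(get_feed_contents(feed_url))
    store_feed_items(feed_id, items)
    return feed_id


def collect(max_workers=8):
    """Process every feed in the list using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sorted(pool.map(process_feed, get_feed_list()))
```

Threads suit this job because each feed spends most of its time waiting on the network, so a pool of workers can have many downloads in flight at once.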
Read the rest of this entry »

Cloaking Links

why does this work in google ??

ok what do you need to know

PHP:

$agent=$_SERVER["HTTP_USER_AGENT"];

ASP :

sUserAgent= Request.ServerVariables("HTTP_USER_AGENT")

OK, this will return the User Agent. Btw, Google’s UA is

Googlebot/2.1 (+http://www.googlebot.com/bot.html)

PHP code and usage :

<?php

// showing one link for Google
// and one link for everyone else

// strpos() returns 0 (falsy) when the UA starts with "googlebot", so compare against false
if (strpos(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot") !== false)
	echo '<a href="http://www.site-a.com/">Site A for GoogleBot</a>';
else
	echo '<a href="http://www.site-b.com/">Site B for the rest of us</a>';

// adding rel nofollow for GOOGLE

$rel = '';
if (strpos(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot") !== false)
	$rel = ' rel="nofollow"';
echo "<a href=\"http://www.site-c.com/\"$rel>Site C</a>";

// keyword stuffing for GOOGLE

if (strpos(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot") !== false)
	echo '<h1>Welcome to my site buy viagra uk cheap ciallis reductil</h1>';
else
	echo '<h1>Welcome to my site</h1>';

// showing spam content to google and redirecting users to a feed partner
// (header() only works before any output has been sent)

if (strpos(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot") === false) {
	header("HTTP/1.1 302 Found");
	header("Location: http://www.my-feed-partner.com/");
	die;
}
echo "<p>Lots of keyword spam content here. Hello GoogleBot, we think we're clever</p>";
?>

play with some toys :
user agent switcher

Who wants a post?

Well, in the absence of Dave posting anything on here, I thought I’d mention something about what I’ve been up to on the ground. In reality nothing much interesting has been going on. I’ve been doing some websites for clients, very boring. One of them is particularly encroaching on my mammaries.

I have a new server, although it’s dead right now because we needed an emergency hard drive. We were going to put cPanel on it and run a standalone hosting service on it. Well, this was before we discovered that cPanel is shit and broken, and the new “stable” wouldn’t even install on any operating system it claims to support. So instead, I’m gonna write some magic and make my own. We might even sell it as a product.

On that note, I’ve also written quite a useful expired link tracking system, which we might also sell as a product.

Ah well, Friday night’s here now, so I’ll say my goodbyes. Have a good weekend, ladies and spammers.
