Blue Tomatoes

A thread at SEW started by Dave Hawley asked the question…

Do forum signatures really help with Google ranking?

Well, my spin on this is that they do, same as blogs, guestbooks, etc… but not very much. The more I look at Google, the more I see themed links working…
If site A is a site about PHP and has 100 backlinks about PHP, a link from that site will weigh heavily for PHP-related terms like “programming”. If you run your search terms through an ontology script you get other related terms, like:
Search Term: Blue Tomatoes tomatoes
blue
blue tomatoes
blues
potatoes
tomatoes
peppers
tomato’s
tomato
carrots
vegetables
So a good “Blue Tomatoes” link should come from a themed site that has one of the keywords from the above list, so by default, if you linked to a Viagra site, the link should weigh a lot less… if that makes any sense… Of course, if 110,000 non-themed links say “Blue Tomatoes”… but that’s a different story altogether…
DaveN

Jason Duke

In the industry there are a few people I listen to; Jason is one of them…

Here are his views on Hilltop and PageRank:

New Google Ranking Formula = {(1-d)+a (RS)} * {(1-e)+b (PR * fb)} * {(1-f)+c (LS)}

Like the old algo, there are damping factors in place, and again we don’t know what they are, so once again I haven’t included them in my plain English example, leaving just the factors we can work with and can deliver and/or register.

The Hilltop algo adds to the old Algo by giving a further multiplier, the LocalScore Rank (LS)

LocalScore builds upon PageRank by building a score for a page based upon the inward links to a page that come from “on topic” “authority sites”

I won’t go into details about how LS and Hilltop work (as there are far better resources out there at explaining it than I can muster), but in essence it means that LS has a massive effect on the previously SEO’d pages for searched phrases that have been marked as needing to be more relevant and therefore have the LS score applied.

I say the “searched phrases that have been marked as needing to be more relevant” as it has been put forward that LocalScore only comes into play for a subset of searched terms. This is because of the massive computational overhead of working out an LS for a page and the impossibility (with G’s current architecture) of computing the LS on the fly for a search phrase.
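To see how the three multipliers in the formula above interact, here is a minimal PHP sketch that simply transcribes it. Every number in it is made up for illustration: nobody outside Google knows the damping factors (d, e, f) or the weights (a, b, c), and the function name rankingScore is hypothetical.

```php
<?php
// Plain transcription of the quoted formula:
// {(1-d) + a(RS)} * {(1-e) + b(PR * fb)} * {(1-f) + c(LS)}
// $signals holds RS, PR, fb and LS; $factors holds a..f.
function rankingScore(array $signals, array $factors): float
{
    $rsTerm = (1 - $factors['d']) + $factors['a'] * $signals['RS'];
    $prTerm = (1 - $factors['e']) + $factors['b'] * ($signals['PR'] * $signals['fb']);
    $lsTerm = (1 - $factors['f']) + $factors['c'] * $signals['LS'];

    return $rsTerm * $prTerm * $lsTerm;
}

// Assumption: invented weights and damping factors, chosen only to keep the sums easy.
$factors = ['a' => 1.0, 'b' => 1.0, 'c' => 1.0, 'd' => 0.5, 'e' => 0.5, 'f' => 0.5];

// The same page scored with no on-topic authority links (LS = 0) and with some (LS = 3).
$withoutLocal = rankingScore(['RS' => 2.0, 'PR' => 4.0, 'fb' => 1.0, 'LS' => 0.0], $factors);
$withLocal    = rankingScore(['RS' => 2.0, 'PR' => 4.0, 'fb' => 1.0, 'LS' => 3.0], $factors);

echo "Without LocalScore: $withoutLocal\n";
echo "With LocalScore: $withLocal\n";
```

Because LS is a multiplier rather than an addition, a page with a zero LocalScore is dragged down across the board, which fits the “massive effect” described above.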

Jason’s site: strange logic

Google Suggest scraper

I’ve just finished porting over some of our in-house software to a section of this website. One of them is a Google Suggest scraper. It takes a keyword, then grabs Google’s suggestions, and then… does it again. Apparently, that can be useful.
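The “does it again” part can be sketched roughly like this. It is not the in-house tool, just an illustration of the idea: expandKeyword and $fetchSuggestions are hypothetical names, the fetcher is passed in as a callable so the sketch makes no claims about how Google Suggest is actually queried, and the canned suggestions are invented.

```php
<?php
// Expands a seed keyword: ask for suggestions, then ask again for each new
// suggestion, for $depth rounds. $fetchSuggestions is a hypothetical callable
// taking a keyword and returning an array of suggestion strings.
function expandKeyword(string $seed, callable $fetchSuggestions, int $depth = 2): array
{
    $seen = [$seed => true];   // keywords already collected
    $frontier = [$seed];       // keywords to query in the current round

    for ($round = 0; $round < $depth; $round++) {
        $next = [];
        foreach ($frontier as $keyword) {
            foreach ($fetchSuggestions($keyword) as $suggestion) {
                if (!isset($seen[$suggestion])) {
                    $seen[$suggestion] = true;
                    $next[] = $suggestion;
                }
            }
        }
        $frontier = $next;
    }

    unset($seen[$seed]);       // return only the suggestions, not the seed
    return array_map('strval', array_keys($seen));
}

// Assumption: canned data standing in for a real suggestion source.
$canned = [
    'foo'      => ['food', 'football'],
    'food'     => ['food network'],
    'football' => ['football scores', 'food'],
];
$stub = function (string $keyword) use ($canned): array {
    return $canned[$keyword] ?? [];
};

$result = expandKeyword('foo', $stub, 2);
print_r($result);
```

Keeping track of what has already been seen stops the scraper from asking for the same keyword twice when suggestions loop back on each other.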

The rest of the tools can be found here. Comments about presentation will be billed for wasted bandwidth :-)

Google Suggest slightly broken?

While optimising some GooSug code earlier today (as you do), I inevitably typed in “foo” (as you do), and came across an interesting little bug.

Example:

Go to Google Suggest and type in “foo”.

The bottom result is “foodnetwork.com”, and claims to return 1 result. Which it does. Go down to it, and press right.

This is where it gets interesting. First, Google seems to be randomly concatenating keywords in the results, and claims they return 1 result. However, most of them don’t.

The Big Question: Why?

The most boring theory comes from the “foodnetwork.cominthekitchen” suggestion. This does, in fact, return results. More accurately, one result, which is a scrape site. That’s clearly how Google has picked up on it. Perhaps the other results are similar scrape sites that have produced these concatenated phrases.

This leads us to the conclusion that the results data for GooSug is old. Hmm…

Anyway that’s as far as this post goes, as I’m just a programmer. That, and I have scrapers to optimise :-)

Google.com 302 to Google.co.uk

This weekend I noticed that Google has started looking at UK IP addresses and then redirecting you to www.google.co.uk. Well, that’s not very nice; I wanted to go to .com… Anyway, it’s cookie-based, so we can still use the “no country redirect” URL, http://www.google.com/ncr, which should reset the cookie and leave you on the .com.

Google’s server header information:

www.google.com (66.102.11.104):
HTTP/1.1 302 Found
Location: domain=.google.com
Content-Type: text/html
Server: GWS/2.1
Content-Length: 217

www.google.co.uk (66.102.11.104):
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html
Server: GWS/2.1
Content-Length: 2349

DaveN

Hijack Quick Test

Do a search in Google for your domain (www.yourdomain.com), then mouse over the result. If the URL is not yours then bingo, it’s a hijack or an accidental 302.

If you are really worried about people hijacking your website, you could write a small script that captures the referrer and does a header check… if the header check returns a 302… panic… run around the room for a while, then sit down… you were clever enough to find the 302, so you should be clever enough to combat it:

1) Find the whois info… email them, tell them that they are cheap robbing gits and to remove the 302.
2) False whois… report them to the registrar and their host, and make sure you tell all your friends, just in case it’s one of your friends (best do friends first).
3) If all else fails, 302 them back; now it’s the national lottery ;)

But you only have a small window of opportunity, so act quickly. Most ISPs will have the information on the domain, so get to them; they hate being woken up and tend to remove it after asking a few questions.
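The header check mentioned above could look something like this. It is only a sketch: the function names are hypothetical, it covers just the 302 detection (not the referrer capture), and the sample headers are invented rather than taken from a real hijack.

```php
<?php
// Pulls the numeric status code out of a status line such as "HTTP/1.1 302 Found".
// Returns null if the line does not look like an HTTP status line.
function statusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#i', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Returns the value of the first Location header, or null if there is none.
function redirectTarget(array $headerLines): ?string
{
    foreach ($headerLines as $line) {
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// True when the first response in the header list is a 302.
function is302(array $headerLines): bool
{
    return isset($headerLines[0]) && statusCode($headerLines[0]) === 302;
}

// Assumption: invented sample headers, in the array-of-lines shape that PHP's
// get_headers() returns. For a live check you would use something like:
//   $headers = get_headers('http://www.yourdomain.com/');  // false on failure
$headers = [
    'HTTP/1.1 302 Found',
    'Location: http://www.example.com/',
    'Content-Type: text/html',
];

if (is302($headers)) {
    echo "302 found, pointing at " . redirectTarget($headers) . "\n";
}
```

Run it against the suspicious URL from the search result rather than your own domain; it is their server handing out the 302.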

Here are a couple of threads about hijacking:

SearchEngine Watch
Threadwatch
DaveN

Mini Fetch Code

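<?php
// Page to fetch.
$theLocation = "http://www.wired.com/";

// Split the URL into scheme, host and path.
preg_match('#^(https?://)?([^/]*)(.*)#i', $theLocation, $matches);
$theDomain = "http://" . $matches[2];
$page = $matches[3];

// Read the whole page (needs allow_url_fopen enabled).
$fd = fopen($theDomain . $page, "rb");
if ($fd === false) {
    die("Could not open " . $theDomain . $page);
}
$value = "";
while (!feof($fd)) {
    $value .= fread($fd, 4096);
}
fclose($fd);

// The two marker strings were lost when this post was published;
// the values below are placeholders, so set them to the tags that
// bracket the part of the page you want to keep.
$startMarker = "<body";
$endMarker = "</body>";
$start = strpos($value, $startMarker);
$finish = strpos($value, $endMarker);
if ($start !== false && $finish !== false && $finish > $start) {
    $value = substr($value, $start, $finish - $start);
}

// Turn root-relative links (href="/foo") into absolute ones.
$FinalOutput = preg_replace('#(href="?)(/[^"/]+)#', '$1' . $theDomain . '$2', $value);

echo $FinalOutput;
flush();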

When Blog Spamming..

I have heard stories ;) …

That some blogspammers can cause DDoS attacks, so I looked into this.

Most spamming scripts will crawl Yahoo and Google for things like MT_comments.cgi;
once the database is large enough, you can start automating the signings…

Well, what if you had spidered the blogs and collected ALL their URLs? If you started signing in that order, you could hit some poor bugger 10,000 times…

Top Tip use Rand() ;)

DaveN

And to the DDoS guy: I will never let you forget it ;)

Now then

Hm, I’ve got my own section. Pff…

Well I’m the PHP programmer at DaveN Industries. I don’t think this section will last long, I’m sure the novelty will wear off soon.

What am I up to at the moment? Well I’m writing a new object-orientated shopping cart blah for use in house. This will let us create e-commerce sites much faster, although it’s taking bloody ages to do. I swear, you can write a quick ecom site in half an hour, but as soon as anyone mentions scalability, your development time explodes like a gerbil in a microwave.

Also, I’m responsible for a few of the crawlers contributing to the daily Net traffic. Had a few complaints about eating up bandwidth already. Multithreading is easily the best thing I learnt how to do this week.

DaveN

OK, I got asked by a few people to start a blog… god knows why… hehehe. Anyway, you will find me posting at WebmasterWorld, the Search Engine Watch forums and Threadwatch. I will try to post some stuff that should be interesting, like my views on blog spam, 302 hijacks, the Sandbox, etc.

I have also given access to the programmers and designer to shed a bit more light on the DaveN empire ;)

Anyway, this is a test post, so I’m off to play with the new Google Suggest tool, which was posted at Threadwatch.

DaveN
