Google Analytics Short Tail/Long Tail Segmentation
I was just pulled into a conversation with Paul and James about developing a report of some kind in Google Analytics which segments short tail traffic (1-2 words long) and long tail traffic (3 words or more). The advantage is that you can then compare the two and use the ratio to estimate the long tail gains you would get from going after a short tail keyword.
For example, say you are currently getting 1,000 visits from short tail keywords and 5,000 visits from long tail keywords; this means you are getting five times as much long tail traffic as short tail traffic. Using this ratio you can estimate how much traffic you are likely to receive by going after a keyword. For instance, if you target “wooden chairs” you would notice it has 5,400 searches in the UK (according to the Google AdWords Keyword Tool), and if you were to get a number 1 position you might get, say, 10% of this. So you are likely to get 540 visits from that one keyword, but the long tail traffic would be five times greater than this: 2,700 visits (based on the ratio above). That gives a total potential of 3,240 visits.
This may not seem that great, but it makes all the difference when you are considering the ROI of going after a particular keyword in an SEO campaign. We have noticed that the long tail / short tail ratios differ dramatically between clients, so it’s a bad idea to use one ratio across industries or clients. You may also want to exclude brand searches, as a strong brand will skew the data a lot.
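The arithmetic in this example can be sketched in a few lines of Python. The function name is made up for illustration, and the 10% click-through figure is only the example value used above, not a measured rate.

```python
def estimate_total_visits(search_volume, ctr, short_tail_visits, long_tail_visits):
    """Estimate visits from a head keyword plus its long tail.

    ctr is an assumed share of searches that click through at the
    target position; the visit counts come from your own segments.
    """
    head_visits = search_volume * ctr
    # Long tail / short tail ratio taken from existing traffic
    ratio = long_tail_visits / short_tail_visits
    tail_visits = head_visits * ratio
    return head_visits, tail_visits, head_visits + tail_visits


# Example figures from the post: 5,400 searches, 10% CTR, 1,000 vs 5,000 visits
head, tail, total = estimate_total_visits(5400, 0.10, 1000, 5000)
print(round(head), round(tail), round(total))
```

Swapping in a different client's visit counts changes the ratio, which is why one ratio should not be reused across clients.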
The Google Analytics Segments – Shared
Here is the short tail segment
Here is the long tail segment (the inverse of the short tail segment):
Update: These have been updated, based on a regular expression made by Ben Gott.
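If you would prefer the regular expression, here it is. The original version was the short tail reg ex (exclude it to get the long tail); it shows all one or two keyword phrases:
(^[a-z0-9]+ [a-z0-9]+$|^[a-z0-9]+$)
The updated versions work the other way round: they show all the keywords which contain 2 or more spaces (or plus signs), i.e. the long tail, so exclude them to get the short tail. The first update was:
(\s|\+).*(\s|\+)
And the current version is:
[^\s\+]+(\s|\+)+[^\s\+]+(\s|\+)+[^\s\+]+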
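As a rough illustration of how the last pattern above splits keywords, here is a minimal Python sketch. The keyword rows are invented sample data, and it assumes the segment condition behaves like an unanchored regex search, which may not match Google Analytics exactly.

```python
import re

# Last pattern from the post: three or more words separated by runs of
# spaces or plus signs.
LONG_TAIL = re.compile(r"[^\s\+]+(\s|\+)+[^\s\+]+(\s|\+)+[^\s\+]+")


def is_long_tail(keyword):
    # Assumption: an unanchored search, like a "matches regex" condition
    return LONG_TAIL.search(keyword) is not None


# Hypothetical keyword report rows: (keyword, visits)
rows = [
    ("wooden chairs", 540),
    ("cheap wooden chairs uk", 120),
    ("chairs", 300),
    ("wooden garden chairs", 80),
]

long_visits = sum(visits for kw, visits in rows if is_long_tail(kw))
short_visits = sum(visits for kw, visits in rows if not is_long_tail(kw))
print(long_visits, short_visits)
```

The two totals are what you would feed into the long tail / short tail ratio.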
Update: Please ensure you set it to medium = organic, otherwise the long tail segment will include all other traffic unless you are in the search engine traffic report.
Enjoy! Let me know if anyone has any other ideas about this.






Dave 1113 days ago
Another thing I like to do is create a ‘brand’ segment to go along with these and then remove brand terms from the other segments. This gives a deeper picture by not having brand-related terms in the head and tail segments, which can often mess up the head term data (for 1-2 word brands). So it’s worth doing.
David Whitehouse 1113 days ago
Yeah, obviously we couldn’t share a ‘brand’ segment, because that would be unique to each client. But yeah, it definitely has a big impact (as I mentioned).
Vipul 1113 days ago
http://twitter.com/vipulg
This is great stuff. Segmentation is of great help as we can analyse data from different angles.
Dave 1113 days ago
http://www.djb31st.co.uk
Very useful!
It’s easy to forget just how much power you get with Google Analytics.
Great example of a useful segment.
Is there a database of common useful segments? There appears to be a niche need for this, but Google returns nothing.
Alex 1112 days ago
http://www.analyticsseo.com/
I always prefer to use SEO tools that handle that part too, like http://www.analyticsseo.com/, which I have recently tried out. This tool allows me to monitor my website activity without adding any JavaScript code to the website.
Andrew@BloggingGuide 1112 days ago
http://webuildyourblog.com
Great analyzation….very enlightening!
Clement Mazen 1112 days ago
http://www.autoquake.com
Thanks for this – I had been using a similar syntax but found a way, thanks to an article by Ben Gott on SEL, to apply a more economical and robust one and get a better result, using “\s” to denote a space instead of using “(a-z,0-9)” to denote all other characters.
At least 3 words simply becomes:
include: \s.*\s
At most 2 words being:
Exclude: \s.*\s
Modifying the number of “\s” tokens lets you change the number of words required in the search term. Beyond the simplicity of this syntax, I quite like the fact that it “takes care” of words formed with any non-standard alphanumeric characters, in addition to a to z and 0 to 9.
On that note, if you are interested in GA filters and would like to know more, I recommend the O’Reilly Regular Expression Pocket Reference, as there are many other “shortcuts”. Has anyone got a link to a useful online resource for RegEx?
DangerMouse 1112 days ago
Out of interest why did you select 10% as your figure for a number 1 ranking for the Exact match on that term?
David Whitehouse 1112 days ago
http://www.david-whtehouse.org/
@DangerMouse – it was just an example, a rough estimate of what some rankings may get at position 1; it also made the maths much simpler. I didn’t base it on an average of our data from Webmaster Tools or anything like that!
We have done comparisons between the number of visits and the exact match search volumes, and often it was in the range of 5-12% (from what I can remember) at position 1.
David Whitehouse 1112 days ago
http://www.david-whtehouse.org/
@Clement Mazen that is a much better way of doing it, I’m going to change the filters to that and link him. Certainly an improvement.
David Whitehouse 1112 days ago
I’ve updated the filter – there were a few minor issues that needed ironing out, so anyone who has commented – you may wish to update.
Clement Mazen 1112 days ago
http://www.autoquake.com
Thanks David for the update. Two more things about the (\s|\+) version suggested by Ben, though:
1. if you want to use the filters for paid search data, and use the new Broad Match Modifiers, the use of “+” signs as modifiers will completely throw the filters off:
“+seo”, for example, would then be considered a two-word search term.
2. Even though some searchers occasionally separate words in their query with a “+” rather than a space, it is rare enough to be ignored, and when searchers do use a “+”, they tend to use it as in [search + engine], not [search+engine], in which case the (\s|\+).*(\s|\+) filter does not work as expected, capturing this as a long tail term anyway.
Maybe a better way to tackle the “+” signs would be to set up a profile with a custom filter replacing “(\+|\s\+\s)” with a single space, then “(^\s|\s$)” with nothing and “\s\s” with a single space to remove extra spaces (roughly equivalent to TRIM() in Excel).
It is very exciting to see how much GA functionality is “below the surface”, especially with those RegEx and all the segments/custom filters you can use them on. David, a proper tutorial on GA RegEx would be very useful.
Websites out there tend to target programmers rather than web analysts and are not always very helpful when tackling GA.
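The clean-up steps suggested here can be sketched in Python for anyone who wants to test them outside GA. This is only an approximation of the idea: the function name is hypothetical, and it uses \s+ rather than \s\s so longer runs of spaces collapse in one pass.

```python
import re


def normalise_keyword(keyword):
    # Step 1: turn " + " or a bare "+" into a single space
    keyword = re.sub(r"\s\+\s|\+", " ", keyword)
    # Step 2: drop leading and trailing whitespace
    keyword = re.sub(r"^\s+|\s+$", "", keyword)
    # Step 3: collapse any remaining run of whitespace into one space
    keyword = re.sub(r"\s+", " ", keyword)
    return keyword


for raw in ["search + engine", "search+engine", "+seo", " wooden  chairs "]:
    print(repr(raw), "->", repr(normalise_keyword(raw)))
```

After this normalisation a plain space-counting pattern sees “search + engine” as two words and “+seo” as one.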
David Whitehouse 1112 days ago
http://www.david-whtehouse.org/
@Clement – In reply to your comment.
1. I see what you are saying, the filter I have created is purely for organic traffic – I did originally intend it for both PPC and SEO – but under Avinash’s advice I limited it to just organic.
2. I see what you mean, this is certainly a problem. I think we could fix this by using some of your suggestions.
Making (\s|\+).*(\s|\+) into (\s|\+)+.*(\s|\+)+ would allow multiple spaces and +’s in a row.
I’d like to stay away from custom filters and just improve or use multiple regular expressions. But I like your suggestion of using ^ and $ – I’ll have a think and see if I can improve it a bit further, and see if I can solve the trailing and leading spaces.
David Whitehouse 1112 days ago
@Clement, again I’ve updated it:
[^\s\+]+(\s|\+)+[^\s\+]+(\s|\+)+[^\s\+]+
Basically it is anything except a space or plus sign, followed by one or more space or plus signs, followed by one or more characters (no spaces or plus signs) followed by… etc.
My head hurts now.
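For readers who want to see that breakdown piece by piece, here is the same expression written as an annotated Python pattern. This is a sketch for testing outside GA and assumes an unanchored search; the sample keywords are made up.

```python
import re

LONG_TAIL = re.compile(
    r"""
    [^\s\+]+     # first word: characters that are neither whitespace nor plus
    (\s|\+)+     # separator: one or more spaces or plus signs
    [^\s\+]+     # second word
    (\s|\+)+     # separator
    [^\s\+]+     # third word
    """,
    re.VERBOSE,
)

samples = [
    "cheap wooden chairs",     # three words
    "wooden   garden chairs",  # repeated spaces still count as one separator
    "cheap+wooden+chairs",     # plus signs used as separators
    "search + engine",         # " + " is a single separator run: two words
    "+seo",                    # leading plus, one word
]
for kw in samples:
    print(repr(kw), bool(LONG_TAIL.search(kw)))
```

Because each separator run must sit between two real words, a leading plus sign or a spaced-out “ + ” no longer inflates the word count.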
Clement Mazen 1112 days ago
http://www.autoquake.com
Great stuff – that should really nail it!
vdouda 1107 days ago
Cool tip. Works great, but could you explain this regex a bit more?
What does the ‘s’ stand for?
In fact I’d like to build segments like:
- Only 1 keywords
- 2 keywords
- 3 keywords
- 4 keywords
- >5 keywords
Could you help?
Many thanks
vdouda 1107 days ago
Found this on another blog:
* 1 keyword : ^\b\w+\b$
* 2 keyword : ^\b\w+\b \b\w+\b$
* 3 keyword : ^\b\w+\b \b\w+\b \b\w+\b$
* 4 keyword : ^\b\w+\b \b\w+\b \b\w+\b \b\w+\b$
* 5 keyword : ^\b\w+\b \b\w+\b \b\w+\b \b\w+\b \b\w+\b$
Do you agree ?
Clement 1107 days ago
100 ways to skin a cat! I found another one on http://www.webrankinfo.com/dossiers/techniques/algo-mayday#comment-36923, more compact and more flexible, and which apparently works even with accented characters (useful for non-English languages).
1 word : ^(\W*\w+\b\W*){1}$
2 words : ^(\W*\w+\b\W*){2}$
Between 2 and 5 words : ^(\W*\w+\b\W*){2,5}$
7 or more words : ^(\W*\w+\b\W*){7,}$
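To try the word-count buckets asked about earlier in the thread (1, 2, 3, 4 and 5 or more keywords), the quantifier pattern above can be tested in Python first. The bucket names and the helper function are illustrative only, and Python's regex engine may differ in small ways from the one GA uses.

```python
import re

# One pattern per bucket, built on the {n} / {n,} quantifier form
SEGMENTS = {
    "1 word": r"^(\W*\w+\b\W*){1}$",
    "2 words": r"^(\W*\w+\b\W*){2}$",
    "3 words": r"^(\W*\w+\b\W*){3}$",
    "4 words": r"^(\W*\w+\b\W*){4}$",
    "5+ words": r"^(\W*\w+\b\W*){5,}$",
}


def bucket(keyword):
    # Return the name of the first segment whose pattern matches
    for name, pattern in SEGMENTS.items():
        if re.search(pattern, keyword):
            return name
    return None


for kw in ["chairs", "wooden chairs", "cheap wooden garden chairs uk"]:
    print(kw, "->", bucket(kw))
```

Note that any non-word character splits words under this pattern, so a keyword containing an apostrophe or hyphen may land in a higher bucket than expected.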
David Whitehouse 1106 days ago
http://www.david-whtehouse.org/
The \s denotes a space, if you don’t escape it with a “\” it will just be included as a space.
David Whitehouse 1106 days ago
http://www.david-whtehouse.org/
Sorry, it will just be counted as an s if you don’t escape it (it is only counted as a space if you escape it).
Daniel Sim 1105 days ago
http://www.pluginseo.com
Interesting way to predict visitors from long tail terms.
I do find juggling lots of segments still a bit unwieldy in GA. It’d be great if GA grouped keywords automatically to give as good a picture as we get from AdWords.
Search Down Under – SEO Cafe Learnings | 1105 days ago
[...] Has your site been affected? Through Google analytics advanced segmentation marketers / search pro’s can segment short tail and long tail terms into 2 categories and analyse the behaviour of each. For those wishing to utilise this segmentation technique refer to the following article which provides a quick link for the segment to be set up within your Google analytics profile – http://www.davidnaylor.co.uk/google-analytics-short-taillong-tail-segmentation.html [...]
Andy Beard 1103 days ago
http://andybeard.eu
It is all great in theory, but out in the wild you can’t determine longtail in that way.
As an example recently my blog was on the first page of results for “gmail” here in Poland, as high as #4
That is a long-tail term (for me) of no value (to me) with a CTR something like 0.03%
http://andybeard.eu/gmail-ranking.png
I talked to Dave about my “wacky stuff” about a year ago – I didn’t see a gain in search traffic, and this just meant I lost more relevant “long tail” traffic to a blog post that was relevant 3 years ago.
However interestingly Google didn’t win (so far) the gmail.pl domain in Poland, and that domain has really messed up SEO.
Roy Olders 659 days ago
http://www.seo-sharkx.com/
I thought that there is a rule that a 3 word keyword phrase indicates people that are more willing to buy (and are more specific) than 1 or 2 word keyword phrases.