We were recently contacted by Edy from DeepCrawl to trial their product. At first I was skeptical that the online crawler could justify its price compared to cheaper software such as Screaming Frog, but in the end it turns out that DeepCrawl pays dividends in the amount of insight it can surface in a single crawl. So without further ado, let me show you some of the results from the crawl we ran on www.davidnaylor.co.uk.
This Site Has Issues
The first thing that you notice when looking at the report is the list of issues:
Some of these were a bit of a surprise. In the report you can click on each one to get a detailed breakdown of the problem and a list of the affected URLs, which helped us identify a number of issues.
Issue 1: Duplicate Pages
Funnily enough, this was something we had addressed (or at least thought we had addressed) on an old design of davidnaylor.co.uk:
These are all essentially the same page; realistically only one of them should resolve, with a canonical tag to account for the query strings.
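To illustrate the idea, here is a minimal sketch of how query-string variants can be collapsed to one canonical address. The URLs and the `canonical_url` helper are hypothetical examples, not DeepCrawl's or WordPress's actual behaviour:

```python
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    # Drop the query string and fragment so tracking/query variants
    # all map back to a single address (hypothetical helper).
    scheme, netloc, path, _query, _fragment = urlsplit(url)
    return urlunsplit((scheme, netloc, path, "", ""))

# Hypothetical duplicate-page variants from a crawl:
variants = [
    "http://www.davidnaylor.co.uk/blog/?utm_source=feed",
    "http://www.davidnaylor.co.uk/blog/?replytocom=42",
    "http://www.davidnaylor.co.uk/blog/",
]
assert len({canonical_url(u) for u in variants}) == 1
```

In practice you would point the `rel="canonical"` tag on each variant at that single URL rather than rewriting anything at crawl time.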
Issue 2: Max Description Length (551 errors)!
We’re not entirely sure when this started; again, it might have been in the new design, but it appears our meta descriptions were/are being populated with part of the blog post content:
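The fix is to trim the auto-generated description to a sensible limit instead of dumping in raw post content. A minimal sketch, assuming a 155-character guideline (the exact limit is whatever you configure in your crawler):

```python
MAX_DESC = 155  # a common guideline; hypothetical value, not DeepCrawl's default

def meta_description(text: str, limit: int = MAX_DESC) -> str:
    # Collapse whitespace, then cut at the last word boundary under the limit.
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text.rfind(" ", 0, limit - 1)
    return (text[:cut] + "…") if cut > 0 else (text[:limit - 1] + "…")
```

Running every page's description through something like this would clear those 551 length errors in one pass.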
Issue 3: Image attachment URLs
Interestingly, we thought all of these had been sorted as well; obviously not. For example:
DeepCrawl found over 200 errors of this type.
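If you want to flag these yourself in a crawl export, a simple pattern match does the job. This is a rough sketch; the regex and URLs are hypothetical, based on the common WordPress `?attachment_id=NN` style of attachment URL:

```python
import re

# Hypothetical pattern: WordPress attachment pages often appear
# as ?attachment_id=NN on the query string.
ATTACHMENT_RE = re.compile(r"[?&]attachment_id=\d+")

def looks_like_attachment(url: str) -> bool:
    return bool(ATTACHMENT_RE.search(url))

# Hypothetical crawl export:
crawled = [
    "http://www.davidnaylor.co.uk/?attachment_id=1234",
    "http://www.davidnaylor.co.uk/blog/some-post/",
]
flagged = [u for u in crawled if looks_like_attachment(u)]
```

From there you would typically redirect each attachment page to its parent post rather than leaving a thin page in the index.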
Record, Prioritise & Assign Issues
After I’d identified all the problems, I used the built-in “issue” functionality. This enables you to add an issue and assign it to someone; it stores the report you were looking at and lets you prioritise by importance. Once you’re done, you can share any reports/issues with external users via an encrypted URL.
The other cool piece of functionality is what happens when DeepCrawl re-crawls a website: it shows you what has changed since the last run. This time I got all the critical issues fixed and then re-crawled the website. It gives you a really useful summary of the changes:
As you go into each report where there was an issue, it prompts you with a popup; just click “mark as fixed”, save, and hey presto. As you can see, the critical issues have now been marked as fixed (and, as you may also notice, I need to pull my finger out!):
There is quite a lot of functionality we haven’t talked about or used yet. One big positive is the ability to schedule recurring crawls, giving you a historic record of your website from week to week or month to month. On top of that, it has white-label options for agencies, and you can export a PDF report such as this one.
You can also customise the thresholds it uses when classifying issues:
- Max description tag length
- Max title tag length
- Max HTML size
- Max number of links/external links
- Min/max content size
- Max load time
- Max URL length
- Minimum content to HTML ratio %
- Max number of redirections
- Default language
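To give a feel for how thresholds like these work, here is a rough sketch of a crawler check against a handful of them. The key names and values are hypothetical; DeepCrawl's actual settings and defaults may differ:

```python
# Hypothetical threshold names and values, for illustration only.
THRESHOLDS = {
    "max_description_length": 155,   # characters
    "max_title_length": 65,          # characters
    "max_load_time": 3.0,            # seconds
    "max_url_length": 1024,          # characters
}

def check_page(page: dict) -> list:
    """Return the names of any thresholds a crawled page exceeds."""
    failures = []
    for key, limit in THRESHOLDS.items():
        metric = key.replace("max_", "")  # e.g. "description_length"
        if page.get(metric, 0) > limit:
            failures.append(key)
    return failures
```

Each crawled page's metrics get compared against the configured limits, and anything over the line lands in the relevant issue report.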
No doubt they will be adding more options as time goes on and as more people use it. One thing we discussed with them is the ability to download and track your links from Google Webmaster Tools, which would be a very useful feature indeed. They already have a lot planned, so it’s definitely a product worth checking out.