The Importance of Log Analysis
Written by Dave Collins, SoftwarePromotions Ltd.
https://www.softwarepromotions.com
Anyone who knows me, anyone who’s been reading my articles for a few months, or anyone who regularly visits the DaveTalks website will already know that I’m a little on the obsessive side when it comes to web log analysis.
A company’s web server logs are one of the most important and often-neglected sources of information at their disposal.
Without a detailed and careful analysis of this data, a company will be more or less blind to both threats and opportunities.
There are two common mistakes that I see, time and time again.
(1) “My web host already offers online statistics. This is more than enough information.”
Big mistake. Huge mistake.
It’s comparable to saying “Well, there’s usually enough money in the bank, so who needs to plan and budget?”
(2) “I simply don’t have the time.”
Even bigger mistake.
If a company isn’t setting aside time each week or month to look through their log data, they have no idea how well their website is performing as a marketing tool, no idea how well they’re doing on the search engines, no idea how well their AdWords campaigns are performing, and no idea how easily they might be able to improve their sales.
The good news is that even if you’re tight for time and don’t want to spend money on expensive software, there’s a lot you can do to start seizing these opportunities.
And the great news is that you only need three things. And some of them, or perhaps even all of them, are free. More or less.
The three things are log files, analysis software and time.
Log files.
If you don’t know where your raw log files are, or how to download them, then contact your hosting provider.
If their answer is that your account doesn’t include access to the raw log files, then change hosts.
I say that knowing all too well that this is one hell of a headache, but I can’t stress enough how critically important and useful this information is. You don’t want to be running your business without it.
And don’t settle for access to today’s logs only. It isn’t enough. You need your host to archive and compress these files, and you need to be able to download them at your leisure.
We pay $1 a month for hosting some of our smaller auxiliary websites, and each of these includes access to compressed log file archives. So don’t let your host tell you that this is a high-end spec. It isn’t.
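If your host does archive and compress the logs, you don’t even need to unpack them before analysing them. The following is a minimal Python sketch, assuming the archives are gzip files (a common choice, though hosts vary) saved into one local folder; the function name `read_log_lines` and the `*.gz` pattern are illustrative, not part of any particular tool:

```python
import gzip
from pathlib import Path
from typing import Iterator


def read_log_lines(folder: str, pattern: str = "*.gz") -> Iterator[str]:
    """Yield every line from the gzip-compressed log archives in `folder`.

    Files are read in name order, which assumes the host names archives
    so that they sort chronologically (e.g. access_log.2005-01.gz).
    """
    for path in sorted(Path(folder).glob(pattern)):
        # "rt" decompresses on the fly in text mode; errors="replace" stops
        # a stray malformed byte from aborting the whole run
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")
```

For example, `sum(1 for _ in read_log_lines("logs"))` would count the entries in a hypothetical `logs` folder. Because the archives are read as a stream, months of logs never have to fit in memory at once.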
Software.
Once you get your raw log files on your own system, you’ll soon realise that you’re never going to get anywhere by reading through thousands of lines of log entries manually.
The good news is that you don’t have to use Notepad or Excel. The really good news is that there are some excellent software solutions out there to do the job for you, some of which are free.
We’ll look at some of the better software solutions in just a moment.
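To give a feel for what these tools do under the bonnet, here is a rough Python sketch of the very first step: splitting each raw line into fields. It assumes your server writes the widely used Apache/NCSA “combined” log format; if your host uses a different layout, the pattern (and the name `parse_line`, which is purely illustrative) will need adjusting:

```python
import re
from datetime import datetime
from typing import Optional

# Apache/NCSA "combined" log format (assumed; check your server's LogFormat)
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one log line into a dict of fields, or None if it doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2005:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" size means no body was sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = ('203.0.113.7 - - [10/Oct/2005:13:55:36 -0700] '
          '"GET /download.html HTTP/1.1" 200 2326 '
          '"http://www.google.com/search?q=log+analysis" '
          '"Mozilla/5.0 (Windows; U) Firefox/1.0"')
print(parse_line(sample)["path"])  # /download.html
```

Once every line is a dictionary like this, counting, filtering and charting become straightforward, which is exactly the drudgery the applications below take off your hands.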
Time.
This is the only requirement for server log analysis that calls for a real commitment on your part. Like any other worthwhile activity, it’s going to take up some of your working day. But it doesn’t have to take long, and I guarantee that it’s time well spent.
For now, let’s go back to the software itself. At the time of writing, the Log Analyzer section at TuCows lists almost thirty different applications.
Over the years, I’ve worked with a good nine or ten log analysis applications, so I have a fair understanding of what’s out there. Below are four of the applications that I work with myself, each of which has its own strengths and weaknesses.
(1) Analog – Stephen Turner – free.
Analog is one of the better free options out there. It offers very little in the way of fancy extras, but it is extremely fast, reasonably configurable and 100% free.
Pros:
If you want to dip your toes into web server log analysis without having to spend any money, then this is a good way to get started.
And if you use it with Report Magic, also completely free, you’ll get nicer looking reports.
Cons:
It’s very basic.
(2) 123LogAnalyzer – ZY Computing – from $129.95.
Pros:
It’s incredibly fast at analysing even large log files.
The price is very reasonable.
The filters allow you to select or strip out specific data sets. For example, you can look at a 30-day chart of visitors from Google, or monitor traffic generated by a specific ad campaign.
The reports are among the clearest and easiest to understand that I’ve ever seen.
Cons:
The main database needs updating with details of new operating systems and browsers.
(3) Web Log Storming – Dataland Software – from $99.
Pros:
The price is very reasonable.
On-the-fly “dynamic” analysis means that you can drill down through your data in real time. This can be extremely useful.
In-depth probing is quick and simple, without having to reanalyse data or regenerate reports.
Cons:
It can be slow with large log files.
(4) ClickTracks – ClickTracks – from $495, pro version from $2995.
Pros:
Nothing I’ve seen is as effective at helping you instantly understand how your visitors are behaving.
Visual representation of user behaviour. Click patterns are superimposed over your website in the integrated browser.
Handles Google AdWords and Overture campaigns.
An instant look at what’s changed over a defined period (pro version only).
Cons:
The price is very high.
Overall, the application you use will probably depend more on your budget than your needs, but ultimately, any of these tools is better than nothing.
My advice would be to consider 123LogAnalyzer or Web Log Storming, as their filters and dynamic capabilities offer an insight that goes beyond any of the free applications that are out there.
But if you’re truly serious about understanding your website visitors’ behaviour, then ClickTracks is well worth the cost. And as highly priced as the Pro version is, it still offers exceptional value for money.
Now that you’ve (hopefully) decided which tool you’re going to use, we’ll start looking at what to do with it. Starting with what to track.
Broadly speaking, there are four main areas to concentrate on.
(1) Quantity: the number of visitors.
This is a fairly obvious one, but it’s important to go beyond the basics.
“I have around 250 visitors per day” is only scratching the surface. You need to dig deeper.
How does it change with time?
When does it change with time?
Trends and patterns (e.g. most developers see a 7-day pattern, and most also see seasonal trends).
Identify the patterns, and you can identify the opportunities.
All of this is useful data, and all may be put to practical use. Examples include when to release a new version or product, when to send out a mailing or announcement, when to take down your website to test your new shopping cart, when to offer discounts and much more.
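As a rough illustration of going beyond a single daily figure, this Python sketch averages the number of distinct IP addresses seen per day for each day of the week, which is one way to expose a 7-day pattern. Treating one IP address as one visitor is a simplifying assumption (tools differ on how they define a visitor), and the function name is hypothetical:

```python
import re
from collections import defaultdict
from datetime import datetime

# Client IP and timestamp: the start of every common/combined log format line
PREFIX = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\]')


def visitors_by_weekday(lines):
    """Average number of distinct IP addresses per day, grouped by day of the week.

    Only days that appear in the log are counted.
    """
    ips_per_day = defaultdict(set)
    for line in lines:
        match = PREFIX.match(line)
        if match:
            ip, stamp = match.groups()
            day = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").date()
            ips_per_day[day].add(ip)
    counts_by_weekday = defaultdict(list)
    for day, ips in ips_per_day.items():
        counts_by_weekday[day.strftime("%A")].append(len(ips))
    return {weekday: sum(counts) / len(counts)
            for weekday, counts in counts_by_weekday.items()}
```

Run over a few months of logs, a consistently quiet weekday stands out as a candidate for maintenance, and a consistently busy one as a candidate for announcements.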
(2) Referrals: who’s sending you the traffic.
Who is sending you visitors?
What do these visitors do when they arrive?
Who isn’t sending you visitors?
Trends – falling or rising?
Are different referrers showing you different behavioural patterns?
Again, all very useful.
Site 1 may, for example, be sending you 500 visitors per day. But when you drill through the data, you might find that 99% of them spend a few seconds at your site, then leave immediately.
Site 2, on the other hand, may only send 50 visitors per day. But if they’re all spending an average of around a minute on your first page and then exploring your site, you’ll want to know about it.
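The Site 1 versus Site 2 comparison can be approximated straight from the Referer field. This Python sketch credits each IP address to the first external site that appears in its Referer field, then reports how many of those visitors viewed only one page. The page/non-page split by file extension, the `own_host` parameter and the function name are all assumptions for illustration:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Client IP, requested path and Referer from a combined log format line
LINE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" \d{3} \S+ "([^"]*)"')
# Assumed: requests for these file types are not page views
NOT_PAGES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def referrer_quality(lines, own_host):
    """Map each external referring host to (visitors sent, share who viewed only one page).

    A 'visitor' is simply an IP address, credited to the first external host
    seen in its Referer field; referrers on own_host are internal clicks.
    """
    first_referrer = {}
    pages_seen = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        ip, path, referrer = match.groups()
        if path.lower().split("?")[0].endswith(NOT_PAGES):
            continue  # count page views, not image or stylesheet hits
        pages_seen[ip] += 1
        host = urlparse(referrer).netloc
        if ip not in first_referrer and host and host != own_host:
            first_referrer[ip] = host
    sent, single = Counter(), Counter()
    for ip, host in first_referrer.items():
        sent[host] += 1
        if pages_seen[ip] == 1:
            single[host] += 1
    return {host: (sent[host], single[host] / sent[host]) for host in sent}
```

A host with many visitors but a one-page share close to 1.0 is your Site 1; a smaller host with a low share is your Site 2.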
(3) Time: more time = more opportunity.
How long are visitors spending on each of your site pages?
Are they spending enough time on the important content?
Are they reading what you want them to read?
If you’re selling something through your website, there will be pages that you consider more important, and pages that are less so. You don’t want visitors poring through your “about us” page, but barely glancing at your main product pages, for example.
Identify the more important pages on your website, and find out whether visitors are actually going there and reading the content.
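Server logs don’t record when someone stops reading, so time on page has to be estimated. A common approach, sketched here in Python, is to take the gap between one page request and the same IP address’s next one. Under this approach the last page of a visit can’t be timed at all, and gaps longer than a hypothetical 30-minute cut-off are treated as the visitor having left; the extension list and function name are also assumptions:

```python
import re
from collections import defaultdict
from datetime import datetime

# Client IP, timestamp and requested path
LINE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*"')
NOT_PAGES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
TIMEOUT = 30 * 60  # seconds; assumed cut-off, longer gaps are not counted


def average_time_on_page(lines):
    """Estimate average seconds per page from the gap to the same IP's next page view."""
    views = defaultdict(list)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        ip, stamp, path = match.groups()
        if path.lower().split("?")[0].endswith(NOT_PAGES):
            continue  # images and stylesheets arrive with the page, not after it
        views[ip].append((datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), path))
    durations = defaultdict(list)
    for visits in views.values():
        visits.sort()
        # Pair each page view with the next one; the final page has no partner
        for (start, page), (end, _) in zip(visits, visits[1:]):
            gap = (end - start).total_seconds()
            if gap <= TIMEOUT:
                durations[page].append(gap)
    return {page: sum(gaps) / len(gaps) for page, gaps in durations.items()}
```

Comparing the averages for your main product pages with the figure for, say, your “about us” page answers the question above directly.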
(4) Links.
Which links are actually being clicked on?
On which pages?
Where are they located?
Are they buttons, graphics or text?
Is wording, colouring or position a factor? (Hint: the answer is yes.)
Which links aren’t being clicked on?
This is something that is often overlooked, but of critical importance.
On the most basic of levels, when someone has finished looking at one of your pages, they’re either going to click on one of your links, or leave the site. What they do is actually a lot more in your control than you may realise.
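Your logs can show which links are followed because, when someone clicks a link, the browser usually sends the address of the page they came from in the Referer field. This Python sketch counts (from page, to page) pairs where the referring page is on your own site, and lists the links on a page that nobody followed. You supply the list of links each page contains; `own_host` and both function names are hypothetical. Two links on one page that point to the same destination can’t be told apart this way:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Requested path and Referer from a combined log format line
LINE = re.compile(r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*" \d{3} \S+ "([^"]*)"')
NOT_PAGES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def internal_clicks(lines, own_host):
    """Count (from_page, to_page) pairs where the referring page is on own_host."""
    clicks = Counter()
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        target, referrer = match.groups()
        if target.lower().split("?")[0].endswith(NOT_PAGES):
            continue  # embedded images also carry a Referer, but aren't clicks
        source = urlparse(referrer)
        if source.netloc == own_host:
            clicks[(source.path or "/", target)] += 1
    return clicks


def unclicked(clicks, page, links_on_page):
    """Links you know appear on `page` that no visitor followed."""
    return [link for link in links_on_page if clicks[(page, link)] == 0]
```

The unclicked list is often the more revealing of the two: it points straight at links whose wording, colour or position may need rethinking.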
All of the above factors are of extreme importance. But before you can do anything about improving them, you have to identify and understand what you already have.
Before we start looking at the more practical and actionable items, I first want to look at a few important facts to take into account.
Too much data?
Most log analysis applications will provide you with a massive amount of information.
It’s important to understand that not all of the information is useful, and sometimes the terminology used can be misleading.
The most common example is the number of hits.
The number of hits to a page does not mean the number of visitors.
If a simple HTML page consists of a few paragraphs of text and two graphic images, then one person viewing this page (one time only) will count as three hits. 1 page + 2 images = 3 hits.
In a more realistic example, if an HTML page has 15 images, one person viewing the page one time only will count as 16 hits. You get the idea.
Page Views is a more accurate figure, as this counts the number of times that separate pages are viewed. So one person viewing three pages = three page views.
However, this too can be misleading.
15 page views could consist of one person looking at 15 pages. It could also be 15 different people each looking at only one of your pages. It could also, depending on how their browser and your server are set up, consist of one person viewing a small number of pages several times, going back and forth between them.
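The gap between hits and page views is easy to see in code. This Python sketch treats every request as a hit and counts a page view only when the requested file isn’t an image, stylesheet or script; that extension list is an assumption, and real tools each apply their own rules. It also counts distinct IP addresses, which is yet another number entirely:

```python
import re

# Client IP and requested path
REQUEST = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+)[^"]*"')
NOT_PAGES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def summarise(lines):
    """Return (hits, page_views, distinct_ips) for a batch of log lines."""
    hits = page_views = 0
    ips = set()
    for line in lines:
        match = REQUEST.match(line)
        if not match:
            continue
        ip, path = match.groups()
        hits += 1  # every request is a hit
        ips.add(ip)
        if not path.lower().split("?")[0].endswith(NOT_PAGES):
            page_views += 1
    return hits, page_views, len(ips)


# One person viewing one page that contains two images
lines = [
    '203.0.113.7 - - [10/Oct/2005:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '203.0.113.7 - - [10/Oct/2005:13:55:36 +0000] "GET /logo.gif HTTP/1.1" 200 812',
    '203.0.113.7 - - [10/Oct/2005:13:55:37 +0000] "GET /screenshot.png HTTP/1.1" 200 40960',
]
print(summarise(lines))  # (3, 1, 1)
```

The three sample lines reproduce the earlier example: one page plus two images gives three hits, but only one page view.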
Another example is the number of visitors. Visitors are usually identified by IP address, but different applications will have different definitions. Application one may recognise my IP address and count me as just one visitor, however many times I come back to the website. Application two may define a set period of time, after which I will count as a second visitor. Application three may differentiate between visitors and unique visitors, and so on.
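To see how much the definition matters, here is a Python sketch of the “set period of time” approach. A request starts a new visit if the same IP address hasn’t been seen for longer than `timeout`; the 30-minute default is an assumed value for illustration, not a rule every tool follows, and the function name is hypothetical. The function also returns the number of distinct IP addresses, which corresponds to the “one IP, one visitor” definition:

```python
import re
from datetime import datetime, timedelta

# Client IP and timestamp
PREFIX = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\]')


def count_visits(lines, timeout=timedelta(minutes=30)):
    """Return (visits, distinct_ips).

    A visit ends when an IP address is inactive for longer than `timeout`.
    """
    requests = []
    for line in lines:
        match = PREFIX.match(line)
        if match:
            ip, stamp = match.groups()
            requests.append((ip, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")))
    requests.sort(key=lambda request: request[1])  # process in time order
    last_seen = {}
    visits = 0
    for ip, when in requests:
        if ip not in last_seen or when - last_seen[ip] > timeout:
            visits += 1  # first sighting, or back after a long gap
        last_seen[ip] = when
    return visits, len(last_seen)
```

Running the same lines with `timeout=timedelta(hours=24)` produces fewer visits from identical data, which is exactly how two applications can disagree.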
So to understand what you’re looking at, you need to understand the terminology, understand how your server works, and also understand how your log analysis software is set up.
Different analysis software, different results.
Here’s an interesting fact. If you take one month’s worth of log files, and run them through five different log analysis applications, you’ll get five different sets of figures. None will be a perfect match with each other. Some will be quite close to each other, but some of them will differ considerably.
So what do you do about it, and how do you know which ones to trust?
My own solution is a simple one. I know from experience which ones more or less agree with each other, and can safely assume that these are therefore more or less accurate.
I can also assume that the irregularities displayed by any of the applications will be more or less consistent. In other words the figures may not be 100% accurate, but they’ll be close enough. And if I keep using the same application with the same server’s log files, I can rely on what they’re telling me.
Now that we’ve looked at all the main issues involved, let’s start looking at what you can actually do with the information you gain, and how you can use it to save money and even generate new sales.
(1) Track advertising.
Without analysing your server logs, you are more or less blind as to how effective any advertising campaign may or may not be.
For example, you can log into your Google AdWords account to see how many clicks each of your keywords and ads is generating, but that is only half the story.
Some of your keywords and ads may generate a large number of clicks, and you may be quite happy to spend a lot of money on them. But would you be quite so eager if you found out that all those visitors are spending around 3 seconds before leaving your site?
Measuring the clicks alone isn’t enough. You have to know what the visitors are doing once they arrive. In order to do so, you need tracking and log analysis.
Without it, you’re probably throwing money away.
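One way to connect ad clicks with on-site behaviour is to tag each ad’s destination URL so that ad visitors are recognisable in your logs. This Python sketch assumes a hypothetical `src` query parameter (for example `/?src=adwords-kw1`) and reports, per tag, the average time between a visitor’s first and last request. A visitor who requests one page and leaves shows up as 0 seconds, since the log can’t see when they left; the parameter name and function name are illustrative only:

```python
import re
from collections import defaultdict
from datetime import datetime
from urllib.parse import parse_qs, urlparse

# Client IP, timestamp and requested path (including any query string)
PREFIX = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)')
TAG = "src"  # hypothetical parameter added to each ad's destination URL


def time_on_site_by_campaign(lines):
    """Average seconds between first and last request for visitors carrying each tag.

    A visitor is an IP address; it is credited to the first tag it arrived with.
    """
    campaign, first, last = {}, {}, {}
    for line in lines:
        match = PREFIX.match(line)
        if not match:
            continue
        ip, stamp, path = match.groups()
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        tags = parse_qs(urlparse(path).query).get(TAG)
        if tags and ip not in campaign:
            campaign[ip] = tags[0]
        first[ip] = min(first.get(ip, when), when)
        last[ip] = max(last.get(ip, when), when)
    seconds = defaultdict(list)
    for ip, tag in campaign.items():
        seconds[tag].append((last[ip] - first[ip]).total_seconds())
    return {tag: sum(s) / len(s) for tag, s in seconds.items()}
```

Set those averages beside the cost per click for each keyword in your AdWords account, and the expensive, short-stay keywords become easy to spot.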
(2) Monitor search engine traffic.
It’s a good idea to keep an eye on how much traffic you’re getting from the search engines, and, for example, to make sure that your Google traffic doesn’t suddenly plummet. If it does, you’ll want to know about it before it starts showing up in your sales figures.
You can also use your logs to see which of your keywords are more profitable. I guarantee they’re not all equal.
This information might also give you some good ideas for landing pages.
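If a search engine passes the visitor’s search phrase in its results URL, as Google did with a `q` parameter when this was written, the phrase ends up in the Referer field of your logs. This Python sketch counts those phrases. The engine-matching rule (any host containing `google.`) is a deliberately crude assumption, engines that don’t pass the phrase this way simply won’t be counted, and the function name is hypothetical:

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlparse

# The Referer: the first quoted field after the status code and size
REFERRER = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')


def search_phrases(lines, engine_hosts=("google.",)):
    """Count search phrases found in the q= parameter of search-engine referrers."""
    phrases = Counter()
    for line in lines:
        match = REFERRER.search(line)
        if not match:
            continue
        url = urlparse(match.group(1))
        if any(marker in url.netloc for marker in engine_hosts):
            # parse_qs decodes "+" and %-escapes, so "log+analysis" becomes "log analysis"
            for phrase in parse_qs(url.query).get("q", []):
                phrases[phrase.lower()] += 1
    return phrases
```

Cross-referencing the top phrases with what those visitors went on to do gives you both a profitability ranking and a shortlist of candidates for dedicated landing pages.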
(3) Trends.
Every online business has traffic trends, and these are usually more complex than the basic 7-day pattern.
Use your logs to spot the quiet times, the busy times and the opportunities.
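For the daily rhythm, a simple hour-of-day count often says more than a monthly total. This Python sketch tallies requests per hour, as recorded in the log (i.e. in the server’s time zone), and lists the quietest hours first; the function name is hypothetical:

```python
import re
from collections import Counter
from datetime import datetime

# The bracketed timestamp, e.g. [10/Oct/2005:13:55:36 -0700]
STAMP = re.compile(r'\[([^\]]+)\]')


def hours_quietest_first(lines):
    """Requests per hour of day (0 to 23) as (hour, count) pairs, quietest first."""
    per_hour = Counter()
    for line in lines:
        match = STAMP.search(line)
        if match:
            when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
            per_hour[when.hour] += 1
    # Include hours with no traffic at all; sorted() is stable, so ties stay in hour order
    return sorted(((hour, per_hour[hour]) for hour in range(24)),
                  key=lambda item: item[1])
```

The first few entries are natural slots for maintenance work; the last few are when your site most needs to be up and running smoothly.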
(4) Cracks and Hacks.
If you’re selling software online, these are unfortunately a part of life, and many software developers choose to ignore them.
But some of the crack sites are capable of generating a staggering amount of traffic. Ignoring them is one thing, but you don’t want to let a small number of sites slow down your server and affect your site’s performance.
Without your server log files, you won’t know about it until it’s too late. If ever.
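Because both the common and combined log formats record the size of each response, you can see which referring sites are costing you the most bandwidth. This Python sketch totals bytes served per external referring host and flags any host above a given share of all traffic; the 10% threshold, `own_host` and the function name are illustrative assumptions rather than recommended values:

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Response size and Referer from a combined log format line
LINE = re.compile(r'"[^"]*" \d{3} (\S+) "([^"]*)"')


def heavy_referrers(lines, own_host, share=0.10):
    """External referring hosts responsible for more than `share` of all bytes served."""
    bytes_by_host = Counter()
    total = 0
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue
        size, referrer = match.groups()
        sent = 0 if size == "-" else int(size)
        total += sent
        host = urlparse(referrer).netloc
        if host and host != own_host:
            bytes_by_host[host] += sent
    return {host: n for host, n in bytes_by_host.items() if total and n / total > share}
```

Run regularly, a check like this surfaces a crack site’s download link while it is still a nuisance rather than a performance problem.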
(5) Identify your gold referrers.
You might well be surprised by who’s sending you traffic. You’d expect to see the search engines, directories and some of the larger software sites. But you might also find that a fairly inconspicuous link on an unknown site (or sites) is sending you a lot of traffic.
If so, then take a look. Perhaps you might be able to further develop the relationship or even advertise on the site?
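Spotting the unexpected is mostly a matter of filtering out the referrers you already know about. This Python sketch ranks referring hosts by the number of distinct IP addresses they sent, skipping any host in an `expected` list that you supply (for example your own domain and the major search engines); the list contents and the function name are hypothetical:

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Client IP and Referer from a combined log format line
LINE = re.compile(r'(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "([^"]*)"')


def unexpected_referrers(lines, expected, top=10):
    """Referring hosts outside `expected`, ranked by the distinct IPs they sent."""
    visitors = defaultdict(set)
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        ip, referrer = match.groups()
        host = urlparse(referrer).netloc.lower()
        # "google.com" in `expected` also covers www.google.com and similar subdomains
        known = any(host == domain or host.endswith("." + domain) for domain in expected)
        if host and not known:
            visitors[host].add(ip)
    ranked = sorted(visitors.items(), key=lambda item: len(item[1]), reverse=True)
    return [(host, len(ips)) for host, ips in ranked[:top]]
```

Anything near the top of this list that you didn’t already know about is worth a visit.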
(6) Know thy browsers.
The old browser war was between Internet Explorer and Netscape, and we all know who won.
But there’s a new browser war going on, and one of the new contenders is becoming extremely popular.
Have you had a look at your website through Mozilla? Does it look the same? It probably doesn’t, and there may even be a few unpleasant surprises awaiting you.
But you’ll never know how much of an issue it is until you look at what browsers your visitors are using.
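Browser identification works from the user-agent string, the last quoted field in the combined log format. The Python sketch below uses simple substring rules checked in order; the rules, their order and the function names are illustrative assumptions, and real user-agent strings are messy enough that dedicated tools maintain far longer lists. Order matters because, for instance, Internet Explorer’s agent string also begins with “Mozilla”, and Firefox’s also mentions Gecko:

```python
import re
from collections import Counter

# The user-agent: the last quoted field on a combined log format line
AGENT = re.compile(r'"([^"]*)"\s*$')

# Substring rules, checked in order (assumed; many agents mention several names)
RULES = [
    ("Opera", "Opera"),
    ("Firefox", "Firefox"),
    ("Netscape", "Netscape"),
    ("MSIE", "Internet Explorer"),
    ("Safari", "Safari"),
    ("Gecko", "Other Mozilla/Gecko"),
]


def browser_family(agent):
    """Classify one user-agent string using the first matching rule."""
    for marker, family in RULES:
        if marker in agent:
            return family
    return "Other"


def browser_share(lines):
    """Percentage of requests per browser family."""
    counts = Counter()
    for line in lines:
        match = AGENT.search(line)
        if match:
            counts[browser_family(match.group(1))] += 1
    total = sum(counts.values())
    return {family: 100 * n / total for family, n in counts.most_common()}
```

If a meaningful share of your visitors turns out to be using a browser you’ve never tested with, that’s your cue to take a look.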
Your server logs are a gold mine of phenomenally useful and important information. Ignore them at your peril.