No subject is more controversial to a group of web professionals than Web Statistics. The advertising industry is still a little sore with us after we promised early on that Web Stats would give them all that invaluable information they could never get from TV, Radio or Print. This was not a lie, per se, as some sectors are able to mine tremendous amounts of quality information from their web traffic, session login, and cookie data. For most of us, however, the reality has fallen far short of the promise.
If you are a content provider and your business depends on attracting advertisements online, this subject has surely come up. The problem, in short, is that there are dozens of voices telling us what our websites’ ‘legitimate’ traffic is. JavaScript-tagging services (Google, Omniture) are competing with toolbar/panel companies (Alexa, Comscore), who are competing with ISP traffic monitors (Hitwise), who are all competing with web server log analyzer applications (WebTrends, NetTracker, Urchin). Each stats provider has its own reasons for defending the very different traffic numbers it reports. The short of it is that, like all complex problems, this one has plenty of causes and plenty of proposed solutions.
One way to navigate these waters is to look at the numbers for your site from as many sources as possible and treat them not as absolute but as relative values. Track them against each other; a simple spreadsheet will do. Then, when one shows a jump or a drop that isn’t reflected in the others, you have something to back your healthy skepticism. Dig deeper into that jump or drop; you may find it was completely unrelated to your site’s performance.
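The spreadsheet approach above can be sketched in a few lines of code. This is a minimal illustration, not a production tool: the provider names, the sample counts, and the 20% divergence threshold are all assumptions made up for the example.

```python
import statistics

def pct_change(series):
    """Day-over-day percent change for a list of daily counts."""
    return [(b - a) / a for a, b in zip(series, series[1:])]

def flag_divergence(sources, threshold=0.20):
    """Return (day, provider) pairs where one provider's day-over-day
    change differs from the median change across all providers by more
    than the threshold -- i.e. a jump or drop the others don't reflect."""
    changes = {name: pct_change(counts) for name, counts in sources.items()}
    days = len(next(iter(changes.values())))
    flags = []
    for d in range(days):
        median = statistics.median(changes[name][d] for name in changes)
        for name in changes:
            if abs(changes[name][d] - median) > threshold:
                flags.append((d + 1, name))
    return flags

# Hypothetical daily visit counts: three sources agree on the trend
# except the log analyzer, which spikes on day 3.
stats = {
    "google_analytics": [1000, 1050, 1100, 1080],
    "comscore":         [1200, 1260, 1300, 1280],
    "log_analyzer":     [1500, 1550, 2600, 1600],
}
print(flag_divergence(stats))
```

Note that the comparison is purely relative: the three sources report very different absolute totals, but only the source that moves out of step with the others gets flagged.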
Google Analytics keeps popping up in the industry as a believable standard. This is tough for the content provider when Google is showing numbers much lower than your log analyzer’s, and the advertiser knows it. One way to combat this is to run Urchin locally. Urchin is the application Google Analytics is built on, so at least we are on the same playing field. Google tracks your site through JavaScript tags and small image loads; you can simulate these in your own environment with Apache server logs, and it’s not too tough. Once set up, Urchin’s numbers will differ from Google’s only in what gets stripped by bot filters. Most bots identify themselves as friendly, so they are easily stripped. After that, it is a constant loop of maintenance. I suggest these steps:
- Remove the obvious bots with a preroll filter
- Establish a baseline for your site: the numbers you are seeing in your reports today
- Establish a trend line for the previous 12 months. This is useful for identifying hot and cold spots in traffic tied to the calendar
- Review stats daily/weekly for outliers. A spike in pageviews is an indication that someone or something is blowing through all your content automatically
- Create monitoring tools to notify you when numbers are out of threshold
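The steps above can be sketched as a small maintenance script. This is only an illustration of the loop, assuming Apache combined-format log lines; the bot patterns, the baseline window, and the 3x pageview threshold are my own illustrative choices, not Urchin’s actual filter rules.

```python
import re

# Step 1: a crude pre-roll filter -- most friendly bots self-identify
# in the user-agent string, so a pattern match strips the obvious ones.
BOT_PATTERNS = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

def preroll_filter(log_lines):
    """Drop log lines whose user-agent matches a known bot pattern."""
    return [line for line in log_lines if not BOT_PATTERNS.search(line)]

def baseline(history):
    """Steps 2/3: use the average of a trailing window of daily
    pageview counts as the expected value for a normal day."""
    return sum(history) / len(history)

def outliers(daily_pageviews, history, factor=3.0):
    """Steps 4/5: flag any day whose pageviews exceed the baseline by
    more than `factor` -- a candidate for something blowing through
    your content automatically."""
    expected = baseline(history)
    return [(day, views)
            for day, views in enumerate(daily_pageviews, start=1)
            if views > factor * expected]

# Hypothetical log lines: the second self-identifies as a bot.
logs = [
    '1.2.3.4 - - "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '5.6.7.8 - - "GET /b HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
print(len(preroll_filter(logs)))
print(outliers([900, 1100, 5200], history=[1000, 1050, 950]))
```

A real deployment would feed the outlier check into whatever notification channel you already use, and the flagged days would then get the manual digging described above.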
I am sorry to have to say that loading Bob’s LogReedr on the web server and forgetting about it isn’t the solution we predicted would ‘revolutionize the advertising industry’. The bottom line is that trust should not come easy. It will take some work before you should trust any number you get regarding Web Stats.