Convert Apache common logs to combined logs

February 15th 2007

I was setting up AWStats for my server a while ago and realized my Apache server log format was wrong for all my sites. It was in the “Common” format, which doesn’t include referrer or user-agent information. As soon as I caught that, I immediately switched to the “Combined” log format, which gives all that information.

However, after switching formats, I didn’t want to lose all the old stats I had gathered. I had to find some way to convert the old Common-format Apache logs so that I could run my new log through AWStats without losing anything. Read on for how I did it…

The first thing I did was switch my Apache server to use the better, more informative, super-complete Combined log format. You can read about how to do that elsewhere, but my problem was that I had old Common-format log files around, and I wanted to count their hits!

So I have one log file, let’s call it access.log. It has both Common-format and Combined-format log entries (yeah, I should have rotated the log file when I made the change, but I didn’t). How the heck am I supposed to deal with that? It’s not too difficult, and can be done in a few simple commands (or one, if you’re a Linux guru and can chain them together).

First I had to find where the Common entries ended and the Combined entries started, and that was done with this simple command:

grep -n -m 1 "\" \"" access.log

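This is pretty clever, as it finds the first line with two quotes separated by a space, and it turns out that only happens in the Combined log entries. The -n flag prints the line number in front of the matching line and -m 1 stops at the first match, so the output starts with the line number of the first Combined entry, in my case 844329.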
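To see why the pattern works, here is a minimal sketch on a made-up three-line log. The file name sample.log, the IP addresses and the entries are invented for illustration; only the grep flags come from the command above.

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d) && cd "$workdir"

# Two Common entries followed by one Combined entry (sample data, not real logs).
cat > sample.log <<'EOF'
192.0.2.1 - - [10/Feb/2007:10:00:00 -0500] "GET / HTTP/1.1" 200 1234
192.0.2.2 - - [10/Feb/2007:10:05:00 -0500] "GET /about HTTP/1.1" 200 512
192.0.2.3 - - [15/Feb/2007:16:01:00 -0500] "GET / HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0"
EOF

# grep -n prints "LINE:match"; cut keeps only the number before the first colon.
first_combined=$(grep -n -m 1 '" "' sample.log | cut -d: -f1)
echo "$first_combined"   # prints 3
```

Keeping just the number in a variable makes it easy to feed into the next step instead of copying it by hand.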

We then want to split the file there, so we can work with just the common log entries.

csplit --prefix=access.log access.log 844329

This splits the file into two parts, access.log00 and access.log01, at the specified line number. access.log00 holds the Common entries (every line before 844329) and access.log01 holds the Combined entries (line 844329 onward). Next we want to make our Common log into a Combined log, with blank information for the referrer and user-agent (since we can’t know them after the fact, unfortunately).
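If you want to convince yourself which line ends up in which piece before running this on a huge log, a small sketch like the following may help. The file demo.log and its contents are made up; I use the short options -f (the same as GNU's --prefix) and -s (suppress the byte counts), which should also work with csplit versions that lack the long options.

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d) && cd "$workdir"

# Four placeholder lines standing in for log entries.
printf '%s\n' 'common 1' 'common 2' 'combined 1' 'combined 2' > demo.log

# Split before line 3: lines 1-2 go to demo.log00, lines 3-4 to demo.log01.
csplit -s -f demo.log demo.log 3

head -n 1 demo.log01   # prints "combined 1"
```

The line number you pass is the first line of the second piece, which is why the number reported by grep can be used as-is.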

sed 's/$/ "-" "-"/' access.log00 > access.log00.fixed

We use the extremely versatile sed utility for this, which is sort of a command-line find-and-replace. We look for the end-of-line ($) and replace it with what we want after each line, which is two quoted entries with dashes, the log format notation for “unknown” in the referrer and user-agent columns.
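Here is a quick sketch of what that substitution does to a single entry. The log line is sample data I made up; the sed expression is the same one as above.

```shell
# One made-up Common-format entry.
line='192.0.2.1 - - [10/Feb/2007:10:00:00 -0500] "GET / HTTP/1.1" 200 1234'

# $ matches the (empty) end of the line, so the "replacement" is really an append.
fixed=$(printf '%s\n' "$line" | sed 's/$/ "-" "-"/')

printf '%s\n' "$fixed"
# prints: 192.0.2.1 - - [10/Feb/2007:10:00:00 -0500] "GET / HTTP/1.1" 200 1234 "-" "-"
```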

Next we simply concatenate the two pieces back into our new, complete Combined log. It’s got lots of missing information, but it’s better than nothing.

cat access.log00.fixed access.log01 > access.log.combined
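For the curious, the split, fix and rejoin steps can probably be collapsed into one pass. This is not what I ran, just a sketch under one assumption: Combined entries end with a closing quote (the user-agent field) and Common entries end with the byte count, so sed can append the placeholders only to lines that do not end in a quote. The file mixed.log and its entries are made up for the demo.

```shell
# Work in a scratch directory so nothing real gets touched.
workdir=$(mktemp -d) && cd "$workdir"

# A mixed log: two Common entries, then one Combined entry (sample data).
cat > mixed.log <<'EOF'
192.0.2.1 - - [10/Feb/2007:10:00:00 -0500] "GET / HTTP/1.1" 200 1234
192.0.2.2 - - [10/Feb/2007:10:05:00 -0500] "GET /about HTTP/1.1" 200 512
192.0.2.3 - - [15/Feb/2007:16:01:00 -0500] "GET / HTTP/1.1" 200 1234 "http://example.com/" "Mozilla/5.0"
EOF

# /"$/! restricts the substitution to lines that do NOT end with a quote,
# so Combined entries pass through unchanged.
sed '/"$/!s/$/ "-" "-"/' mixed.log > mixed.log.combined

# Sanity check: every line should now end with a quote.
grep -c '"$' mixed.log.combined   # prints 3
```

The upside is that it does not care where the switch happened or whether the formats are interleaved; the downside is that it leans on that end-of-line assumption, so spot-check the output before feeding it to AWStats.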

If you already ran awstats once for this site, you probably have to delete the text data files in your awstats DirData directory (/var/log/awstats for me). Otherwise it will skip the lines it thinks it has already processed and won’t count any of the new records (thanks to Chris for the suggestion).

You can then run awstats with your new custom log file like this:

/usr/lib/cgi-bin/awstats.pl -config=your.domain.com -LogFile=access.log.combined

If it’s taking forever to run, you probably need to turn off reverse DNS lookups by setting DNSLookup=0 in your site’s config file. Here’s a good explanation of why you want to turn that off and use something like GeoIP instead.

That’s it! Awstats should update, and you should be good from there on out with the new combined log format. You may want to continue by adding a cron job to automate the updates. Have fun looking through your stats!


This entry was posted on Thursday, February 15th, 2007 at 4:01 pm and is filed under Apache, Technical, linux.


7 Responses to “Convert Apache common logs to combined logs”



  1. Chris Commented at 7:56 pm on February 19th 2007

    I just installed AWStats this evening and had the same dilemma as you, but found your excellent steps after a quick google search. You might want to mention that (depending on how you have AWStats configured), you may have to delete your existing stats in your DirData directory (probably /var/lib/awstats) and run it again for a full update so it doesn’t skip lines it has already processed. Otherwise, thanks a lot for the help – I’m off to convert all my old logs!

  2. Tristan Commented at 11:20 pm on February 19th 2007

    Thanks Chris, I’ve updated the article with your suggestion and fixed other minor errata.

    Glad I could help! I couldn’t find any other articles about this anywhere, so I made sure to title my post in a Google-friendly manner :-) Seems like it worked!

  3. Dom Commented at 4:57 am on March 14th 2007

    Oh, thank goodness you wrote this. Super fantastically useful. Exactly what I was looking for.

    I had to sort my merged log first, because everything was out of order (happened when I last moved server and took the logs with me, I think). Sorting Apache logs by date is rather nasty.

    But once I had that done, this recipe saved my bacon. :-)

  4. Geo Commented at 12:25 am on September 10th 2007

    Great post, thank you very much!

  5. BrnDmp » Blog Archive Pingbacked at 2:09 am on November 9th 2007

    [...] process logs: awstats.pl -config=www.mysite.com -update (NOTE: logs should be in combined format -> set in httpd.conf. Cool post about converting common to combined log format) [...]

  6. jack Commented at 12:31 am on December 28th 2007

    Did you mean:

    csplit -f access.log access.log 7611828

    Your csplit gives an error in Terminal on Leopard.

  7. Tristan Commented at 10:54 am on December 29th 2007

    That’s not actually what I meant. I was referring to the Linux/Unix version of the csplit command, in which --prefix is used.

    Thanks for the Leopard tip, seems like the OSX version of csplit isn’t directly compatible with the Linux version. Doesn’t mean it’s the correct one ;-)
