Convert Apache common logs to combined logs
February 15th 2007
I was setting up AWStats for my server a while ago and realized my Apache server log format was wrong for all my sites. It was in the “Common” format, which doesn’t include referrer or user-agent information. As soon as I caught that, I immediately switched to the “Combined” log format, which gives all that information.
However, after switching formats, I didn’t want to lose all the old stats I had gathered. I had to find some way to convert the old Common-format Apache logs so that I could run my new log through AWStats without losing anything. Read on for how I did it…
The first thing I did was switch my Apache server to use the better, more informative, super-complete Combined log format. You can read about how to do that elsewhere, but my problem was that I had old Common log-format files around, and I wanted to count their hits!
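For reference, a default Apache httpd.conf defines both formats with `LogFormat`, and a `CustomLog` line picks which one a log uses. The log path below is only a placeholder. The escaped `\"%{Referer}i\" \"%{User-Agent}i\"` pair at the end of the combined definition is what puts a quote, a space and another quote between those two fields in every combined entry.

```apache
# Format definitions as they appear in a default httpd.conf
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common

# Example: write this site's log in the combined format (path is a placeholder)
CustomLog /var/log/apache2/access.log combined
```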
So I have one log file, let’s call it access.log. It has both common-format and combined-format log entries (yeah, I should have rotated the log file when I made the change, but I didn’t). How the heck am I supposed to deal with that? It’s not too difficult, and can be done in a few simple commands (or one, if you’re a Linux guru and can chain them together).
First I had to find when one started and the other ended, and that was done with this simple command:
grep -n -m 1 '" "' access.log
This is pretty clever, as it finds the first line with two quotes separated by a space, and it turns out that only happens in the Combined log entries (between the quoted referrer and user-agent fields). The -n flag prefixes the match with its line number and -m 1 stops after the first match; in my case that was line 844329.
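Before splitting, it can be worth confirming that the log really switches format exactly once at that line: nothing before it should contain the `" "` pair, and everything from it onward should. A minimal sketch, assuming a POSIX shell with `head`, `tail` and `grep`; `check_switch` is a hypothetical helper name, not part of any tool.

```shell
# Sketch: confirm a log switches from common to combined format exactly once.
# check_switch is a hypothetical helper. Usage: check_switch LOGFILE LINE
# where LINE is the first combined-format line reported by grep -n.
check_switch() {
    log=$1
    line=$2
    # Lines before LINE that already contain '" "' (should be 0).
    early=$(head -n "$((line - 1))" "$log" | grep -c '" "')
    # Lines from LINE onward that lack '" "' (should be 0).
    late=$(tail -n "+$line" "$log" | grep -vc '" "')
    echo "combined-looking lines before $line: $early"
    echo "common-looking lines from $line on: $late"
    # Succeed only if both counts are zero.
    [ "$early" -eq 0 ] && [ "$late" -eq 0 ]
}
```

Usage: `check_switch access.log 844329`. A non-zero exit status means the formats are interleaved, and splitting at a single line would not be enough.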
We then want to split the file there, so we can work with just the common log entries.
csplit --prefix=access.log access.log 844329
This splits the file into two parts, access.log00 and access.log01, at the specified line number: access.log00 gets every line before 844329 (the common-format entries) and access.log01 gets line 844329 onward (the combined-format entries). Next we want to turn the common log into a combined log, with blank information for the referrer and user-agent (since we can’t know them after the fact, unfortunately).
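`-f` is the short form of `--prefix` and is accepted by both GNU csplit and the BSD csplit that ships with Mac OS X. Where csplit isn’t available at all, the same split can be done with `head` and `tail`. A minimal sketch; `split_at` is a hypothetical helper that writes the same access.log00/access.log01 pair.

```shell
# Sketch: split LOG before line LINE using only head and tail, producing
# the same LOG00 / LOG01 pair as the csplit command above.
# split_at is a hypothetical helper; LINE should be 2 or more.
split_at() {
    log=$1
    line=$2
    head -n "$((line - 1))" "$log" > "${log}00"   # entries before LINE
    tail -n "+$line" "$log" > "${log}01"          # LINE through end of file
}
```

Usage: `split_at access.log 844329`.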
sed 's/$/ "-" "-"/' access.log00 > access.log00.fixed
We use the extremely versatile sed utility for this, which is sort of a command-line find-and-replace. The pattern $ matches the end of each line, so “replacing” it effectively appends two quoted dashes to every line; a dash is the log format’s notation for “unknown” in the referrer and user-agent columns.
Next we simply concatenate the two pieces into our new, complete combined log. It’s missing lots of information, but it’s better than nothing.
cat access.log00.fixed access.log01 > access.log.combined
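For the “one command” route mentioned earlier, the split, sed and cat steps can be folded into a single awk pass. This swaps in awk rather than chaining the author’s exact commands: it uses the same `" "` test as the grep above to find the first combined entry, appends ` "-" "-"` to every line before it, and copies the rest unchanged. `common_to_combined` is a hypothetical helper name.

```shell
# Sketch: convert a mixed common/combined log in a single pass.
# Lines before the first entry containing '" "' get ' "-" "-"' appended;
# that entry and everything after it are copied unchanged.
# common_to_combined is a hypothetical helper. Output goes to stdout.
common_to_combined() {
    awk '!seen && /" "/ { seen = 1 }
         seen { print; next }
         { print $0 " \"-\" \"-\"" }' "$1"
}
```

Usage: `common_to_combined access.log > access.log.combined`. Unlike the three-step version, it never writes the intermediate access.log00 and access.log01 files.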
If you already ran AWStats once for this site, you probably have to delete the text data files in your AWStats DirData directory (/var/log/awstats for me). Otherwise it won’t count any of the new records, since it skips lines it has already processed (thanks to Chris for the suggestion).
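Rather than deleting those files outright, they can be moved into a backup directory first, so nothing is lost if the rebuild goes wrong. A minimal sketch, assuming AWStats’s default history-file naming of awstatsMMYYYY.CONFIG.txt; `stash_awstats_data` is a hypothetical helper, and the directory and config name in the usage line are placeholders.

```shell
# Sketch: move one config's AWStats history files out of DirData so the next
# run rebuilds them from scratch. Assumes the default naming scheme
# awstatsMMYYYY.CONFIG.txt. stash_awstats_data is a hypothetical helper.
# Prints the backup directory it created.
stash_awstats_data() {
    dirdata=$1
    config=$2
    backup="$dirdata/backup-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup" || return 1
    for f in "$dirdata"/awstats[0-9][0-9][0-9][0-9][0-9][0-9]."$config".txt; do
        [ -e "$f" ] || continue   # glob matched nothing
        mv "$f" "$backup"/
    done
    echo "$backup"
}
```

Usage: `stash_awstats_data /var/log/awstats your.domain.com`. Files belonging to other configs in the same DirData are left alone.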
You can then run awstats with your new custom log file like this:
/usr/lib/cgi-bin/awstats.pl -config=your.domain.com -LogFile=access.log.combined
If it’s taking forever to run, you probably need to turn off reverse DNS lookups by setting DNSLookup=0 in your site’s config file. Here’s a good explanation of why you want to turn that off and use something like GeoIP instead.
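The relevant lines in the site’s AWStats config file (for example awstats.your.domain.com.conf) might look like the fragment below. `LogFormat=1` is AWStats’s setting for the Apache combined format, and `DNSLookup=0` disables reverse lookups; the LogFile path is a placeholder for your live log.

```text
# Live log for this site (placeholder path)
LogFile="/var/log/apache2/access.log"
# 1 = Apache combined log format
LogFormat=1
# Skip reverse DNS lookups, which make large runs very slow
DNSLookup=0
```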
That’s it! AWStats should update, and from then on you should be good with the new combined log format. You may want to continue by adding a cron job to automate the updates. Have fun looking through your stats!
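A crontab entry for that could look like the line below, assuming the same awstats.pl path and config name used above; the hourly schedule is only an example. Without `-LogFile`, AWStats reads the LogFile set in the site’s config, so the one-off access.log.combined run doesn’t need repeating.

```text
# m h dom mon dow  command
0 * * * * /usr/lib/cgi-bin/awstats.pl -config=your.domain.com -update > /dev/null
```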

I just installed AWStats this evening and had the same dilemma as you, but found your excellent steps after a quick Google search. You might want to mention that (depending on how you have AWStats configured), you may have to delete your existing stats in your DirData directory (probably /var/lib/awstats) and run it again for a full update so it doesn’t skip lines it has already processed. Otherwise, thanks a lot for the help – I’m off to convert all my old logs!
Thanks Chris, I’ve updated the article with your suggestion and fixed other minor errata.
Glad I could help! I couldn’t find any other articles about this anywhere, so I made sure to title my post in a Google-friendly manner.
Seems like it worked!
Oh, thank goodness you wrote this. Super fantastically useful. Exactly what I was looking for.
I had to sort my merged log first, because everything was out of order (happened when I last moved server and took the logs with me, I think). Sorting Apache logs by date is rather nasty.
But once I had that done, this recipe saved my bacon.
Great post... thank you very much!
[...] process logs: awstats.pl -config=www.mysite.com -update (NOTE: logs should be in combined format -> set in httpd.conf. Cool post about converting common to combined log format) [...]
Did you mean:
csplit -f access.log access.log 7611828
Your csplit gives an error in Terminal on Leopard.
That’s not actually what I meant; I was referring to the Linux/Unix version of the csplit command, in which --prefix is used.
Thanks for the Leopard tip; it seems the OS X version of csplit isn’t directly compatible with the Linux version. Doesn’t mean it’s the correct one.