The great thing about knowing how to code is being able to build things that help you out.
My first approach was to hack together some macros and just plow through the logs (fortunately there weren't that many), but this had several downsides:
- by using macros I had to take out certain data which could be used to generate statistics
- it's a manual job and doesn't scale well time-wise, despite being able to automate it to some extent
- can't be easily shared with others
After I finished parsing the logs, I decided to make a learning experience out of it and write a Python script that goes through all the logs and generates the correct output. I knew it wasn't a difficult task (heck, I threw it together in around 3 hours of coding and debugging overall), so all I had to do was get cracking.
The biggest issue I ran into was the way Python handles regex substitution.
"Whoever makes me understand why this Python regex substitution doesn't work gets a cookie and my eternal gratitude. bit.ly/GP8lM3" (Serban Constantin, @fuzzmz, March 27, 2012)
Thanks to the great wonder that is the Internet, I got an answer to my question within a couple of hours and continued on my merry way.
"D'oh, strings in Python are immutable so my re.sub doesn't modify the string in place which means I need to, you know, save the result!" (Serban Constantin, @fuzzmz, March 27, 2012)
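The gotcha from that tweet can be sketched in a few lines. This is only an illustration, not the code from the script: the log line and the pattern below are made up, since the real log format isn't shown here.

```python
import re

# Hypothetical log line; the real log format is an assumption.
line = "<nick!user@host> has quit [Quit: bye]"

# re.sub returns a NEW string. Calling it without keeping the
# result leaves `line` exactly as it was.
re.sub(r"!.*?>", ">", line)
unchanged = line

# Saving the return value is what actually keeps the substitution.
cleaned = re.sub(r"!.*?>", ">", line)
print(cleaned)
```

In a loop, the usual form is to assign back to the same name, e.g. `line = re.sub(pattern, replacement, line)`.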
Another nice tip I got was to pre-compile the regexes before going through the loop, which speeds things up considerably when dealing with lots of text. It was as simple as:
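import re

# Compile each pattern once, before the loop, and reuse it for every line.
# Raw strings are used, and the needless escapes (\!, \:, \h, \q, \<) are
# dropped: recent Python versions reject \h and \q as bad escapes.
talk_mask = re.compile(r'!.*?:')
connect_mask = re.compile(r'!.*?has')
quit_info = re.compile(r'quit.*?\]')
find_nick = re.compile(r'<* .*?!')
time_mask = re.compile(r'([0-1]\d|2[0-3]):([0-5]\d):([0-5]\d)')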
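To show how a pre-compiled pattern gets reused inside the loop, here is a rough sketch built around time_mask. The sample lines and the extract_times helper are hypothetical and exist only for this example; the actual script may be organized differently.

```python
import re

# Same time pattern as above, compiled once outside the loop.
time_mask = re.compile(r'([0-1]\d|2[0-3]):([0-5]\d):([0-5]\d)')

# Hypothetical log lines; the real log format is an assumption.
lines = [
    "12:34:56 <alice> hello",
    "23:59:59 <bob> bye",
    "no timestamp here",
]

def extract_times(log_lines):
    """Collect the first HH:MM:SS timestamp found on each line."""
    times = []
    for line in log_lines:
        # The compiled pattern object is reused on every iteration.
        match = time_mask.search(line)
        if match:
            times.append(match.group(0))
    return times

times = extract_times(lines)
print(times)
```

Python's re module also caches recently compiled patterns internally, so how much time pre-compiling saves will depend on how many different patterns are in play.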
I've of course made the code public. It can be found and downloaded from GitHub, which is also the place to report bugs or make suggestions.