mailing list archives
Re: parsing logs ultra-fast inline
From: Chuck Swiger <chuck () codefab com>
Date: Sat, 04 Feb 2006 13:32:16 -0500
Adrian Grigorof wrote:
> What do we want to know?
> The compilation of the most popular reports that we would like to see after
> a firewall (or other similar device) log analysis - from a thread initiated
> by mjr in the Log Analysis mailing list.
Is that mailing list public? I took a quick look around the loganalysis.org
site, but didn't notice anything.
> I noticed that there is a big emphasis on log parsing while there should be
> more discussion about the interpretation of the log parsing results. I've
> worked with logs from quite a few types of firewalls, but parsing them has
> never been the problem. Yes, it is a tedious, frustrating job, but a rather
> easy one in comparison with the task of "programmatically" interpreting
> their meaning.
You're right, but assigning meaning to something is hard for software to do.
Most of the time, software relies on hardcoded tests which enforce predefined
meanings that were assigned by the developer when the software was written.
It's possible to do better via dynamically updatable signatures or patterns
(virus scanners, Snort, etc.), or via inference/knowledge-base systems which can
draw new conclusions as more data is acquired or learned over time. The downside
of these is that they tend to be pretty slow: you can easily burn a half-hour
just doing a virus scan of a potentially suspect machine's 20 GB hard drive.
> Take Tina's VPN example - how many types of log entries would you
> expect from a VPN concentrator? From my experience, not more than 20, but
> let's assume there are 50.
Which VPN software are we talking about?
% grep printf openvpn-2.0.5/**/*.c | wc -l
If you wanted to say there were only 20 or 50 commonly encountered messages,
that would be more reasonable.
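The same kind of count can be extended a step further: rather than just counting logging call sites, extract the distinct format strings, since that is closer to the number of message types a parser must handle. A rough sketch (the sample file and its msg() calls are fabricated for illustration, and are not OpenVPN's actual logging macros):

```shell
# Build a tiny stand-in source file so the example is self-contained.
mkdir -p /tmp/logcount-demo
cat > /tmp/logcount-demo/sample.c <<'EOF'
msg(M_INFO, "Initialization Sequence Completed");
msg(M_WARN, "TLS handshake failed");
msg(M_INFO, "Initialization Sequence Completed");
EOF

# Total logging call sites:
grep -c 'msg(' /tmp/logcount-demo/sample.c                       # -> 3

# Distinct format strings (closer to the number of message *types*):
grep -o '"[^"]*"' /tmp/logcount-demo/sample.c | sort -u | wc -l  # -> 2
```

On a real source tree the gap between the two numbers is usually large, which is why a raw printf count overstates how many message types an analyst actually sees.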
> Give a sample of each entry to a Perl
> programmer and you will have the parsing script done in a day or two. So now
> you have the data, but what are you doing with it? What is relevant to a VPN
> administrator? Even a seasoned security professional would appreciate some
> "conclusions" that a reporting tool would provide from the data in the logs.
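The quick one-off parsing Adrian describes really is that simple for most syslog-style entries. A minimal sketch, using awk on a made-up VPN log line (the line format and field positions are hypothetical, not any particular product's):

```shell
# Hypothetical syslog-style VPN entry; fields are whitespace-separated.
line='Feb  4 13:32:16 vpn1 racoon: ISAKMP-SA established 10.0.0.1-10.0.0.2'

# Pull out hostname, daemon name (minus trailing colon), and SA type.
echo "$line" | awk '{ sub(/:$/, "", $5); printf "host=%s daemon=%s sa=%s\n", $4, $5, $6 }'
# -> host=vpn1 daemon=racoon sa=ISAKMP-SA
```

Extracting the fields is the easy part; deciding what an established-SA event means for the administrator is where the hard interpretation work starts.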
These are good questions. If software logged information in a more consistent
and usable format, we might be better able to understand what it is saying.
Frameworks like OpenBSM and its logfile format are becoming more popular, and
this ties in well with Orange Book C2 and Common Criteria CAPP auditing
requirements.
> That being said, I agree that when you have to analyze 100 GB worth of logs,
> parsing them becomes a (big) problem and you need to optimize as much as
> possible. Actually, a "mere" 1 GB log is a show-stopper for many analyzers
> on the market.
Unfortunately true. Without trying too hard, many log analysis tools for
things like webserver, Squid, or firewall logs seem to process ~10K lines or
events per second, which works out to roughly a gigabyte every ten minutes,
whereas other tools seem hopelessly incapable of handling large data sets.
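A quick way to sanity-check an analyzer's throughput before pointing it at 100 GB is to time it against a synthetic log. A sketch (here `wc -l` stands in for whatever analyzer you are evaluating; the log line format is invented):

```shell
# Generate a synthetic 100,000-line firewall-style log.
seq 100000 | sed 's/^/Feb  4 13:32:16 fw kernel: DROP IN=eth0 line /' > /tmp/test.log

# Time a trivial "analyzer" over it; divide line count by elapsed
# seconds to get lines/sec, then scale up to your real log volume.
time wc -l < /tmp/test.log
```

If the tool under test takes more than a few seconds on this, it has no hope of keeping up with a multi-gigabyte production log.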
firewall-wizards mailing list
firewall-wizards () honor icsalabs com