log classification with liblognorm

Wednesday, April 6th, 2011

Today, we have added support for so-called “tags” to liblognorm (and it’s base library libee). This new capabilities permits very easy classification of syslog message and log records in general. So you can not only extract data from your various log source, you can also classify events, for example, as being a “login”, a “logout” or a firewall “denied access”. This makes it very easy to look at specific subsets of messages and process them in ways specific to the information being conveyed.

All details can be found at log classification with liblognorm.

normalizing Apache Access logs to JSON, XML and syslog

Wednesday, November 17th, 2010

We like to make our mind up based on examples, especially for complex issues. For a discussion we had on the CEE editorial board, we’d like to have some real-world example of a log file with many empty fields. An easy to grasp, well understood and easy to parse example of such is the Apache access log. Thanks to Anton Chuvakin and his Public Security Log Sharing Site we also had a few research samples at hand.

Apache common log format is structured data. So there is no point in running it through a free-text normalization engine like liblognorm. Of course it could process the data, but why use that complex technology. Instead, the decoder is now part of libee and receives a simple string describing which fields are present. It’s called like this:

$ ./convert  -exml -dapache -D “host identity user date request status size f1 useragent” < apache.org > apache.xml

Options specify encoder and decoder, and the string after -D tells the convert field names and order. But now let’s speak the input and output for itself:

The converter works by calling the decoder, which creates an in-memory representation of the log format in a CEE-like object model. Then, that object model and the called-for encoder is used to write the actual output format. That way, the conversion tool can convert between any structured log format, as long as the necessary encoders and decoders are available. This greatly facilitates log processing in heterogeneous environments.

Note that liblognorm works similar and, from libee’s point of view, can be viewed at as an decoder for unstructured text data (or, more precisely, for text data where the structure is well-hidden ;)).