.\"
.\" logpp (Log PreProcessor) 0.14 - logpp.man
.\" Copyright (C) 2006-2007 Risto Vaarandi
.\"
.\" This program is free software; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License
.\" as published by the Free Software Foundation; either version 2
.\" of the License, or (at your option) any later version.
.\"
.\" This program is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public License
.\" along with this program; if not, write to the Free Software
.\" Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
.\"
.TH logpp 1 "May 2007" "logpp 0.14"
.SH NAME
logpp \- log preprocessor
.SH SYNOPSIS
.TP
.B logpp
[-b <IO block size>]
.br
[-d]
.br
[-f <logging facility>]
.br
[-h]
.br
[-i <input buffer size>]
.br
[-l <logging level>]
.br
[-r <reopen interval>]
.br
[-s <sleep interval>]
.br
[-t <logging tag>]
.br
[-v]
.br
<config file> ...
.SH DESCRIPTION
Logpp is a tool for preprocessing event logs and feeding relevant 
information to other programs for storing or in-depth analysis. During 
its work, logpp reads lines appended to input files (like
.BR tail (1)
in
.B -f
mode), matches the lines against patterns (e.g., regular expressions), converts 
matching lines according to given templates, and writes the results to given 
destinations.
Logpp supports multi-line matching and several types of output destinations
like regular files, FIFOs, external programs, and the system logger.
Therefore, logpp can act as a filter in front of a more complex event log
analysis system and improve that system's performance by weeding out
irrelevant log data; it can work as a syslog gateway between the system
logger and an application that does not use
.BR syslog (3);
and it can convert multi-line log messages to shorter single-line messages
and accomplish other log preprocessing tasks.
.SH OPTIONS
.TP
.BR -b " N"
Logpp attempts to read N bytes at once and writes at most N bytes at once 
during all IO operations. This implies that lines from configuration and
input files longer than N bytes will be split. Also, the size of an output
message written to a destination can't exceed N bytes. 
N must be a positive integer and defaults to 8192.
.TP
.B -d
Logpp runs as a daemon and employs
.BR syslog (3)
for logging its own messages (otherwise the messages are written to
standard error).
.TP
.BR -f " S"
Logpp uses facility S for logging its own syslog messages (e.g., about its 
internal errors). 
S must be a string value (e.g.,
.IR local0 )
and defaults to
.IR user .
See
.BR syslog (3)
for valid facility values.
.TP
.B -h
Print usage information.
.TP
.BR -i " N"
Logpp creates an input buffer of N lines for each input source. The buffer
holds the last N lines that have been read from that source, and input from
the source is matched by comparing the content of the buffer with patterns.
A pattern can therefore match at most N lines.
N must be a positive integer and defaults to 10.
.TP
.BR -l " S"
Logpp uses verbosity level S for logging its own syslog/stderr messages;
if a message is less severe than level S, it is not logged.
S must be a string value (e.g.,
.IR err )
and defaults to
.IR info .
See
.BR syslog (3)
for valid level values.
.TP
.BR -r " T" 
If an input or output file is not open (e.g., logpp failed to open it at
startup or it was closed due to an IO error), logpp attempts to reopen the
file every T seconds until the open succeeds.
T must be a positive integer; by default, logpp does not attempt to reopen files.
.TP
.BR -s " T"
If no data were read from input sources, logpp sleeps for T microseconds 
before attempting to read again.
T must be a positive integer and defaults to 1000000 (1 second).
.TP
.BR -t " S"
Logpp uses S as a tag (or "program name") for 
.I all 
syslog-style logging.
S must be a string value and defaults to
.IR logpp .
.TP
.B -v
Print version information.
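.PP
As an illustration only, a daemonized instance that logs its own messages
with facility local0 at level err, and that retries failed opens every
10 seconds, might be started as sketched below. The configuration file
path /etc/logpp.conf is an assumption and is not mandated by logpp:
.PP
logpp -d -f local0 -l err -r 10 /etc/logpp.conf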
.SH CONFIGURATION FILE
A logpp configuration file consists of input, output, filter, and flow 
definitions. A line with a keyword, definition name, and an opening brace ({) 
starts a definition, and a line with a closing brace (}) ends the definition. 
Lines between the first and the last line are keyword-value pairs, with 
whitespace separating the keyword from the value. 
Lines which begin with an octothorpe (#) are treated as comments and ignored 
(whitespace may precede the octothorpe). Empty lines or lines consisting of
whitespace are also ignored.
.PP
The input definitions are used for setting up input sources for logpp.
Each definition starts with the
.B input
keyword and in the body of the definition, the
.B file <filename>
keyword-value pairs specify individual input sources which must be regular 
files, FIFOs, or standard input if <filename> is
.BR - .
During its work, logpp reads lines appended to regular file sources and 
written to FIFO sources and standard input. It stores the lines without 
terminating newlines to input buffers of corresponding sources, and processes 
the stored lines according to flow definitions (see below).
.PP
If an input file is recreated or truncated, logpp reopens the file and
continues to read it from the beginning (i.e., it follows files
by name rather than by i-node). If an IO error occurs when reading from a file,
the file will be closed, but logpp attempts to reopen it if the
.B -r
command line option was given.
.PP
Here are example input definitions:
.PP
input var-log-messages {
.br
  file /var/log/messages
.br
}
.PP
input httpd-accesslogs {
.br
  file /var/log/httpd/access_log
.br
  file /var/log/httpd/ssl_access_log
.br
}
.PP
input var-cron-log {
.br
  file /var/cron/log
.br
}
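.PP
As a further sketch, an input definition may also combine standard input
with a FIFO, since both are valid input sources. The definition name and
the FIFO path /var/run/logpp.fifo are hypothetical:
.PP
input stdin-and-fifo {
.br
  file -
.br
  file /var/run/logpp.fifo
.br
}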
.PP
The output definitions are used for setting up output destinations for logpp.
Each definition starts with the
.B output
keyword and in the body of the definition, the
.BR file " " <filename> ,
.BR syslog " " <priority> ,
and
.BR exec " " <commandline>
keyword-value pairs specify individual output destinations. 
The
.B file
keyword tells logpp to write its output to a file, the
.B syslog
keyword to log its output to the system logger, and the
.B exec
keyword to pipe its output to an external program.
.PP
The <filename> must be a regular file or FIFO. If <filename> is a regular
file, logpp writes to it in append mode; if <filename> does not exist at
logpp startup, it is created as a regular file.
If an IO error occurs when writing to a file, the file will be closed,
but logpp attempts to reopen it if the
.B -r
command line option was given.
.PP
The <priority> is a syslog 
.I facility.level 
pair (e.g., 
.IR mail.err )
that logpp will use when logging its output to the system logger.
The <commandline> is a command line that is executed as a separate process,
with its standard input connected to logpp
(for each output operation a new process is created).
.PP
Here are example output definitions:
.PP
output var-log-logpp {
.br
  file /var/log/logpp
.br
}
.PP
output syslog-warning {
.br
  syslog daemon.warning
.br
}
.PP
output syslog-crit-and-mail {
.br
  syslog auth.crit
.br
  exec /bin/mail -s "logpp message" root@localhost
.br
}
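.PP
A FIFO destination is declared with the same
.B file
keyword as a regular file. The sketch below assumes that a FIFO already
exists at the hypothetical path /var/run/analyzer.fifo and that another
program reads from it; if the path did not exist at startup, logpp would
create a regular file instead:
.PP
output analyzer-fifo {
.br
  file /var/run/analyzer.fifo
.br
}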
.PP
The filter definitions are used for setting up input matching and conversion
schemes for logpp. Each definition starts with the
.B filter
keyword and in the body of the definition, the
.BR regexp<num> " " <regular_expression> ,
.BR nregexp<num> " " <regular_expression> ,
.BR tvalue " " <truth_value> ,
and
.BR template " " <conversion_string>
keyword-value pairs define the matching and conversion scheme.
.PP
The 
.B regexp<num>
keyword is used for specifying a regular expression for matching <num> lines
(if <num> is omitted, it defaults to 1).
If <num> is 1, the last line from an input source is taken from the source's
input buffer and compared with the regular expression. 
If <num> is greater than 1, the last <num> lines from an input source are taken
from its buffer and concatenated with a newline as the separator between
lines, and the result is compared with the regular expression.
Thus, the
.B -i
command line option sets an upper limit for the value of <num>.
.PP
The
.B nregexp<num>
keyword is used for specifying a negative regular expression: the line or
lines are considered matching if the expression itself does not match them.
The truth value given with
.B tvalue
matches all lines if the value is
.IR true ,
and matches no lines if the value is
.IR false .
.PP
The
.B template
keyword defines a conversion string for the preceding
.BR regexp , 
.BR nregexp ,
or
.B tvalue
keyword. The conversion string may contain $<num> match variables that are
set by bracketing constructs inside the regular expressions. The $0 match
variable is set to the line(s) that took part in the matching operation.
Note that $0 is the only variable that is set for the
.B nregexp
and
.B tvalue
keywords.
.PP
The patterns given with
.BR regexp ,
.BR nregexp ,
and
.B tvalue
keywords are compared with the content of the input buffer in the order they 
are specified, and if a pattern matches, the search for further matches stops.
If the matching pattern has a conversion string, its match variables are
substituted with their values and the result is written to output destinations
given with flow definitions (see below). If there is no conversion string,
the pattern produces no output and acts as a suppression condition.
.PP
Here are example filter definitions:
.PP
filter cisco-cpu {
.br
  # messages from device 192.168.1.111 are ignored
.br
  regexp 192\\.168\\.1\\.111
.br
  # cpu hog events from other devices produce output
.br
  regexp ([0-9\\.]+) [0-9]+: %SYS-3-CPUHOG
.br
  template Device $1 cpu hog
.br
}
.PP
filter cisco-link {
.br
  # link down events produce output
.br
  regexp ([0-9\\.]+) [0-9]+: %LINK-3-UPDOWN: Interface (.+), changed state to down
.br
  template Device $1 link $2 down
.br
  # link up events produce output
.br
  regexp ([0-9\\.]+) [0-9]+: %LINK-3-UPDOWN: Interface (.+), changed state to up
.br
  template Device $1 link $2 up
.br
}
.PP
filter httpd-php-access-192.168.0 {
.br
  # messages for other nets than 192.168.0 are ignored
.br
  nregexp ^192\\.168\\.0\\.
.br
  # PHP script accesses from 192.168.0 produce output
.br
  regexp ^([0-9\\.]+).*"GET (.+\\.php) HTTP/[0-9\\.]+"
.br
  template Host $1 accessed the PHP script $2
.br
}
.PP
filter cron-cmd-started {
.br
  # match cron "command started" messages that span
.br
  # two lines and convert them into single line messages
.br
  # (the regular expression is written in Perl dialect)
.br
  regexp2 ^>\\s*CMD: (.+)\\n>\\s*(\\S+)\\s+(\\d+)
.br
  template Cron started command $1 (user $2 pid $3)
.br
}
.PP
filter 192.168.7.113 {
.br
  # lines with 192.168.7.113 are important in all logs
.br
  regexp 192\\.168\\.7\\.113
.br
  template $0
.br
}
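.PP
The following sketch illustrates how pattern order and the
.B tvalue
keyword might be combined to suppress some lines and pass on the rest.
The filter name and the string "debug" are chosen for illustration only:
.PP
filter drop-debug-pass-rest {
.br
  # lines containing the string "debug" match first and,
.br
  # having no template, are suppressed
.br
  regexp debug
.br
  # all remaining lines match and are written unchanged
.br
  tvalue true
.br
  template $0
.br
}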
.PP
The flow definitions are used for setting up processing flows for logpp.
Each definition starts with the
.B flow
keyword and in the body of the definition, the
.BR input " " <name> ,
.BR filter " " <name> ,
and
.BR output " " <name>
keyword-value pairs define the flow's inputs, the matching and conversion
schemes that are applied to the inputs, and the outputs where the results of
the matching and conversion are written. The <name> parameter for all keywords
must be the name of a previously defined input, filter, or output. Note that
if more than one filter has been specified, a matching pattern in one filter
does not prevent line(s) from being matched by patterns in other filters.
.PP
Here are example flow definitions:
.PP
# this flow accepts lines from /var/log/messages as input; 
.br
# it writes cisco "cpu hog" messages from other hosts than 
.br
# 192.168.1.111 to /var/log/logpp, and cisco "link down" and 
.br
# "link up" messages from all hosts to /var/log/logpp
.PP
flow cisco {
.br
  input var-log-messages
.br
  filter cisco-cpu
.br
  filter cisco-link
.br
  output var-log-logpp
.br
}  
.PP
# this flow accepts lines from httpd access logs as input;
.br
# it generates a syslog warning-level message when a PHP 
.br
# script is accessed from the 192.168.0 network
.PP
flow php {
.br
  input httpd-accesslogs
.br
  filter httpd-php-access-192.168.0
.br
  output syslog-warning
.br
}
.PP
# this flow accepts lines from cron daemon log as input;
.br
# it writes messages about started commands to /var/log/logpp
.PP
flow cron {
.br
  input var-cron-log
.br
  filter cron-cmd-started
.br
  output var-log-logpp
.br
}
.PP
# this flow accepts lines from /var/log/messages and httpd access
.br
# logs as input; it generates a syslog crit-level message and sends 
.br
# an e-mail to the local root user if a line with the IP address 
.br
# 192.168.7.113 appears in the logs
.PP
flow 192.168.7.113 {
.br
  input var-log-messages
.br
  input httpd-accesslogs
.br
  filter 192.168.7.113
.br
  output syslog-crit-and-mail
.br
}
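.PP
As a sketch of how several filters in one flow interact, the hypothetical
flow below reuses definitions from the earlier examples. Because a match in
one filter does not stop matching in the other, a "link down" line that
mentions 192.168.7.113 would be expected to yield two lines in
/var/log/logpp: the converted message and the original line.
.PP
flow cisco-link-and-host {
.br
  input var-log-messages
.br
  filter cisco-link
.br
  filter 192.168.7.113
.br
  output var-log-logpp
.br
}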
.SH SIGNALS
.TP
.B SIGHUP
Logpp will close all inputs, outputs and the connection to the system logger,
reread the configuration and reinitialize itself.
.TP
.B SIGUSR1
Logpp will write its status information to syslog or stderr.
.TP
.B SIGUSR2
Logpp will reopen its outputs and the connection to the system logger.
.TP
.B SIGTERM
Logpp will terminate gracefully.
.SH NOTES
Logpp can be built with support for Perl-compatible regular expressions
(described in
.BR perlre (1))
if the local system has the PCRE library (see
.BR pcre (3)).
.PP
If logpp has been built with POSIX regular expression support, the
regular expressions are compiled with the REG_EXTENDED | REG_NEWLINE flags (see
.BR regcomp (3)
for details).
.PP
In order to prevent itself from blocking during calls to
.BR write (2), 
logpp opens FIFOs and pipes in non-blocking mode. If the consumer of
the FIFO or pipe is not reading the data fast enough,
.BR write (2)
to the FIFO or pipe will fail, and logpp will not attempt to write again.
.SH AUTHOR
Risto Vaarandi <ristov at users d0t s0urcef0rge d0t net>
.SH SEE ALSO
.BR pcre (3),
.BR perlre (1),
.BR regcomp (3),
.BR syslog (3),
.BR tail (1),
.BR write (2)
