Let me offer a design principle for tools that analyze audit logs, and
report their results by displaying audit records. Irrespective of the
contents of the audit log, these tools should generate a 7-bit ASCII
representation of each audit record, with no control characters other
than the tab and newline separators.
Consider the poor guy accessing a computer with a terminal. If an
audit record contains binary data, and the person performs a query
using an audit tool, binary data in the answer could contain an escape
sequence that puts the terminal into a bizarre mode. This happens to
me when I connect to a Linux machine using PuTTY, and read mail that
contains Chinese characters. Damn spam!
Binary data can occur in logs for unexpected reasons. For example, a
log file can become corrupted, or something that is not a log file can
accidentally be used as one. Furthermore, someone with bad intentions
can carefully add binary data designed to use terminal escapes to hide
their tracks.
Once one is carefully quoting field values, it becomes easy to offer
multiple formats. Let me propose two ASCII representations of audit
events, one that is very similar to what is produced by ausearch, and
a scripting language friendly version, in which each audit record is a
sequence of tab separated values.
In both formats, an audit event is started by a line of text with
three hyphen characters. In the tab separated values format, the
names and the values that make up a record are separated by a tab
character. Each name or value is quoted using the C string literal
syntax. Letters, digits, and space characters are formatted
unmodified. Characters that can be represented with character
escapes, such as the tab and newline characters, are formatted using a
character escape, with the exception of apostrophe and question mark,
which are formatted unmodified. Also formatted unmodified are the
graphics characters: !#%^&*(_)-+=~[]|;:{},.<>/. The remaining
characters are formatted using three digit octal numeric escapes.
In the ausearch-like format, each name is separated from its value
with an equal sign, and name-value pairs are separated by a space
character. A name or a value is formatted unmodified if it contains
only characters that are formatted unmodified in tab separated value
format, and does not contain an equal sign or a space character.
Otherwise, it is formatted as in tab separated value format surrounded
by double quotes.
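The quoting rules for the two formats can be sketched in Python as
follows. This is only my reading of the rules described above, not the
code in emit.c, and the names quote_item, bare_or_quoted, and
format_pair are made up for illustration. Items are taken as bytes, and
I assume the character escapes in question are the usual C ones
(\a \b \f \n \r \t \v, backslash, and double quote).

```python
import string

# Bytes emitted unmodified: letters, digits, space, apostrophe, question
# mark, and the graphics characters listed in the text.
PLAIN = frozenset(
    (string.ascii_letters + string.digits + " '?"
     + "!#%^&*(_)-+=~[]|;:{},.<>/").encode("ascii"))

# Bytes that have a C character escape (assumed to be the standard set).
ESCAPES = {
    0x07: "\\a", 0x08: "\\b", 0x0C: "\\f", 0x0A: "\\n", 0x0D: "\\r",
    0x09: "\\t", 0x0B: "\\v", 0x5C: "\\\\", 0x22: '\\"',
}


def quote_item(data):
    """Quote a bytes object as in the tab separated values format."""
    out = []
    for byte in data:
        if byte in PLAIN:
            out.append(chr(byte))
        elif byte in ESCAPES:
            out.append(ESCAPES[byte])
        else:
            out.append("\\%03o" % byte)     # three digit octal escape
    return "".join(out)


def bare_or_quoted(data):
    """Format a bytes object as in the ausearch-like format."""
    quoted = quote_item(data)
    # Quoting changes the text exactly when some byte needed an escape.
    unchanged = quoted == data.decode("latin-1")
    if unchanged and b"=" not in data and b" " not in data:
        return quoted
    return '"' + quoted + '"'


def format_pair(name, value, tsv):
    """Format one name-value pair in either format."""
    if tsv:
        return quote_item(name) + "\t" + quote_item(value)
    return bare_or_quoted(name) + "=" + bare_or_quoted(value)
```

With this sketch, a value holding the terminal escape ESC [ 2 J comes
out as the harmless text \033[2J in either format.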
A name or value in tab separated value format is designed to be
scripting language friendly. For example, in Python, if the variable
item contains a value that includes a backslash, one obtains the
binary string it represents with the Python expression
eval('"' + item + '"', {}, {}).
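For readers on Python 3, where evaluating a plain string literal yields
str rather than bytes, here is a variant of the same idea. It is a
sketch, not part of the proposal: decode_item is a made-up name, and it
swaps eval for ast.literal_eval with a bytes literal, which accepts
only literals and so cannot run code if a corrupted item slips through.

```python
import ast


def decode_item(item):
    """Return the bytes that a TSV-quoted item represents."""
    if "\\" not in item:
        # No escapes: the item is plain ASCII and stands for itself.
        return item.encode("ascii")
    # Wrap the item in a bytes literal and let Python undo the escapes.
    return ast.literal_eval('b"' + item + '"')
```

For example, decode_item("\\033[2J") gives back the four bytes
ESC [ 2 J. A malformed item, such as one with an unescaped double
quote, raises an exception instead of being evaluated.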
Audit events represented as tab separated values are easily consumed
in Python. A simple loop does the job.
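import sys

def filter():
    # consume(seq, seqno) is supplied by the caller.
    seq = None # A sequence of tables representing an audit event
    lineno = 0
    seqno = 0  # Line number of the current event's start marker
    while True:
        line = sys.stdin.readline()
        if not line:  # End of input, so flush the last event
            if seq:
                consume(seq, seqno)
            return
        lineno = lineno + 1
        if line == "---\n":
            if seq:
                consume(seq, seqno)
            seq = []
            seqno = lineno
            continue
        # Remove only the newline.  strip() would also remove trailing
        # spaces and the tab that precedes an empty final value.
        record = line.rstrip("\n").split("\t")
        nf = len(record) # number of fields
        if nf % 2 != 0:
            sys.stderr.write("Bad field count on line " + str(lineno) +
                             "\n")
            sys.exit(1)
        if seq is None:
            sys.stderr.write("Record before event start on line " +
                             str(lineno) + "\n")
            sys.exit(1)
        tab = {}
        for i in range(0, nf, 2):
            tab[record[i]] = record[i + 1]
        seq.append(tab)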
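The loop hands each completed event to a function named consume, which
is left to the reader. As a minimal sketch, here is one that prints a
one-line summary per event. It assumes each record carries a field
named "type", as ausearch output does; that field name and the optional
out parameter are my assumptions, not part of the format described
here.

```python
import sys


def consume(seq, seqno, out=sys.stdout):
    """Print a one-line summary of an audit event.

    seq is the list of tables (one dict per record) and seqno is the
    line number of the event's "---" marker.
    """
    # Records without a "type" field are shown as "?".
    types = [tab.get("type", "?") for tab in seq]
    out.write("event at line %d: %d record(s), types: %s\n"
              % (seqno, len(seq), ",".join(types)))
```

The values are still in quoted form at this point, so anything printed
from a table stays safe to send to a terminal.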
C applications can easily generate both formats if they use the
following interface to generate their output.
#if !defined EMIT_H
#define EMIT_H

/* The emitters generate tab separated values when the flag is
   non-zero, otherwise name-value pairs are separated by an equal
   sign. */
void set_tsv_mode(int flag);

/* Emit an event start marker, the string "---\n". */
void emit_start_event(void);

/* Emit an end of record marker, a newline character. */
void emit_record_end(void);

/* Emit the field separator, a tab character when in TSV mode,
   otherwise a space character. */
void emit_field_separator(void);

/* Emit the name-value pair separator, a tab character when in TSV
   mode, otherwise an equal sign character. */
void emit_name_value_separator(void);

/* Emit a name or a value.  In TSV mode, the output is quoted using
   the C string literal syntax.  Letters, digits, and space characters
   are emitted unmodified.  Characters that can be represented with
   character escapes, such as the tab and newline characters, are
   printed using a character escape, with the exception of apostrophe
   and question mark, which are emitted unmodified.  Also emitted
   unmodified are the graphics characters: !#%^&*(_)-+=~[]|;:{},.<>/.
   The remaining characters are output using three digit octal numeric
   escapes.

   In non-TSV mode, a name or a value is emitted unmodified if it
   contains only characters that are emitted unmodified in TSV mode,
   and does not contain an equal sign or a space character.  Otherwise,
   it is emitted as in TSV mode surrounded by double quotes.

   A name or value emitted in TSV mode is designed to be scripting
   language friendly.  For example, in Python, if the variable item
   contains a value that includes a backslash, one obtains the string
   it represents with the expression eval('"' + item + '"', {},
   {}). */
void emit_item(const char *bytes);

#endif
The file emit.c that implements this interface is available in the
polgen CVS repository on SourceForge.
John
Those are my principles. If you don't like them, I have others.
-- Groucho Marx