Steve,
It was nice to meet you. Since we met, I installed FC4 and have been
playing with parts of the audit system, especially with ausearch.
As you know, a modified version of strace currently provides input for
our policy generation tool. However, once the audit system provides
better descriptions and security context information in its syscall
event records, we hope to switch to ausearch as our source of input.
One difficulty with using strace is that its output was designed to be
human readable, which makes it difficult for a machine to parse. I
would like to see an output mode added to ausearch that is designed to
be easily read by programs, and I am willing to contribute the code
that implements this mode.
There are many choices available for machine readable output. Let me
list four.
Since our programs are written in Python, the simplest syntax for us
is to write each record as a Python dictionary. If need be, a value
associated with a key in a record may also be a dictionary. If each
record is preceded by the string "aurec(" and ended by ")", a Python
program intended to consume the output simply provides a definition
for aurec, and then performs an execfile on the output generated by
ausearch. The execfile will cause aurec to be called with each record
as its argument.
For Python, the record:
time->Mon Jun 20 09:28:51 2005
type=SYSCALL msg=audit(1119274131.024:13634907): arch=40000003 syscall=6 success=yes
exit=0 a0=3 a1=bf968e8a a2=bf968e8a a3=bf968f28 items=0 pid=9408 auid=4294967295 uid=0
gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 comm="autrace"
exe="/sbin/autrace"
would appear as:
aurec({'time':'Mon Jun 20 09:28:51 2005',
'type':'SYSCALL',
'msg':{'audit':'1119274131.024:13634907',
'arch':40000003,
'syscall':6,
...
'exe':'/sbin/autrace'}})
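A minimal sketch of the consuming side, under a few assumptions: the sample text is a shortened, hypothetical record (not real ausearch output), and the names SAMPLE_OUTPUT, records, and consume are invented for illustration. execfile exists only in Python 2; in Python 3 the equivalent is to call exec on the text. Note that executing the output as code means trusting whatever produced it.

```python
# Hypothetical, shortened sample of the proposed Python-dictionary output.
SAMPLE_OUTPUT = """\
aurec({'time': 'Mon Jun 20 09:28:51 2005',
       'type': 'SYSCALL',
       'msg': {'audit': '1119274131.024:13634907',
               'syscall': 6,
               'exe': '/sbin/autrace'}})
"""

records = []


def aurec(record):
    """Called once per audit record; this sketch just collects them."""
    records.append(record)


def consume(text):
    # Python 3 replacement for execfile: run the text with aurec in scope.
    exec(text, {"aurec": aurec})


consume(SAMPLE_OUTPUT)
print(records[0]["type"], records[0]["msg"]["syscall"])
```

A real consumer would read the text from a file or pipe instead of a string constant.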
While the Python solution is easy, it leaves users of other languages
in the cold. At the other extreme, we could use XML syntax, as most
systems have an XML parser. The trouble with XML is that it is too
verbose and hard to read. Surely there must be an intermediate
solution.
JSON (JavaScript Object Notation) is a lightweight data-interchange
format <http://www.json.org>. It is easy for humans to read and write,
and it is easy for machines to parse and generate. An advantage of
JSON is that parsers and printers for many languages are available now.
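As one illustration of such a parser, here is a sketch using the json module from Python's standard library. It assumes ausearch would write one JSON record per line, which is an assumption of this sketch and not part of the proposal. The sample record is shortened and hypothetical, and read_records is an invented name.

```python
import json

# Hypothetical sample: one shortened JSON record per line.
SAMPLE_LINES = [
    '{"time":"Mon Jun 20 09:28:51 2005","type":"SYSCALL",'
    '"msg":{"audit":"1119274131.024:13634907","syscall":6,'
    '"exe":"/sbin/autrace"}}',
]


def read_records(lines):
    """Yield one dictionary per non-empty line of JSON text."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


syscalls = [rec["msg"]["syscall"] for rec in read_records(SAMPLE_LINES)]
print(syscalls)
```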
The example written in JSON looks like:
{"time":"Mon Jun 20 09:28:51 2005",
"type":"SYSCALL",
"msg":{"audit":"1119274131.024:13634907",
"arch":40000003,
"syscall":6,
...
"exe":"/sbin/autrace"}}
In JSON, it is something of a pain that every string must be quoted. My
final suggestion for machine readable syntax is an ausearch-specific
one, based on JSON. Whenever a string has the form of a C identifier or
a number, I suggest we allow it to appear unquoted. With this change,
the example is quite readable, something like:
{time:"Mon Jun 20 09:28:51 2005",
type:SYSCALL,
msg:{audit:"1119274131.024:13634907",
arch:40000003,
syscall:6,
...
exe:"/sbin/autrace"}}
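One way a consumer could read this relaxed syntax is to put the quotes back and hand the result to an ordinary JSON parser. The sketch below does that in Python. The function name relaxed_to_json and the sample record are hypothetical, and the rule it applies (quote bare C identifiers, pass bare numbers through as JSON numbers) is one reading of the suggestion above, not a settled grammar.

```python
import json
import re

# A double-quoted string, or a run of bare-word characters.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[A-Za-z0-9_.+-]+')
_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?')
_IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def relaxed_to_json(text):
    """Quote bare identifiers so the text becomes ordinary JSON."""
    def fix(match):
        tok = match.group(0)
        if tok.startswith('"') or _NUMBER.fullmatch(tok):
            return tok                    # already valid JSON
        if _IDENT.fullmatch(tok):
            return '"' + tok + '"'        # bare identifier becomes a string
        raise ValueError("cannot interpret unquoted token: %r" % tok)
    return _TOKEN.sub(fix, text)


# Hypothetical, shortened record in the relaxed syntax.
SAMPLE = ('{time:"Mon Jun 20 09:28:51 2005",type:SYSCALL,'
          'msg:{audit:"1119274131.024:13634907",arch:40000003,syscall:6}}')

record = json.loads(relaxed_to_json(SAMPLE))
print(record["type"], record["msg"]["syscall"])
```

One consequence of this reading is that an unquoted number is indistinguishable from a numeric value, so a consumer cannot tell whether the producer meant the string "6" or the number 6.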
I hope this message inspires more ideas on this topic.
John