John Dennis wrote:
I also agree the data stream which emerges from audit is rather
difficult to work with. Eric likes to point out we can't change the
kernel, so maybe what we really need (and has been proposed) is for
auditd to reformat the data before emitting it or writing it to disk
(e.g. assemble records into events, decode strings which have been
hexified, etc.). Currently auparse is responsible for much of this as
part of a post processing step which has to be repeated every time audit
data is read instead of just once as it emerges from the kernel. If
instead the auparse user level code was folded into auditd which then
became responsible for formatting the ad hoc data received from the
kernel the final output from audit could be much more friendly and much
of the rationale for auparse would evaporate.
I was going to request going the other way with libauparse, i.e. to
entirely separate it from auditd. As I mentioned, I'm not using auditd
because it wasn't really written with my customer's requirements in mind
(high volume, no local storage). My audit daemon needs to run on RHEL 3
(it has a LAuS backend too) and RHEL 4. I don't see anything
architecturally which ties libauparse to auditd, so if it was a separate
library I could recompile it for RHEL 4 without replacing the RHEL 4
audit-libs, etc. I can certainly see the efficiency in auditd parsing
data before handing it off to dispatchers, but it's not hard to
construct non-auditd uses for it either. Of course, it would need some
performance work first for my use case, but I wouldn't want to duplicate
the effort unnecessarily.
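
To make the post-processing discussed above concrete (assembling records
into events and decoding hexified strings), here is a minimal sketch in
Python. It is not how libauparse or auditd do it. The sample record layout,
the HEX_CANDIDATES set and all function names are assumptions made for
illustration only; the one thing taken as given is that records of the same
event share the "audit(timestamp:serial)" identifier and that untrusted
strings are emitted either quoted or as bare hex.

```python
import re
from collections import defaultdict

# Records belonging to one event share the same "audit(<time>:<serial>)" id.
EVENT_ID = re.compile(r"msg=audit\((\d+\.\d+):(\d+)\):")
# key=value pairs; a value is either a quoted string or a run of non-spaces.
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Assumption for this sketch: only these fields may carry hex-encoded strings.
HEX_CANDIDATES = {"comm", "exe", "name", "cwd"}


def decode_value(key, raw):
    """Strip quotes from quoted values; decode bare hex for candidate fields."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return raw[1:-1]
    if (key in HEX_CANDIDATES and len(raw) % 2 == 0
            and re.fullmatch(r"[0-9A-Fa-f]+", raw)):
        return bytes.fromhex(raw).decode("utf-8", errors="replace")
    return raw


def parse_record(line):
    """Return ((timestamp, serial), fields) for one raw record line."""
    m = EVENT_ID.search(line)
    if m is None:
        raise ValueError("no audit event id in record")
    event_id = (m.group(1), int(m.group(2)))
    fields = {k: decode_value(k, v) for k, v in FIELD.findall(line[m.end():])}
    type_match = re.match(r"type=(\S+)", line)
    fields["type"] = type_match.group(1) if type_match else None
    return event_id, fields


def assemble_events(lines):
    """Group parsed records by event id, preserving arrival order."""
    events = defaultdict(list)
    for line in lines:
        event_id, fields = parse_record(line)
        events[event_id].append(fields)
    return dict(events)
```

Doing this once in the daemon rather than in every reader is the efficiency
argument; keeping it in a standalone library is what would let a non-auditd
daemon reuse it.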
On the more general topic of the format of data emitted by the kernel, I
see two serious problems with the proposal above and with the current
solution (even though both are currently the most pragmatic options):
1. libauparse only exists to reverse engineer a really bad protocol.
2. The existing protocol has already broken userspace many times.
On that second point, the changes since the protocol was introduced
(pre-git history, so I can't work out when) have been such that any tool
written at the time of 2.6.12 couldn't possibly expect to continue to
function correctly if you updated the kernel underneath it. Some examples:

bccf6ae083318ea08094d6ab185fdf7c49906b3a
    "audit_rate_limit=%d old=%d by auid %u" ->
    "audit_rate_limit=%d old=%d by auid=%u"

9e45eeac867d51ff3395dcf3d7aedf5ac2812c8
    Add escaping to comm field

a6c043a887a9db32a545539426ddfc8cc2c28f8f
    Add tty field without quotes or escaping of value

ac03221a4fdda9bfdabf99bcd129847f20fc1d80
    Remove qbytes field from IPC record
    Change iuid, igid field names

5b9a4262232d632c28990fcdf4f36d0e0ade5f18
    Convert some hex IPC records to octal

de6bbd1d30e5912620d25dd15e3f180ac7f9fcef
    Change to format of EXECVE messages

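
As an illustration of how small a change is enough to break a consumer, here
is a sketch in Python of a parser written against the older
audit_rate_limit message. The two format strings come from the first commit
listed above; the sample values, the regular expressions and the function
names are hypothetical, and no real tool is claimed to have parsed the
message this way.

```python
import re

# Written against the older format: "... by auid %u" (space before the value).
OLD_FORMAT = re.compile(r"audit_rate_limit=(-?\d+) old=(-?\d+) by auid (\d+)")

# Written with hindsight, accepting either "auid %u" or "auid=%u".
EITHER_FORMAT = re.compile(r"audit_rate_limit=(-?\d+) old=(-?\d+) by auid[ =](\d+)")


def _to_fields(match):
    """Convert a successful match into a dict of integers, or None."""
    if match is None:
        return None
    return {
        "audit_rate_limit": int(match.group(1)),
        "old": int(match.group(2)),
        "auid": int(match.group(3)),
    }


def parse_strict(msg):
    """Parser a tool author could reasonably have shipped at the time."""
    return _to_fields(OLD_FORMAT.search(msg))


def parse_tolerant(msg):
    """Parser that copes with both formats, but only because both are known."""
    return _to_fields(EITHER_FORMAT.search(msg))


# Sample messages built from the two format strings (values are made up).
old_msg = "audit_rate_limit=100 old=0 by auid 500"
new_msg = "audit_rate_limit=100 old=0 by auid=500"
```

Here parse_strict(old_msg) yields the three fields while parse_strict(new_msg)
returns None: the tool silently stops seeing the record after a kernel
update. The tolerant variant only works because it was written after the
change, which is exactly the position auditd is in.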
Auditd only continues to function because it has been updated in step
with the kernel: it is 'special'. Upstream's opinion on this is fairly
clear. Note this isn't an argument in favour of a binary format
specifically (although I favour that for efficiency), but it does
highlight the requirement for a new, well-designed format.
Matt
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services
M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490