On Friday 15 August 2008 02:43:49 Kay Hayen wrote:
More importantly, and somewhat blocking my tests: with the improved rules I
get this quite reproducibly when compiling:
type=SYSCALL msg=audit(1218773075.500:118620): arch=c000003e syscall=59
success=yes exit=0 a0=7fff6f78cf90 a1=7fff6f78cf40 a2=7fff6f78f068 a3=0
items=2 ppid=11412 pid=11421 auid=4294967295 uid=1000 gid=1000 euid=1000
suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295
comm="gcc-4.3" exe="/usr/bin/gcc-4.3" key=(null)
[...]
type=SYSCALL msg=audit(1218773075.496:118624): arch=c000003e syscall=56
success=yes exit=11421 a0=1200011 a1=0 a2=0 a3=7fc067776770 items=0
ppid=11407 pid=11412 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000
fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295
comm="gnatchop" exe="/usr/bin/gnatchop" key=(null)
Please note the _ascending_ sequence number but _descending_ time.
What this indicates is that there was some recursion before the syscall
triggered an event. The syscall context exists from syscall entry to exit. If
during the middle a signal is delivered, the syscall is not finished. Instead
it runs the signal handler associated with the signal. The signal handler
might make syscalls which are then handled using the existing syscall context
via linked list. When that occurs, the timestamp is not being updated. Not
sure that is appropriate or why the original time really mattered. But that
is what you are observing. My guess is SIGTERM is being delivered during
another syscall.
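
For anyone who wants to spot this pattern in their own logs, here is a minimal Python sketch. It is not part of the audit tools. It assumes the stamp always looks like msg=audit(seconds.milliseconds:serial) with a three-digit millisecond field, as in the two records above, and the names parse_stamp and time_regressions are made up for illustration.

```python
import re

# Matches the "msg=audit(<seconds>.<millis>:<serial>)" stamp in a record.
STAMP = re.compile(r"msg=audit\((\d+)\.(\d+):(\d+)\)")


def parse_stamp(record):
    """Return (timestamp_in_ms, serial) for one audit record line."""
    m = STAMP.search(record)
    if m is None:
        raise ValueError("no audit stamp in record")
    secs, millis, serial = m.groups()
    # Assumption: the fractional part is always three digits (milliseconds).
    return int(secs) * 1000 + int(millis), int(serial)


def time_regressions(records):
    """Yield (earlier_serial, later_serial) pairs where the record with the
    higher serial number carries an older timestamp than its predecessor."""
    stamps = sorted((parse_stamp(r) for r in records), key=lambda s: s[1])
    for (t_prev, s_prev), (t_next, s_next) in zip(stamps, stamps[1:]):
        if t_next < t_prev:
            yield s_prev, s_next


# The two stamps quoted above, with the remaining fields cut off.
records = [
    "type=SYSCALL msg=audit(1218773075.500:118620): arch=c000003e syscall=59",
    "type=SYSCALL msg=audit(1218773075.496:118624): arch=c000003e syscall=56",
]
print(list(time_regressions(records)))
```

Ordering by serial number rather than by timestamp sidesteps the problem, since the serial is what ascends here.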
Seems like a bug? Can you have a look at it?
I'll check on why we don't update the time stamp during syscall recursion.
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
Plus I still didn't fully grasp why that arch filter was necessary in the
first place. I mean, after all, I was simply expecting that per default no
filter should give all arches. Is that filter actually a selector?
The -F arch is a selector for the syscall table. The kernel works off of
numbers not strings. So, clone doesn't mean anything to the kernel, but 56
has meaning. 56 doesn't mean much to people. So, auditctl does you the favor of
converting text to numbers. It needs to know which table to choose from, the
32 bit or 64 bit table, as both or one could be valid. It's possible to compile
the kernel to use only the 64 bit table. There is no way to detect this from
user space except by failure...in which case all you know is failure but not
why.
There is also not a direct mapping between x86_64 and i386. There are syscalls
that exist on one arch but not the other. There are syscalls that change
names between arches. The problem is that I could maintain a table of all
these cross references for x86_64 and i386, but I don't have a good idea
about ppc and s390 which are also biarch. Then the table would be a snapshot
in time. A syscall could get added in a later kernel but you won't get the
right results because you were trusting the tool and not suspicious enough to
do your own review.
Then there is a problem of correlation. If I have 1 rule that expands to 2,
then how can I do a compare of what's in memory vs what rules are on disk?
IOW, how do I tell that someone typed:
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
or just
-a entry,always -S clone -S fork -S vfork
because auditctl would make 2 from 1. This is a really tricky issue and if we
didn't care about correlation...or about outdated tools we trust too
much...we could do this.
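
To illustrate the correlation problem, here is a small Python sketch of what such an expansion could look like. auditctl does not do this. expand_rule is a hypothetical helper and the string handling is deliberately naive.

```python
def expand_rule(rule):
    """Hypothetical expansion: a rule without an arch selector becomes one
    rule per syscall table; a rule that names an arch is left alone."""
    if "-F arch=" in rule:
        return [rule]
    # Insert the arch selector just before the first -S option.
    head, sep, tail = rule.partition(" -S ")
    return [f"{head} -F arch={arch}{sep}{tail}" for arch in ("b32", "b64")]


short_form = "-a entry,always -S clone -S fork -S vfork"
explicit_form = [
    "-a entry,always -F arch=b32 -S clone -S fork -S vfork",
    "-a entry,always -F arch=b64 -S clone -S fork -S vfork",
]

# Once expanded, the in-memory rules are identical in both cases, so nothing
# loaded in the kernel tells you which form was written on disk.
print(expand_rule(short_form) == explicit_form)
```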
Does it have to do with the fact that syscall numbers are arch
dependent?
Yes.
ausyscall x86_64 clone
56
ausyscall i386 clone
120
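
The same lookup can be sketched in Python to show why the table has to be chosen first. This is a hand-written excerpt, not the ausyscall implementation. The clone numbers are the ones shown above; the fork and vfork numbers and the i386 arch value 40000003 are filled in from the kernel tables as I know them and should be double-checked with ausyscall on your system.

```python
# arch= field of a SYSCALL record (hex) mapped to the syscall table it selects.
# c000003e appears in the records in this thread; 40000003 is assumed for i386.
ARCH_TABLE = {0xC000003E: "x86_64", 0x40000003: "i386"}

# Hand-written excerpt of the two tables, for illustration only.
SYSCALL_NUMBERS = {
    "x86_64": {"clone": 56, "fork": 57, "vfork": 58},
    "i386": {"clone": 120, "fork": 2, "vfork": 190},
}


def syscall_name(arch_field, number):
    """Translate a syscall number to a name using the table chosen by arch.
    Returns None when the number is not in this small excerpt."""
    table = SYSCALL_NUMBERS[ARCH_TABLE[int(arch_field, 16)]]
    for name, num in table.items():
        if num == number:
            return name
    return None


# The same number means different things depending on the table.
print(syscall_name("c000003e", 56))  # clone in the x86_64 table
print(syscall_name("40000003", 56))  # not clone in the i386 table
print(syscall_name("40000003", 120))
```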
> > Can you confirm that a type=EOE delimits every event (is that even
> > the correct term to use, audit trace, how is it called).
>
> It delimits every multipart event. You can use something like this to
> determine if you have an event:
>
>     if ( r->type == AUDIT_EOE || r->type < AUDIT_FIRST_EVENT ||
>          r->type >= AUDIT_FIRST_ANOM_MSG) {
>         have full event...
>     }
I will have to check if this affects our intended process tracing. The
parsing is certainly not simplified by it, for a possibly unrelated reason.
We have an audit parsing library. It takes this into account. The one and only
bug that I know of in it is when event records are interlaced. This is a
problem you'll find at some point. Audit events and their records are not
serialized in the kernel. So, you could have:
syscall a
path a
syscall b
user msg c
cwd a
avc b
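
A rough Python sketch of how a parser can cope with that ordering: keep the records of each event in a bucket keyed by serial number and release the bucket when its EOE record arrives. The input lines are synthetic stand-ins for the a/b/c example above (serials 1, 2 and 3), the EOE records are added by me, and treating USER as a single-record type is an assumption for the sketch. This is not how auparse is implemented.

```python
import re

# Pulls the record type and the serial number out of one log line.
STAMP = re.compile(r"type=(\S+) msg=audit\(([\d.]+):(\d+)\)")


def group_events(lines, single_record_types=frozenset({"USER"})):
    """Yield (serial, [record types]) once an event is known to be complete."""
    pending = {}  # serial -> record types collected so far
    for line in lines:
        m = STAMP.search(line)
        if m is None:
            continue
        rtype, serial = m.group(1), int(m.group(3))
        if rtype == "EOE":
            # End of a multipart event: hand over everything collected.
            yield serial, pending.pop(serial, [])
        elif rtype in single_record_types:
            # Assumed to be a one-record event, complete as soon as seen.
            yield serial, [rtype]
        else:
            pending.setdefault(serial, []).append(rtype)


# Synthetic lines mirroring the interlaced order shown above.
lines = [
    "type=SYSCALL msg=audit(1.000:1):",
    "type=PATH msg=audit(1.000:1):",
    "type=SYSCALL msg=audit(1.000:2):",
    "type=USER msg=audit(1.000:3):",
    "type=CWD msg=audit(1.000:1):",
    "type=AVC msg=audit(1.000:2):",
    "type=EOE msg=audit(1.000:1):",
    "type=EOE msg=audit(1.000:2):",
]
for serial, types in group_events(lines):
    print(serial, types)
```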
Without a very stateful message parser, one that e.g. knows how many lines
are to follow an EXECVE, we don't know when to forward it to the part that
should process it.
time->Thu Aug 14 08:21:34 2008
node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=1
name="/home/sgrubb/.kde/share/config/kmailrc.lock3U3ZZa.tmp" inode=11304982
dev=08:03 mode=0100644 ouid=4325 ogid=4325 rdev=00:00
obj=unconfined_u:object_r:user_home_t:s0
node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=0
name="/home/sgrubb/.kde/share/config/" inode=12550361 dev=08:03 mode=040700
ouid=4325 ogid=4325 rdev=00:00 obj=unconfined_u:object_r:user_home_t:s0
node=127.0.0.1 type=CWD msg=audit(1218716494.667:677): cwd="/home/sgrubb"
node=127.0.0.1 type=SYSCALL msg=audit(1218716494.667:677): arch=c000003e
syscall=87 success=yes exit=0 a0=15f06b0 a1=39609389d0 a2=1340ac0
a3=3960b67a70 items=2 ppid=1 pid=3432 auid=4325 uid=4325 gid=4325 euid=4325
suid=4325 fsuid=4325 egid=4325 sgid=4325 fsgid=4325 tty=(none) ses=1
comm="kontact" exe="/usr/bin/kontact"
subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key="delete"
Look at the syscall record. It is always emitted with multi-line records. It
has an items count. Each auxiliary (path in this case) record has an item
number. You can tell when you have everything. Single line entries do not
have an items field. Also note that the records comprising an event come out
of the kernel in backwards order.
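
As a rough illustration of that check, here is a Python sketch that compares the items count on the SYSCALL record with the item numbers seen on the PATH records. The helper names are made up, the records are the ones above with most fields cut off, and it assumes item= appears only on the auxiliary records being counted.

```python
import re


def field(record, name):
    """Return the value of name=value in a record, or None if absent."""
    m = re.search(r"(?:^|\s)" + re.escape(name) + r"=(\S+)", record)
    return m.group(1) if m else None


def event_is_complete(records):
    """True when every item announced by the SYSCALL record has arrived."""
    syscalls = [r for r in records if field(r, "type") == "SYSCALL"]
    if not syscalls:
        return False
    expected = int(field(syscalls[0], "items"))
    seen = {int(field(r, "item")) for r in records if field(r, "type") == "PATH"}
    # Items are numbered from 0 up to items-1.
    return seen == set(range(expected))


# The event quoted above, abbreviated to the fields the check needs.
records = [
    "node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=1 inode=11304982",
    "node=127.0.0.1 type=PATH msg=audit(1218716494.667:677): item=0 inode=12550361",
    'node=127.0.0.1 type=CWD msg=audit(1218716494.667:677): cwd="/home/sgrubb"',
    "node=127.0.0.1 type=SYSCALL msg=audit(1218716494.667:677): arch=c000003e "
    "syscall=87 success=yes exit=0 items=2",
]
print(event_is_complete(records))      # all items present
print(event_is_complete(records[1:]))  # item=1 missing
```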
What we do first, once we have a message, is the following code:
# 1. Some lines are split across multiple lines. The good thing is that
#    these never start with whitespace and so we can make them back into
#    single lines. This makes the next part easier.
lines = []
for line in message.split( "\n" ):
    if line.strip() == "":
        pass
    elif line.startswith( " type=" ):
        lines.append( line )
    else:
        assert line[0] != ' '
        lines[-1] = lines[-1] + ' ' + line
Did you know about the audit parsing library?
This is in hope that indeed continued lines always start with a non-space
and type lines always start with a space. Would you consider this format
worthy and possible to change?
Don't like changing formats as that affects test suites.
I have no idea how much it represents an existing external interface, but
I can imagine you can't change it (easily). Probably the end of type= must
be detected by terminating empty line in case of those that can be
continued. But it would be very ugly to have to know the event types that
have this so early in the decoding process.
We have a parsing library, auparse, that handles the rules of audit parsing.
Look for auparse.h for the API.
> There might be tunables that different distros can use with glibc.
> strace is your friend...and having both 32/64 bit rules if amd64 is the
> target platform.
We did that of course. And what was confusing us was that the audit.log did
actually seem to show the calls. Can that even be?
Yes, as explained above.
> > Does audit not (yet?) use other tracing interfaces like SystemTap, etc.
> > where people try to have 0 cost for inactive traces.
>
> They have a cost. :) Also, systemtap, while good for some things, is not
> good for auditing. For one, systemtap recompiles the kernel to make new
> modules. You may not want that in your environment. It also has not been
> tested for CAPP/LSPP compliance.
>
> > Also on a general basis. Do you recommend using the sub-daemon for the
> > job or should we rather use libaudit for the task instead? Any insight
> > is welcome here.
>
> It really depends on what your environment allows. Do you need an audit
> trail? With search tools? And reporting tools? Do you need the system to
> halt if auditing problems occur? Do you need any certifications?
I see. Luckily we are not into security, but only "safety". I can't find
anything on Wikipedia about it, so I will try to explain it briefly, please
forgive my limited understanding of it. :-)
At one point, I worked on Space Shuttle software. I know a little on how they
think about this.
It certainly will be very helpful to have the audit log and have it
searchable, and I understand we get that automatically by leaving audit
enabled, but
configured correctly. In the past we have disabled it, because it caused a
full disk and boot failure on RHEL 3 after only a month or so. I think it
complained about the UDP echo packets that we use to check our internal LAN
operations, but it could have been SELinux too.
RHEL3's audit system is completely different than RHEL5's.
> > 2. We don't want to poll periodically, but rather only wake up (and
> > then with minimal latency) when something interesting happened. We
> > would want to poll a periodic check that forks are still reported, so
> > we would detect a loss of service from audit.
>
> You might write an audispd plugin for this.
Did you mean for the periodic check,
There is a realtime interface for the audit stream. You can write either a new
event dispatcher or a plugin to the existing one. Seeing as you are more
concerned with assurance, I'd just replace the current dispatcher with your
own. I have a description of this here:
http://people.redhat.com/sgrubb/audit/audit-rt-events.txt
or for the whole job, that means our supervision process?
The supervision process. Then again, maybe you want to replace the audit
daemon and handle events your own way. libaudit has all the primitives for
that. So, I guess that brings up the question of how you are accessing the
audit event stream. Are you reading straight from netlink or the disk?
Regarding performance I would like to say, you are likely right in that
it's a non-issue. It has something of a bike-shed to me though. :-) I think
I still have http://lwn.net/Articles/290428/ on my mind, where I had the
impression that kernel markers would only require a few noop instructions
as placeholders for jumps that would cause audit code to run.
You can go that way if you want. But I don't know of anyone else that has.
I was wondering why audit wouldn't use that. Is that historic (didn't exist,
nobody made a patch for it) or conscious decision (too difficult, not worth
it). Just curious here and of course the comment could be read as a bit
scary, because it actually means we will have to benchmark the impact...
systemtap came after audit. They have 2 different purposes. One is
debugging/profiling, the other is regulatory compliance and security. The
systemtap people have no guarantees about what kinds of data are contained in
the stream or the reliability of delivery. There was some talk about
combining hooks and in the end it was decided that we should leave them
disconnected as they serve entirely different purposes.
-Steve