Hello Steve,
thanks for your reply, very helpful. :-)
> Now one issue I see is that the times that we get from auditd through
> the socket from its child daemon may not match the start_date exactly.
All time hacks in the audit logs come from the kernel at the instant the
record is created. They all start by calling audit_log_start, and right
here is where time is written:
http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1194
The source that is used is current_kernel_time();
[...]
The audit system has millisecond resolution. This was considered adequate
due to system ticks being < 1000 Hz. The current_kernel_time() is a broken
down time struct similar to pselect's. This is how it's used:
audit_log_format(ab, "audit(%lu.%03lu:%u): ",
t.tv_sec, t.tv_nsec/1000000, serial);
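For our own parsing, that stamp is convenient. A minimal sketch of how we pull
seconds, milliseconds and serial out of a record line (the regular expression
is my own guess, derived only from the format string above):

import re

# Matches the "audit(<seconds>.<milliseconds>:<serial>)" stamp written by
# audit_log_format() per the format string quoted above.
_STAMP = re.compile(r"audit\((\d+)\.(\d+):(\d+)\)")

def parse_stamp(record_line):
    # Return (seconds, milliseconds, serial) as integers, or None if the
    # line carries no stamp.
    match = _STAMP.search(record_line)
    if match is None:
        return None
    seconds, millis, serial = [int(part) for part in match.groups()]
    return seconds, millis, serial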
From what I checked, it seems that current_kernel_time() is indeed fed
exactly by the jiffy/system ticks since boot (at least I found comments that
suggest so). I still have to verify how it is translated. There still is the
issue of translating ticks into seconds since 1970. So far I have only
managed to get hold of the system boot time at a granularity of one second.
I have no clue if that is the same time used in the kernel to offset the ticks
value. I will make a delta test once I can. But my suspicion is that the
real-time clock has better precision than one second, which is all that I get
from the btime: field of /proc/stat.
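For reference, this is roughly how we read the boot time at the moment, whole
seconds only, which is exactly the limitation I describe above:

def read_boot_time():
    # Return the boot time from /proc/stat as integer seconds since 1970.
    # The btime field is only given in whole seconds.
    for line in open("/proc/stat"):
        if line.startswith("btime "):
            return int(line.split()[1])
    raise RuntimeError("no btime field found in /proc/stat")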
More importantly, and somewhat blocking my tests: with the improved rules I
quite reproducibly get this when compiling:
type=SYSCALL msg=audit(1218773075.500:118620): arch=c000003e syscall=59
success=yes exit=0 a0=7fff6f78cf90 a1=7fff6f78cf40 a2=7fff6f78f068 a3=0
items=2 ppid=11412 pid=11421 auid=4294967295 uid=1000 gid=1000 euid=1000
suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3
ses=4294967295 comm="gcc-4.3" exe="/usr/bin/gcc-4.3" key=(null)
[...]
type=SYSCALL msg=audit(1218773075.496:118624): arch=c000003e syscall=56
success=yes exit=11421 a0=1200011 a1=0 a2=0 a3=7fc067776770 items=0
ppid=11407 pid=11412 auid=4294967295 uid=1000 gid=1000 euid=1000
suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3
ses=4294967295 comm="gnatchop" exe="/usr/bin/gnatchop" key=(null)
Please note the _ascending_ sequence number but _descending_ time. This is
pasted from my audit.log and quite surprising. It triggers an assertion in
our code, because we also seem to receive things in the wrong order from the
socket. There was no fork before the EXECVE that refers to the pid. We
tolerate that (obviously, when we start there will be processes whose fork we
never saw; we don't do an initial scan yet), but we don't tolerate that fork
returns an existing pid.
Seems like a bug? Can you have a look at it? These are the rules I was using:
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
-a entry,always -S execve
-a entry,always -S exit_group -S exit
I didn't apply the arch filter to exit, etc. yet, but there wasn't a pid
rollover yet, so I don't think that missing an exit is really the issue here.
Plus I still didn't fully grasp why that arch filter was necessary in the first
place. I mean, after all, I was simply expecting that by default, no filter
should give all arches. Is that filter actually a selector? Does it have to
do with the fact that syscall numbers are arch dependent?
> The other thing is sequence numbers. We see in the output sequence
> numbers for each audit event. Very nice. But can you confirm where these
> sequence numbers are created? Are they done in the kernel, in auditd or
> in its child daemon?
They are done in the kernel and are incremented for each audit_log_start so
that no 2 audit events within the same millisecond have the same serial
number. Their source is here:
http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1085
Thanks for the pointer. That indeed looks like nothing can go wrong on the
reliability side of time and serial numbers, which makes my report from above
all the more astounding.
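To make concrete what trips our assertion, a check along these lines
(simplified from what we actually run, building on the stamp parsing sketch
above) flags exactly the situation in the log excerpt, a growing serial with
the timestamp going backwards:

def find_ordering_violations(stamps):
    # stamps is an iterable of (seconds, milliseconds, serial) tuples,
    # in the order the records were read from the socket.
    previous = None
    for current in stamps:
        if previous is not None:
            if current[2] > previous[2] and current[:2] < previous[:2]:
                # Serial went up but the timestamp went down.
                yield previous, current
        previous = current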
> The underlying question is, how safe can we be that we didn't miss
> anything when sequence numbers don't suggest so. We would like to use the
> lossless mode of auditd. Does that simply mean that auditd may get behind
> in worst case?
Yes. You would want to do a couple things. Increase the kernel backlog,
increase auditd priority, & increase audispd's internal queue.
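Just so I am sure which knobs you mean, this is what I find in the man pages
of the version we compiled; the file names should be right, but the values
are only placeholders I picked, not recommendations:

# /etc/audit/audit.rules -- raise the kernel backlog limit
-b 8192
# /etc/audit/auditd.conf -- scheduling boost for auditd
priority_boost = 4
# /etc/audisp/audispd.conf -- deepen audispd's internal queue
q_depth = 512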
That's fine with us. We just wouldn't want to be in an inconsistent state after
a load peak. I think that with a dedicated core for our supervision processes,
nothing short of benchmarks of clone performance (aka fork bombs) will be able
to trigger any actual issue.
> Can you confirm that a type=EOE delimits every event (is that even
> the correct term to use, audit trace, how is it called).
It delimits every multipart event. You can use something like this to
determine if you have an event:
if ( r->type == AUDIT_EOE || r->type < AUDIT_FIRST_EVENT ||
r->type >= AUDIT_FIRST_ANOM_MSG) {
have full event...
}
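For our Python side I read that as roughly the following; the numeric values
are what I believe libaudit.h defines, so please correct me if they are off:

# Record type constants as I read them from libaudit.h -- to be verified.
AUDIT_FIRST_EVENT = 1300      # start of the kernel event range
AUDIT_EOE = 1320              # end-of-event marker for multipart events
AUDIT_FIRST_ANOM_MSG = 2100   # start of the anomaly message range

def have_full_event(record_type):
    # Mirrors the C check quoted above.
    return (record_type == AUDIT_EOE
            or record_type < AUDIT_FIRST_EVENT
            or record_type >= AUDIT_FIRST_ANOM_MSG)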
I will have to check if this affects our intended process tracing. The parsing
is certainly not simplified by it, for a possibly unrelated reason.
We read from a socket, in data chunks, which are then processed. For that,
we want to identify the end of a message before processing it. Right now, we
have switched to waiting for type=EOE and its newline. Without that, the fact
that a non-EOE record may span multiple lines makes it hard to detect the end
of a complete type= trace; I think the arguments are put on separate lines.
I don't see that in the audit.log, so probably it's a behaviour of the
sub-daemon:
type=EXECVE msg=audit(1218774568.173:129440): argc=4 a0="/bin/sh"
a1="..."
a2="..."
a3="..."
Without a very stateful message parser, one that e.g. knows how many lines are
to follow an EXECVE, we don't know when to forward it to the part that should
process it.
What we do first, once we have received a message, is the following code:
# 1. Some lines are split across multiple lines. The good thing is that
# these never start with whitespace and so we can make them back into
# single lines. This makes the next part easier.
lines = []
for line in message.split( "\n" ):
    if line.strip() == "":
        pass
    elif line.startswith( " type=" ):
        lines.append( line )
    else:
        assert line[0] != ' '
        lines[-1] = lines[-1] + ' ' + line
This is in the hope that continued lines indeed always start with a non-space
and type lines always start with a space. Would you consider this format
worth changing, and possible to change?
I have no idea how much it represents an existing external interface, but I
can imagine you can't change it (easily). Probably the end of a type= record
must be detected by a terminating empty line for those that can be continued.
But it would be very ugly to have to know the event types that behave this way
so early in the decoding process.
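If the format stays as it is, we will probably end up with a small stateful
accumulator that only releases an event once its EOE record has arrived. A
sketch of the idea only, not our production code, and it would still need the
type-based completeness check from above for single-record events:

class EventAssembler:
    # Collects reassembled record lines per event serial and releases the
    # whole event once its EOE record arrives.
    def __init__(self):
        self.pending = {}

    def feed(self, line):
        # 'line' is one reassembled record line; returns the finished event
        # as a list of lines, or None while the event is still incomplete.
        stamp = parse_stamp(line)   # from the sketch further above
        if stamp is None:
            return None
        serial = stamp[2]
        self.pending.setdefault(serial, []).append(line)
        if "type=EOE" in line:
            return self.pending.pop(serial)
        return None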
> BTW: Release-wise, will RHEL 5.3 include the latest auditd?
That is the plan. But there will be a point where audit development
continues and bugfixes are backported rather than a new version pulled in.
At a minimum, audit-1.7.5 will be in RHEL5.3. Maybe 1.7.6 if we have another
quick release.
That's OK for us. And until then self-compiled will do.
If you are on amd64, I would suggest:
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
Actually that solved the trouble of not seeing anything on "Debian". The fact
is that we are using RHEL x86 for production and Debian (or Ubuntu) amd64 on
our development machines. So we never checked RHEL amd64, which likely would
show the same thing.
But see above, is this possibly a bug?
There might be tunables that different distros can use with glibc. strace
is your friend...and having both 32/64 bit rules if amd64 is the target
platform.
We did that of course. And what was confusing us was that the audit.log did
actually seem to show the calls. Can that even be?
> Does audit not (yet?) use other tracing interfaces like SystemTap, etc.
> where people try to have 0 cost for inactive traces.
They have a cost. :) Also, systemtap, while good for some things, is not good
for auditing. For one, systemtap recompiles the kernel to make new modules.
You may not want that in your environment. It also has not been tested for
CAPP/LSPP compliance.
> Also on a general basis. Do you recommend using the sub-daemon for the
> job or should we rather use libaudit for the task instead? Any insight is
> welcome here.
It really depends on what your environment allows. Do you need an audit
trail? With search tools? And reporting tools? Do you need the system to
halt if auditing problems occur? Do you need any certifications?
I see. Luckily we are not into security, but only "safety". I can't find
anything on Wikipedia about it, so I will try to explain it briefly; please
forgive my limited understanding of it. :-)
Increasingly, we in ATC (air traffic control) need to provide a "Software
Assurance Level" (SWAL) to our customers and they to their regulators. That's
one of many standards to define "safety". Depending on the level, you may not
only need to document the requirements of your software, but also relate each
line of code to them and ultimately prove the correctness of the compiler
output. I believe they do this for fighter planes.
We currently target level 3, which means that we will have to prove that we
have concerned ourselves with all possible hazards and their combinations and
whether they would lead to "safety"-relevant consequences. For the running
system, we don't have to prove anything. There are "legal recordings" of the
input and output data, but no complete logs of the operation are required per
se, although of course, as we support the analysis of trouble reports, that is
very welcome. That level is acceptable for non-airborne systems.
So if we are running and monitoring a system, and if e.g. a certain process
must run to prevent planes from colliding, we must provide explanations (and
tests) of why we are at any time sure _if_ (not "that") the process is
running, or how the problem will be noticed by the software and ultimately by
the operators through other means if that fails.
On a general level, redundancy is normally the solution to hazards. Everything
is doubled, and you normally have 2 trackers, etc., so you don't rely on only
one piece of software, but on multiple implementations.
When something fails, we generally stop the software unless we are sure it's
an isolated event. That is because data corruption is worse than no data at
all. An unseen flight or a falsely reported flight is a lot worse than a
crashed system that only needs a restart and will be back in operation within
time limits.
Our concern is not at all to prevent or detect malicious use of the system. If
somebody wants to run a binary in a hidden way and takes effort to achieve
that, other processes have failed already. Our systems always run in an
environment where security is already solved through other means. The systems
e.g. do allow rsh login, without a password, as root. I hope you can continue
breathing now. :-)
It certainly will be very helpful to have the audit log, and to have it
searchable, and I understand we get that automatically by leaving audit
enabled but configured correctly. In the past we have disabled it, because it
caused a full disk and a boot failure on RHEL 3 after only a month or so. I
think it complained about the UDP echo packets that we use to check our
internal LAN operations, but it could have been SELinux too.
So, what we will do with audit is to look at the source code, identify design
principles that make errors impossible, write that down, define our
requirements of it, and test these. And ultimately we may have to provide an
alternative implementation without audit at all, with probably worse latency,
and the ability to detect and compensate for (simulated) audit bugs. And test
that as well.
And in some figure, we will say: if the tracker process exits, we will detect
that on average after 1 ms, and at the latest after 50 ms. And that will play a
role in the time it takes to switch to the standby tracker process on other
hardware. And that in turn will e.g. have the requirement of being able to
take over within 1 or 3 seconds, which will be possible because the process
that hands over was started early enough and itself takes a limited time.
The ultimate benefits of auditd to us would be:
a) Achieving very low latency, being able to observe an "exit" as it
happens, not just when a periodic timer makes a test. We can only achieve
that kind of performance through child-process SIGCHLD so far, and signals
are not an ideal interface. Being able to monitor non-children is very good
for us; it is actually what made us interested in audit in the first place.
b) Robust operation. In theory the audit approach should be able to receive
assurances that certain hazards just cannot happen. That's very nice and
certainly increases the assurances we can give for our process supervision, a
key building block of every system.
c) Our middleware may suddenly offer to monitor processes like ntpd, cron
jobs, etc. very easily and without system changes. There are systems that
work with "ps", "grep" and "kill" to achieve this, but as
you can imagine,
that only goes so far.
Security, in the sense of intrusion detection, authorization, etc. doesn't
play a role to us. So we don't need audit trails for our live supervision,
we "only want to know" what was running, what is running.
> 2. We don't want to poll periodically, but rather only wake up (and then
> with minimal latency) when something interesting happened. We would want
> a periodic check that forks are still reported, so we would detect a loss
> of service from audit.
You might write an audispd plugin for this.
Did you mean for the periodic check, or for the whole job, that is, our
supervision process? We certainly would prefer the plugin approach for such a
test, especially if there is hope that you would accept it into your code.
The closing of the socket in case of a loss of service would be a sufficient
signal to us.
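For the periodic check itself, what we have in mind is roughly: fork a
short-lived marker child at a fixed interval and verify that a matching
fork/clone record shows up on the socket within a deadline. This is a sketch
only; record_seen_for_pid stands in for whatever lookup our parser ends up
providing:

import os
import time

CHECK_DEADLINE = 0.5   # seconds we allow for the audit record to arrive

def probe_audit_once(record_seen_for_pid):
    # Fork a trivial child and confirm that audit reported its creation.
    # record_seen_for_pid is assumed to be a callable pid -> bool that
    # checks the records our parser has collected so far.
    pid = os.fork()
    if pid == 0:
        os._exit(0)                 # the child does nothing but exit
    os.waitpid(pid, 0)
    deadline = time.time() + CHECK_DEADLINE
    while time.time() < deadline:
        if record_seen_for_pid(pid):
            return True
        time.sleep(0.01)
    return False                    # audit seems to have lost the fork event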
Regarding performance I would like to say that you are likely right in that
it's a non-issue. It has something of a bike-shed feel to me though. :-) I
think I still have http://lwn.net/Articles/290428/ on my mind, where I had the
impression that kernel markers would only require a few noop instructions as
placeholders for jumps that would cause audit code to run. I was wondering why
audit wouldn't use that. Is that historic (didn't exist, nobody made a patch
for it) or a conscious decision (too difficult, not worth it)? Just curious
here, and of course the comment could be read as a bit scary, because it
actually means we will have to benchmark the impact...
Best regards,
Kay Hayen