On Saturday 16 August 2008 07:19:27 Kay Hayen wrote:
[ time descending, sequence number ascending problem ]
> What this indicates is that there was some recursion before the syscall
> triggered an event. The syscall context exists from sycall entry to exit.
> If during the middle a signal is delivered, the syscall is not finished.
> Instead it runs the signal handler associated with the signal. The signal
> handler might make syscalls which are then handled using the existing
> syscall context via linked list. When that occurs, the timestamp is not
> being updated. Not sure that is appropriate or why the original time
> really mattered. But that is what you are observing. My guess is SIGTERM
> is being delivered during another syscall.
That raised the following question to me: We have "entry" rules defined.
When we saw that we get exit codes, the conclusion was that "entry" and
"exit" are not different for every syscall. Can you confirm?
They are different in that some things are not defined at entry while all
things are defined at exit. I believe you can write all audit rules as exit
rules without any noticable differences. if have an entry rule that evaluates
to never, then it does speed things up since it no longer needs to collect
aux records. With respect to the time, its set when audit_log_start is called
which is always on exit for any rules that involve syscalls (that is when the
exit code is valid).
I tried to change our rules to "exit,always" from
"entry,always", but it
didn't make a difference. Can you confirm that only one exit is traced and
do you think audit can be enhanced to trace these extra exits of syscalls
like FORK.
Yes, I think the kernel could be updated to return twice. This would need to
be sent upstream and I think 2.6.28 is the next chance.
> > Seems like a bug? Can you have a look at it?
>
> I'll check on why we don't update the time stamp during syscall
> recursion.
Thanks a lot. I guess, the expectation could be that "exit" traces have the
datation of the "exit" and not "entry".
See above about timestamp generation.
> Then there is a problem of correlation. If I have 1 rule that
expands to
> 2, then how can I do a compare of what's in memory vs what rules are on
> disk? IOW, how do I tell that someone typed:
>
> -a entry,always -F arch=b32 -S clone -S fork -S vfork
> -a entry,always -F arch=b64 -S clone -S fork -S vfork
>
> or just
>
> -a entry,always -S clone -S fork -S vfork
>
> because auditctl would make 2 from 1. This is a really tricky issue and
> if we didn't care about correlation...or about outdated tools we trust
> too much...we could do this.
That's understood. And a typical danger of user friendly abstraction is
that it causes confusion. As I said, -F was bound to "filter" in my mind.
-F means field. In this case, the filter does use the arch field to select
which syscalls become events. But we put it before the syscall so that
auditctl looks it up in the right table. It might possibly be more correct to
introduce a selector for -S, but then people won't like giving it twice.
> ausyscall x86_64 clone
> 56
>
> ausyscall i386 clone
> 120
Very good. We have initially defined a hash in Python manually with what we
encounter, but we can rather use that to create them. We specifically have
the problem of visiting a s390 site, where it will handy to have these
already in place. There is no such function in libaudit, is there?
For what?
> We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading from
the socket itself, does it?
You mean the netlink socket?
It could be AUSOURCE_FILE. And there there is the issue that the logs
on
disk seem to be different from what format we get on the socket.
yes it is.
In an ideal world, we would like to note that the audit socket is
readable,
hand it (or an arbitrarily truncated chunk of data) from it to libaudit,
ask it for events until it says there are not more. That would leave the
truncated line/event issue to libaudit. Is that part of the code?
libaudit should pull complete events from the kernel unless an execve has an
excessive number of arguments or large sized arguments.
> Look at the syscall record. It is always emitted with multi-line
records.
> It has an items count. Each auxiliary (path in this case) record has an
> item number. You can tell when you have everything. Single line entries
> do not have an items field. Also note that the record comprising an event
> comes out of the kernel in a backwards order.
Ah, we simply ignored the type=PATH etc. elements. But what I mean is that
of the syscall itself, the arguments seemed to be on new lines:
This is from Python code:
data = _audit_socket.recv( 32*1024 )
print data
node=Annuitka type=SYSCALL msg=audit(1218880198.814:42205): arch=c000003e
syscall=59 success=yes exit=0 a0=16cc168 a1=1464c08 a2=1588008 a3=0 items=2
ppid=3864 pid=19928 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000
fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295 comm="ls"
exe="/bin/ls" key=(null)
node=Annuitka type=EXECVE msg=audit(1218880198.814:42205): argc=2 a0="ls"
a1="--color=auto"
node=Annuitka type=CWD msg=audit(1218880198.814:42205):
cwd="/data/home/anna/comsoft/v7a1-ps2-acs/src/acs"
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=0
name="/bin/ls" inode=1651626 dev=08:12 mode=0100755 ouid=0 ogid=0
rdev=00:00
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=1 name=(null)
inode=779612 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=EOE msg=audit(1218880198.814:42205):
Note that we get a SYSCALL with 2 items, and then in order the items - from
the socket. But inbetween we get type=EXECVE it doesn't have an item
number,
I suppose that could be fixed.
and worse the new line before 'a1=--color-auto' is real and
so is
the empty line after it. I have another example of a "gnash" call from
Konqueror with no less than 29 arguments.
That is coming from here, and I think a patch was submitted fixing it.
http://lxr.linux.no/linux+v2.6.26.2/kernel/auditsc.c#L1114
That means, in order to parse the socket, we should check argc,
right? I
think we would prefer very long lines like they are in /var/log/audit
instead, making these kinds of steps optional.
Actually I don't understand the differences in format. I assume they serve
the purpose of making things readable?
Yes. From the kernel is a structure with some text. For logging we want it all
text.
> > This is in hope that indeed continued lines always start
with a
> > non-space and type lines always start with a space. Would you consider
> > this format worthy and possible to change?
>
> Don't like changing formats as that affects test suites.
That " type=" start is a self-confusion of ours. Starting with 1.6 the
node= part was added, and some hack was still in place that removes
"node=hostname" and leaves the space there. Sorry about that.
You can remove or add the node field. Its controlled by the name_format config
item.
> > I have no idea how much it represents and existing external
interface,
> > but I can imagine you can't change it (easily). Probably the end of
> > type= must be detected by terminating empty line in case of those that
> > can be continued. But it would be very ugly to have to know the event
> > types that have this so early in the decoding process.
>
> We have a parsing library, auparse, that handles the rules of audit
> parsing. Look for auparse.h for the API.
If you confirm that can handles the parsing from the socket, as suggested
above, we may persue that path and can ignore strangeness of the format
once its handled by the library.
The audit parsing library wants to read text strings as you would find them on
disk. The kernel keeps type separate as an integer so that decisions can be
made about what the record means without having to do a text to int
conversion. So, the audit daemon does the reformatting after it decides that
it a record type that we are interested in.
> > > > 2. We don't want to poll periodically, but
rather only wake up (and
> > > > then with minimal latency) when something interesting happened. We
> > > > would want to poll a periodic check that forks are still reported,
> > > > so we would detect a loss of service from audit.
> > >
> > > You might write a audispd plugin for this.
> >
> > Did you mean for the periodic check,
>
> There is a realtime interface for the audit stream. You can write either
> a new event dispatcher or a plugin to the existing one. Seeing as you are
> more concerned with assurance, I'd just replace the current dispatcher
> with your own. I have a description of this here:
>
>
http://people.redhat.com/sgrubb/audit/audit-rt-events.txt
I saw that too, but I thought it would be better to build upon the existing
effort. I think that's a viable alternative and potentially could be more
robust to us. At least it could be that audisp seems to try and solve
problems we don't have or want.
You have 4 points to get the audit stream, in order of distance from the event
generation: the audit netlink socket, auditd realtime interface, audisp
plugin interface, and the af_unix socket created by the af_unix plugin from
audispd. For higher reliability where you don't want of need any other audit
processing interfering, I would say use either of the first 2.
Looking at the source I saw that node name is something that audisp
indeed
adds the node name and that auditd doesn't log EOEs, explaining some of the
differences. I didn't find why audisp has extra new lines, or if auditd
removed these.
Auditd strips these before writing to disk. The realtime interface sends the
event just as it was received by the kernel.
> > or for the whole job, that means our supervision process?
>
> The supervision process. Then again, maybe you want to replace the audit
> daemon and handle events your own way. libaudit has all the primitives
> for that. So, I guess that brings up the question of how you are
> accessing the audit event stream. Are you reading straight from netlink
> or the disk?
From the files is out of question. We thought of the audit sub deamon as
something that simply allows to access the audit stream live, but that it
is otherwise the same as the file.
The data can be either binary or string. Binary means that its exactly the
same format that comes from thekernel unchanged. String means that its been
formatted just like you would see on disk. But it appears that its not
stripping the 0X0A out of the text. It probably should.
Like auditd would accept things from the kernel, write it to a file
and hand
it to audisp as well which would then provide it to others. Isn't that the
design?
Yes.
-Steve