Hello Steve,
[ time descending, sequence number ascending problem ]
What this indicates is that there was some recursion before the
syscall
triggered an event. The syscall context exists from sycall entry to exit.
If during the middle a signal is delivered, the syscall is not finished.
Instead it runs the signal handler associated with the signal. The signal
handler might make syscalls which are then handled using the existing
syscall context via linked list. When that occurs, the timestamp is not
being updated. Not sure that is appropriate or why the original time really
mattered. But that is what you are observing. My guess is SIGTERM is being
delivered during another syscall.
That raised the following question to me: We have "entry" rules defined. When
we saw that we get exit codes, the conclusion was that "entry" and
"exit" are
not different for every syscall. Can you confirm?
And assuming that, indeed the may fork complete only after its has completed
its signal handlers. The expectation would be that this is not an issue
though, because the new process is inactive. Wrong assumption. The new
process seemingly can already EXECVE with its fresh life, if the call of FORK
is interrupted after the process exists.
Now that poses an interesting problem. I guess, what we are missing is that
FORK should actually enter once and return up to _two_ times, and we need to
handle and see traces of these. That way we would simply see FORK return with
0 for the child before it does EXECVE, and so we would know who created it
(ppid) and know that it exists.
I tried to change our rules to "exit,always" from "entry,always", but
it
didn't make a difference. Can you confirm that only one exit is traced and do
you think audit can be enhanced to trace these extra exits of syscalls like
FORK.
For every workaround, a SIGKILL signal towards the FORK making process would
probably leave us even without a trace, forever possibly.
As for the signal at hand here, I think we have ant (gcj based Java) doing
multi-threading (parallel building). That would be FORKs series with probably
a SIGALARM or whatever they use to switch target execution without resorting
to threading.
> Seems like a bug? Can you have a look at it?
I'll check on why we don't update the time stamp during syscall recursion.
Thanks a lot. I guess, the expectation could be that "exit" traces have the
datation of the "exit" and not "entry".
Then there is a problem of correlation. If I have 1 rule that expands
to 2,
then how can I do a compare of what's in memory vs what rules are on disk?
IOW, how do I tell that someone typed:
-a entry,always -F arch=b32 -S clone -S fork -S vfork
-a entry,always -F arch=b64 -S clone -S fork -S vfork
or just
-a entry,always -S clone -S fork -S vfork
because auditctl would make 2 from 1. This is a really tricky issue and if
we didn't care about correlation...or about outdated tools we trust too
much...we could do this.
That's understood. And a typical danger of user friendly abstraction is that
it causes confusion. As I said, -F was bound to "filter" in my mind.
For "arch" it's suddenly a selector. I would find something like this more
logical:
-a entry,always,any -S clone -S fork -S vfork
and if I really only wanted a certain arch, make me say so:
-a entry,always,b64 -S clone -S fork -S vfork
ausyscall x86_64 clone
56
ausyscall i386 clone
120
Very good. We have initially defined a hash in Python manually with what we
encounter, but we can rather use that to create them. We specifically have
the problem of visiting a s390 site, where it will handy to have these
already in place. There is no such function in libaudit, is there?
We have an audit parsing library. It takes this into account.
I have looked at it, and auparse_init doesn't seem to support reading from the
socket itself, does it? It could be AUSOURCE_FILE. And there there is the
issue that the logs on disk seem to be different from what format we get on
the socket. The node= is not on disk, newlines, empty lines, etc. see below
about that.
In an ideal world, we would like to note that the audit socket is readable,
hand it (or an arbitrarily truncated chunk of data) from it to libaudit, ask
it for events until it says there are not more. That would leave the
truncated line/event issue to libaudit. Is that part of the code?
> Without a very stateful message parser, one that e.g. knows how
many
> lines are to follow an EXECVE, we don't know when to forward it the part
> that should process it.
[Example deleted]
Look at the syscall record. It is always emitted with multi-line
records.
It has an items count. Each auxiliary (path in this case) record has an
item number. You can tell when you have everything. Single line entries do
not have an items field. Also note that the record comprising an event
comes out of the kernel in a backwards order.
Ah, we simply ignored the type=PATH etc. elements. But what I mean is that of
the syscall itself, the arguments seemed to be on new lines:
This is from Python code:
data = _audit_socket.recv( 32*1024 )
print data
node=Annuitka type=SYSCALL msg=audit(1218880198.814:42205): arch=c000003e
syscall=59 success=yes exit=0 a0=16cc168 a1=1464c08 a2=1588008 a3=0 items=2
ppid=3864 pid=19928 auid=4294967295 uid=1000 gid=1000 euid=1000 suid=1000
fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts3 ses=4294967295 comm="ls"
exe="/bin/ls" key=(null)
node=Annuitka type=EXECVE msg=audit(1218880198.814:42205): argc=2 a0="ls"
a1="--color=auto"
node=Annuitka type=CWD msg=audit(1218880198.814:42205):
cwd="/data/home/anna/comsoft/v7a1-ps2-acs/src/acs"
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=0 name="/bin/ls"
inode=1651626 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=PATH msg=audit(1218880198.814:42205): item=1 name=(null)
inode=779612 dev=08:12 mode=0100755 ouid=0 ogid=0 rdev=00:00
node=Annuitka type=EOE msg=audit(1218880198.814:42205):
Note that we get a SYSCALL with 2 items, and then in order the items - from
the socket. But inbetween we get type=EXECVE it doesn't have an item number,
and worse the new line before 'a1=--color-auto' is real and so is the empty
line after it. I have another example of a "gnash" call from Konqueror with
no less than 29 arguments.
That means, in order to parse the socket, we should check argc, right? I think
we would prefer very long lines like they are in /var/log/audit instead,
making these kinds of steps optional.
Actually I don't understand the differences in format. I assume they serve the
purpose of making things readable?
Did you know about the audit parsing library?
Our assumption was also that it should be easy enough to parse the text. Well
you know assumptions. Rarely ever true. :-)
> This is in hope that indeed continued lines always start with a
non-space
> and type lines always start with a space. Would you consider this format
> worthy and possible to change?
Don't like changing formats as that affects test suites.
That " type=" start is a self-confusion of ours. Starting with 1.6 the node=
part was added, and some hack was still in place that removes "node=hostname"
and leaves the space there. Sorry about that.
> I have no idea how much it represents and existing external
interface,
> but I can imagine you can't change it (easily). Probably the end of type=
> must be detected by terminating empty line in case of those that can be
> continued. But it would be very ugly to have to know the event types that
> have this so early in the decoding process.
We have a parsing library, auparse, that handles the rules of audit
parsing. Look for auparse.h for the API.
If you confirm that can handles the parsing from the socket, as suggested
above, we may persue that path and can ignore strangeness of the format once
its handled by the library.
> > There might be tunables that different distros can used
with glibc.
> > strace is your friend...and having both 32/64 bit rules if amd64 is the
> > target platform.
>
> We did that of course. And what was confusing us was that the audit.log
> did actually seem to show the calls. Can that even be?
Yes, as explained above.
Sorry, I am still confused. Can you explain why the socket and the audit.log
can have different contents. I was blaming my (usually bad) memory.
> I see. Luckily we are not into security, but only
"safety". I can't find
> anything on Wikipedia about it, so I will try to explain it briefly,
> please forgive my limited understanding of it. :-)
At one point, I worked on Space Shuttle software. I know a little on how
they think about this.
Well, that's perfect. :-)
> > > 2. We don't want to poll periodically, but rather
only wake up (and
> > > then with minimal latency) when something interesting happened. We
> > > would want to poll a periodic check that forks are still reported, so
> > > we would detect a loss of service from audit.
> >
> > You might write a audispd plugin for this.
>
> Did you mean for the periodic check,
There is a realtime interface for the audit stream. You can write either a
new event dispatcher or a plugin to the existing one. Seeing as you are
more concerned with assurance, I'd just replace the current dispatcher with
your own. I have a description of this here:
http://people.redhat.com/sgrubb/audit/audit-rt-events.txt
I saw that too, but I thought it would be better to build upon the existing
effort. I think that's a viable alternative and potentially could be more
robust to us. At least it could be that audisp seems to try and solve
problems we don't have or want.
Looking at the source I saw that node name is something that audisp indeed
adds the node name and that auditd doesn't log EOEs, explaining some of the
differences. I didn't find why audisp has extra new lines, or if auditd
removed these.
I think we will make a prototype for the RT interface and see what it gives
us.
> or for the whole job, that means our supervision process?
The supervision process. Then again, maybe you want to replace the audit
daemon and handle events your own way. libaudit has all the primitives for
that. So, I guess that brings up the question of how you are accessing the
audit event stream. Are you reading straight from netlink or the disk?
From the files is out of question. We thought of the audit sub deamon
as
something that simply allows to access the audit stream live, but that it is
otherwise the same as the file. Like auditd would accept things from the
kernel, write it to a file and hand it to audisp as well which would then
provide it to others. Isn't that the design?
> I was wondering why audit wouldn't use that. Is that
historic (didn't
> exist, nobody made a patch for it) or conscious decision (too difficult,
> not worth it). Just curious here and of course the comment could be read
> as a bit scary, because it actually means we will have to benchmark the
> impact...
systemtap came after audit. They have 2 different purposes. One is
debugging/profiling, the other is regulatory compliance and security. The
system tap people have no gurantees about what kinds of data is contained
in the stream or the reliability of delivery. There was some talk about
combining hooks and in the end it was decided that we should leave them
disconnected as they serve entirely different purposes.
Ah I see. Thanks for the explanation. :-)
Best regards,
Kay Hayen