On Mon, 2021-01-18 at 09:31 -0500, Steve Grubb wrote:
On Monday, January 18, 2021 8:54:30 AM EST Paul Moore wrote:
I like the N of M concept but there would be a LOT of change -
especially for all the non-kernel event sources. The EOE would be the most
seamless, but at a cost. My preference is to allow the 2 second 'timer'
to be configurable.

Agree with Burn, numbering the records coming up from the kernel is
going to be a real nightmare, and not something to consider lightly.
Especially when it sounds like we don't yet have a root cause for the
issue.

A very long time ago, we had numbered records. But it was decided that
there's no real point in it and we'd rather just save disk space.

With the current kernel code, adding numbered records is not something to
take lightly.

That's why I'm saying we had it and it was removed. I could imagine that if 
you had auditing of the kill syscall enabled and a whole process group was 
being killed, you could have hundreds of records that need numbering. No good 
way to know in advance how many records make up the event.

I know that the kernel does not serialize the events headed for user
space. But I'm curious how an event gets stuck and others can jump ahead
while one that's already in flight can get hung for 4 seconds before its
next record goes out?

Have you determined that the problem is the kernel? 

I assume so because the kernel adds the timestamp and chooses what hits the 
socket next. Auditd does no ordering of events. It just looks up the text 
event ID, some minor translation if the enriched format is being used, and 
writes it to disk. It can handle well over 100k records per second.

Initially it was looking like it was a userspace issue, is that no longer
the general thought?

I don't see how user space could cause this. Even if auditd was slow, it 
shouldn't take 4 seconds to write to disk and then come back to read another 
record. And even if it did, why would the newest record go out before completing 
one that's in progress? Something in the kernel chooses what's next. I 
suspect that might need looking at.

Also, is there a reliable reproducer yet?

I don't know of one. But, I suppose we could modify ausearch to look for 
examples of this.
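
As a rough illustration of what such a check could look for, here is a minimal Python sketch. It is not ausearch code and not a proposed patch. It assumes raw (non-interpreted) log lines carrying the usual msg=audit(SECONDS.MILLIS:SERIAL) stamp, and the function name find_late_records and its threshold parameter are made up for this example. It reports any record whose event has already started, but which arrives after the stream has carried a record stamped more than threshold seconds later than that event. It ignores the node= field, so logs aggregated from several hosts would need the node added to the event key.

```python
import re

# Stamp carried by every raw audit record: msg=audit(SECONDS.MILLIS:SERIAL)
STAMP = re.compile(r"msg=audit\((\d+)\.(\d{3}):(\d+)\)")


def find_late_records(lines, threshold=2.0):
    """Return (event_id, line_number, lag_seconds) for late continuation records.

    A record counts as late when its event was already seen earlier in the
    stream and the newest timestamp seen so far is more than `threshold`
    seconds ahead of the event's own timestamp.
    """
    threshold_ms = int(threshold * 1000)
    seen = set()        # event IDs that already have at least one record
    newest_ms = None    # largest event timestamp seen so far, in milliseconds
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = STAMP.search(line)
        if match is None:
            continue    # not an audit record, skip it
        sec, ms, serial = match.groups()
        event_id = f"{sec}.{ms}:{serial}"
        ts_ms = int(sec) * 1000 + int(ms)
        if event_id in seen and newest_ms - ts_ms > threshold_ms:
            hits.append((event_id, lineno, (newest_ms - ts_ms) / 1000))
        seen.add(event_id)
        if newest_ms is None or ts_ms > newest_ms:
            newest_ms = ts_ms
    return hits
```

Feeding it an open audit.log file object (or any iterable of lines) yields one tuple per late record. The 2.0 default only mirrors the 2 second timer discussed above; passing a larger threshold shows which events would still be split if that timer were made configurable.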

Happy to run this where I can. I have also added to my github issue (https://github.com/linux-audit/audit-userspace/issues/148) the auditd.conf and audit.rules files that make this activity more likely to occur, if that helps.

Also, to address the issue of the existing ausearch and the auparse library failing to process audit.log files with such problems, are we happy to add a configuration item to auditd.conf?


-Steve