On 2021-01-19 14:11, Paul Moore wrote:
On Tue, Jan 19, 2021 at 3:18 AM Burn Alting
<burn.alting(a)iinet.net.au> wrote:
> I tend to have a rigorous auditing posture (see the rules loaded in
https://github.com/linux-audit/audit-userspace/issues/148) which is not normal for most.
Perhaps, Paul, you have hit the nail on the head by stating that this 'severe
delay' is not that unreasonable given my rules posture and we just need to 'deal
with it' in user space.
> We still get the event data, I just need to adjust the user space tools to deal with
this occurrence.
> As for what the system is doing, in my home case it's a Centos 7 VM running a
tomcat service which only gets busy every 20 minutes and the other is a HPE Z800 running
Centos 8 with 4-5 VM's mostly dormant. I can put any code in these hosts to assist in
'validating'/testing the delay. Advise and I will run.
I took a (slightly) closer look at the queuing code just now and the
corner case I suspected doesn't look very promising here. On failure
to send a record to userspace, the record is put back at the front of
the queue to preserve ordering so the kernel *should* still emit
records in order even when auditd is under pressure, dead, or in the
process of a restart. Also, in this case the kernel kicks the auditd
connection, disconnecting auditd's netlink socket; I imagine that
would be a notable event on your systems.
However, looking at the timestamps in the audit events you posted, I
noticed something a little odd. If I align the timestamps a bit
better let's see if it jumps out at you guys ...
type=XXX msg=audit(1609519900.159:44606): ...
type=XXX msg=audit(1609519900.161:44607): ...
type=XXX msg=audit(1609519900.163:44608): ...
type=XXX msg=audit(1609519896.829:44609): ...
type=XXX msg=audit(1609519900.170:44610): ...
Let me guess Burn, you're running NTP/PTP :) I'm not sure how auditd
handles things like this, but it looks like there was a small negative
time correction between events which caused the odd scenario where
event N+1 actually occurred before event N according to the wall
clock. In other words, your system is time traveling ;)
Interesting...
The timestamp is assigned on syscall entry.
The serial number is assigned on the creation of the first audit record of an event.
From these timings above, NTP/PTP could explain this, but the third
and
fifth are too close together to make that likely, I think.
I'd blame time namespaces, but the host system is all in the same time
namespace.
I think I'd need to see a larger sequence to be convinced of this...
paul moore
- RGB
--
Richard Guy Briggs <rgb(a)redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635