Hi guys,
I'm facing a situation where -ENOBUFS is returned from both
audit_send() and audit_get_reply(). The system is under high stress,
with 250k files being created and the creat() and chmod() syscalls
audited.
Looking at the code in lib/netlink.c, I saw that audit_send() doesn't
handle -ENOBUFS. Would it be possible to change the loop condition from
"while (retval < 0 && errno == EINTR)" to
"while (retval < 0 && (errno == EINTR || errno == ENOBUFS))"
to fix the problem when sending packets from userspace to the kernel?
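To make the idea concrete, here is a rough, self-contained sketch of the retry loop I have in mind. It is not the actual audit_send() code: send_fn, send_with_retry() and fake_send() are hypothetical names I made up for illustration, and the cap on ENOBUFS retries is my own assumption, added so that a persistently full buffer cannot make the loop spin forever.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical callback standing in for the sendto() call in audit_send(). */
typedef ssize_t (*send_fn)(void *ctx);

/*
 * Retry while the call is interrupted (EINTR) and, at most max_enobufs
 * times, while it fails with ENOBUFS. The cap is an assumption, not
 * something the current library code has.
 */
static ssize_t send_with_retry(send_fn fn, void *ctx, int max_enobufs)
{
    ssize_t retval;
    int enobufs_seen = 0;

    do {
        retval = fn(ctx);
    } while (retval < 0 &&
             (errno == EINTR ||
              (errno == ENOBUFS && enobufs_seen++ < max_enobufs)));

    return retval;
}

/*
 * Stand-in sender for trying the loop out: fails with ENOBUFS as many
 * times as the counter behind ctx says, then pretends 16 bytes were sent.
 */
static ssize_t fake_send(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        errno = ENOBUFS;
        return -1;
    }
    return 16;
}
```

With a counter of 3 and a cap of 5, send_with_retry(fake_send, &counter, 5) keeps retrying until the fake sender succeeds; with a cap lower than the number of failures it gives up and returns -1 with errno still set to ENOBUFS. A real version would probably also want a short pause between ENOBUFS retries instead of retrying immediately.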
My understanding of the problem in audit_get_reply() is that the I/O
buffers were all full and auditd was just not scheduled at the expected
rate, causing these buffers to overflow. Does that make sense? If it
does, do you have a suggestion about the best way to approach this
problem, besides changing auditd's priority? I thought of a dirty
trick such as forcing auditd to be rescheduled, but that would be way
too intrusive.
One interesting thing which I noticed is that 'auditctl -s' doesn't
report that messages were lost, although a few events did not appear
in the logs. I'm still not sure if they didn't appear because of this
specific problem, but given that ENOBUFS was returned I would expect
to see a positive counter in "lost" below:
AUDIT_STATUS: enabled=1 flag=1 pid=3821 rate_limit=0 backlog_limit=8192 lost=0 backlog=0
This is happening with an old kernel, 2.6.16.46 plus a bunch of patches,
and audit 1.7.4. I cannot completely upgrade to a newer release, but
I can certainly backport audit-specific bits if you remember having
fixed something similar since then.
Thanks,
Lucas