Hello All,
I looked at the source code for auditd and came up with the following
fix for the behavior I described.
You may have a better fix for the issue.
When the output log format is set to "NOLOG" (log_format = NOLOG in
auditd.conf), audit events appear to pile up in the internal message
queue (audit_reply_list) and are never removed after being written to
the audit dispatcher daemon. The result is that the queue grows without
bound.
I have the following potential fix for audit version 1.7.11:
In "auditd.c"
171c171
<             if (rep->reply.type != AUDIT_EOE) {
---
>             if (rep->reply.type != AUDIT_EOE &&
>                     config.log_format != LF_NOLOG) {
I've rebuilt the RPM and tried it on a RHEL 5.3 i386 system with the
2.6.18-128.el5 kernel, and all is well with auditd.
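For reference, here is a tiny self-contained model of the guarded
enqueue. The names mirror the auditd ones (distribute_event,
audit_reply_list, LF_NOLOG), but this is an illustration of the idea,
not the daemon's actual code:

#include <stdio.h>

enum log_format { LF_RAW, LF_NOLOG };
enum { AUDIT_SYSCALL = 1300, AUDIT_EOE = 1320 };

struct audit_reply { int type; };
struct auditd_reply_list { struct audit_reply reply; };
struct daemon_conf { enum log_format log_format; };

static struct daemon_conf config = { LF_NOLOG };
static unsigned long queued;  /* stand-in for audit_reply_list's length */

static void distribute_event(struct auditd_reply_list *rep)
{
    /* The fix: skip the enqueue entirely when NOLOG means nothing
     * will ever drain the list. */
    if (rep->reply.type != AUDIT_EOE && config.log_format != LF_NOLOG)
        queued++;  /* the real code appends rep to audit_reply_list */
}

int main(void)
{
    struct auditd_reply_list ev = { { AUDIT_SYSCALL } };
    for (int i = 0; i < 1000000; i++)
        distribute_event(&ev);
    printf("queued: %lu\n", queued);  /* 0 with the guard, 1000000 without */
    return 0;
}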
This may not be the best fix for the excessive memory glomming I've
seen.
Best regards,
Gary Smith
-----Original Message-----
From: Steve Grubb [mailto:sgrubb@redhat.com]
Sent: Thursday, February 05, 2009 8:33 AM
To: linux-audit@redhat.com
Cc: Lucas C. Villa Real; Smith, Gary R
Subject: Re: Problem with auditd/SnareLinux on RHEL 5.3 - auditd glomming memory
On Wednesday 04 February 2009 09:14:03 pm Lucas C. Villa Real wrote:
> 2009/2/4 Smith, Gary R <gary.smith@pnl.gov>:
> I noticed a very similar behavior when the system was under high
> stress (i.e., many rules and many remote clients generating audit
> events). After much debugging, it was found that the asynchronous
> nature of netlink made it possible for auditd's queue to grow wildly,
> until the kernel started to kill other processes due to OOM (auditd
> asks the kernel not to be killed under OOM conditions, so every
> process but auditd is shot).
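The OOM exemption mentioned there boils down to a single proc write on
kernels of that era; a minimal sketch, not auditd's exact code:

#include <stdio.h>

/* Opt the calling process out of the OOM killer on 2.6.18-era kernels
 * by writing -17 (OOM_DISABLE) to /proc/self/oom_adj. Newer kernels
 * use /proc/self/oom_score_adj with -1000 instead. Sketch only. */
static int avoid_oom_killer(void)
{
    FILE *f = fopen("/proc/self/oom_adj", "w");
    if (f == NULL)
        return -1;
    fprintf(f, "-17\n");
    return fclose(f);
}

int main(void)
{
    return avoid_oom_killer() == 0 ? 0 : 1;
}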
Yes, I think auditd gets blamed for memory that is actually consumed
inside the kernel on its behalf. I have run valgrind against the audit
daemon many times and know of no resource leaks. The only knob you
really have to turn, if the kernel queue is a problem, is to increase
the priority of the audit daemon so it gets more run time.
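If memory serves, that knob is the priority_boost setting in
auditd.conf; mechanically it amounts to lowering the daemon's nice
value at startup, roughly like this sketch (needs root to go below
nice 0):

#include <errno.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Lower the nice value by `boost` so the scheduler gives the daemon
 * more run time. Illustrative only; see auditd.conf(5) for the knob. */
static int boost_priority(int boost)
{
    errno = 0;
    int cur = getpriority(PRIO_PROCESS, 0);
    if (cur == -1 && errno != 0)
        return -1;
    return setpriority(PRIO_PROCESS, 0, cur - boost);
}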
> The reason was that audit's consumer thread -- the one that runs
> auditd-event.c:event_thread_main() -- was consuming events more
> slowly than the rate at which netlink events were sent from the
> kernel to auditd's main thread.
This is because of some requirements in Common Criteria evaluations
about knowing how many events are in flight. The input queue is simply
one event deep. If you have the audit event dispatcher running, it
gets first shot at handling the event. Then the event goes to disk. If
you have synchronous logging, the write blocks for a while, so
changing to buffered I/O might be better for throughput.
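In auditd.conf terms that is the flush setting, if I remember the
names right; the underlying tradeoff looks roughly like this sketch
(the path and batch size are made up):

#include <fcntl.h>
#include <unistd.h>

/* O_SYNC makes every write block until the data hits disk; buffered
 * writes return quickly, and an occasional fsync() bounds how much a
 * crash can lose. Sketch only. */
static int open_log(int synchronous)
{
    int flags = O_WRONLY | O_APPEND | O_CREAT;
    if (synchronous)
        flags |= O_SYNC;  /* every write blocks on disk I/O */
    return open("/var/log/audit/audit.log", flags, 0600);
}

static void write_record(int fd, const char *rec, unsigned len,
                         unsigned *since_flush)
{
    if (write(fd, rec, len) < 0)
        return;
    if (++*since_flush >= 100) {  /* hypothetical batch size */
        fsync(fd);
        *since_flush = 0;
    }
}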
> The solution we found (and which is still being tested) was to
> define a high water mark on how many events to allow in auditd's
> input queue. Given that each netlink message takes about 9 KB, one
> can set the high water mark to e.g. 500,000 to hold at most ~4.5 GB
> of events in RAM. So, when auditd reaches that high water mark, we
> ask the kernel to slow down: all further events sent by the kernel
> have a "need an ack" flag included, so that the caller process (the
> one that generated the system call being audited) is blocked until a
> reply is sent from the daemon.
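For illustration, the userspace half of a scheme like that is just a
depth check on the enqueue path; every name in this sketch is made up:

/* Hypothetical high-water-mark check; none of these names are real. */
#define HIGH_WATER 500000UL  /* 500,000 events * ~9 KB ~= 4.5 GB of RAM */

static unsigned long queue_depth;
static int throttled;

static void request_kernel_acks(void)
{
    /* Hypothetical: ask the kernel to flag further events "need an
     * ack" so audited callers block until the daemon replies. */
}

static void on_event_enqueued(void)
{
    if (++queue_depth >= HIGH_WATER && !throttled) {
        throttled = 1;
        request_kernel_acks();
    }
}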
Originally, I think David Woodhouse patched the kernel so that callers
were put on a wait queue when we hit the end of the internal queue. I
think someone removed it thinking it made the system unresponsive.
> Please let me know if that happens to be the reason for the problems
> you're having. I've been working mostly with audit 1.7.4 and kernel
> 2.6.16.16+patches, so our changes still need to be ported to a recent
> kernel and audit package before they're submitted officially (that's
> likely to happen in March, after my master's thesis deadline -- which
> is driving me crazy).
Note that the event model changed inside the audit daemon around
1.7.5. It's very different from before. In the near future, I am
planning to pull the code from audispd into auditd and use the queue
code from audispd so that the input and output sides of auditd can
really become multi-threaded. I'm thinking this would allow better
dequeuing of kernel events.
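The shape I have in mind is an ordinary condition-variable queue
between an input thread and an output thread; a generic sketch, not
the actual audispd queue code:

#include <pthread.h>
#include <stddef.h>

/* Bounded queue between an input (netlink reader) thread and an
 * output (disk writer) thread, so a slow write no longer stalls the
 * reader. */
#define QUEUE_CAP 1024

static void *ring[QUEUE_CAP];
static size_t head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

void enqueue(void *event)  /* input thread */
{
    pthread_mutex_lock(&lock);
    while (count == QUEUE_CAP)  /* back-pressure on the reader */
        pthread_cond_wait(&not_full, &lock);
    ring[tail] = event;
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

void *dequeue(void)  /* output thread */
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&not_empty, &lock);
    void *event = ring[head];
    head = (head + 1) % QUEUE_CAP;
    count--;
    pthread_cond_signal(&not_full);
    pthread_mutex_unlock(&lock);
    return event;
}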
-Steve