All:
I've seen the following situation occur on 2 machines now for a total of 3 incidents:
* Audisp-remote runs on 5 separate servers. The problem occurs on only 2 of them, and
those 2 are configured the same as the other 3.
* Audisp-remote runs normally on the problem servers for days to weeks at a time
without problems.
* For an unidentified reason (nothing that I can find in any system log)
audisp-remote stops sending messages to the central log server.
* Some hours or days later (depending on audit event activity) audisp-remote
consumes all system memory and swap space. Because of the directory tree watches on my
web content, this usually happens while the web content is being regenerated from
scratch by our build server. The memory consumption happens very rapidly.
* One server is configured with 8GB of RAM and 2GB of swap; the second server has
12GB of RAM and 2GB of swap.
* The system becomes completely unresponsive until some critical need for memory
arises, at which point the OOM killer kicks in and reaps enough tasks to allow me to
log in and shut down auditd.
* At this point the system returns to normal, and if I restart auditd it resumes
normal operation.
Here is a ps aux taken when it happened today on the 12GB machine:
USER   PID  %CPU %MEM      VSZ      RSS TTY STAT  START  TIME  COMMAND
root  1106   0.0  0.0        0        0 ?   S<    Oct12   0:36 [kauditd]
root  4768   0.1  0.0    92880      500 ?   S<sl  Oct17  26:22 auditd
root  4770   0.2  0.1   212984    12984 ?   S<sl  Oct17  31:49 /sbin/audispd
root  4771   0.0 96.7 28631936 11899072 ?   S<    Oct17   7:52 /sbin/audisp-remote
Priorities for each audit task are:
Auditd -4
Audispd -14
Audisp-remote -4
All machines are fully current on maintenance. Running RedHat EL 5.5 x86_64 with the
following audit package set:
* audit-libs-python-1.7.17-3.el5
* audit-libs-1.7.17-3.el5
* audit-libs-1.7.17-3.el5
* audit-1.7.17-3.el5
* audispd-plugins-1.7.17-3.el5
All that being said, I have the following questions:
* Has anyone seen this, and if so, what workarounds or fixes are available?
* What additional data should I collect that may assist in identifying the root
cause of the problem? Since it can take days for this to manifest itself it seems like
traces are out of the question, but perhaps there are other collection tools that can be
used.
* Are there any program options or configuration options that can be used to debug
this? The man pages seem to be a bit stale in this distribution.
* Does anyone have any other ideas on what I might do to get to the bottom of
this?
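As one low-overhead option for the data-collection question above, a sampler along the
following lines could record audisp-remote's memory use at regular intervals, so that
the start of the growth can be lined up against audit activity and network events. This
is only a sketch under stated assumptions: the function names, the log file name and the
60-second interval are all made up for illustration, it reads the standard Linux
/proc/<pid>/status fields (Name, VmSize, VmRSS), and it assumes a reasonably recent
Python 3 interpreter, so it would need adjusting for the older Python shipped with EL 5.

```python
import os
import time


def parse_status(text):
    """Extract the process name and memory figures (kB) from /proc/<pid>/status text."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            info["name"] = value.strip()
        elif key in ("VmSize", "VmRSS"):
            # Values look like "   12984 kB"; keep only the number.
            info[key] = int(value.split()[0])
    return info


def sample(target="audisp-remote", proc_root="/proc"):
    """Return one dict per running process whose name matches target."""
    samples = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                info = parse_status(fh.read())
        except (IOError, OSError):
            continue  # process exited between listdir() and open()
        if info.get("name") == target:
            info["pid"] = int(entry)
            samples.append(info)
    return samples


def format_line(info, stamp):
    """Render one sample as a single log line."""
    return "%s pid=%d VmSize=%dkB VmRSS=%dkB" % (
        stamp, info["pid"], info.get("VmSize", 0), info.get("VmRSS", 0))


def run(interval=60, log_path="audisp-remote-mem.log", iterations=None):
    """Append one line per matching process every `interval` seconds.

    The interval and log path are hypothetical defaults; iterations=None loops forever.
    """
    count = 0
    while iterations is None or count < iterations:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            for info in sample():
                log.write(format_line(info, stamp) + "\n")
        count += 1
        time.sleep(interval)
```

Calling run() from a small wrapper script left running in the background would produce a
timestamped history of VmSize and VmRSS. Opening and closing the log file on every pass
is deliberate: whatever was written before the machine becomes unresponsive is more
likely to be on disk afterwards.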
I also have a separate issue that I'm curious about. Under RedHat EL 5.5 there
don't seem to be any limitations on the support for audisp-remote, but I just
noticed in the release notes for RedHat EL 6 Beta that this component is flagged as a
Technology Preview in EL 6. Does anyone know the reason for the change in status? I was
planning to use this as part of my PCI-DSS compliance efforts next year but this change
may make that choice problematic.
Attached please find my current auditd.conf, audispd.conf, audisp-remote.conf and
au-remote.conf files.
Beyond this query I also plan to open a support incident with RedHat, but I thought that
by using feedback from this group I might be in a better position to provide support with
useful information to aid in problem diagnosis.
Please let me know anything else that may help to get to the bottom of this.
Thanks in advance!
Jim