All:
I’ve seen the following situation occur on 2 machines,
for a total of 3 incidents so far:
· Audisp-remote runs on 5 separate servers. The problem happens on two of
  them, which are configured the same as the other 3.
· Audisp-remote runs normally on the problem servers for days to weeks at a
  time.
· For an unidentified reason (nothing that I can find in any system log)
  audisp-remote stops sending messages to the central log server.
· Some hours or days later (depending on audit event activity) audisp-remote
  consumes all system memory and swap space. In my case, because of the
  directory tree watches on my web content, this usually happens when the
  web content is being regenerated from scratch by our build server. The
  memory consumption happens very rapidly.
· One server is configured with 8GB of RAM and 2GB of swap; the second
  server has 12GB of RAM and 2GB of swap.
· The system becomes completely unresponsive until some critical need for
  memory arises and the OOM Killer kicks in and reaps enough tasks to allow
  me to get in and shut down auditd.
· At this point the system returns to normal, and if I restart auditd it
  resumes normal operation.
Here is a ps aux taken when it happened today on the 12GB machine:

USER       PID %CPU %MEM      VSZ      RSS TTY STAT  START   TIME COMMAND
root      1106  0.0  0.0        0        0 ?   S<    Oct12   0:36 [kauditd]
root      4768  0.1  0.0    92880      500 ?   S<sl  Oct17  26:22 auditd
root      4770  0.2  0.1   212984    12984 ?   S<sl  Oct17  31:49 /sbin/audispd
root      4771  0.0 96.7 28631936 11899072 ?   S<    Oct17   7:52 /sbin/audisp-remote

Priorities for each audit task are:

Auditd         -4
Audispd        -14
Audisp-remote  -4

All machines are fully current on maintenance, running
RedHat EL 5.5 x86_64 with the following audit package set:
· audit-libs-python-1.7.17-3.el5
· audit-libs-1.7.17-3.el5
· audit-libs-1.7.17-3.el5
· audit-1.7.17-3.el5
· audispd-plugins-1.7.17-3.el5
All that being said, I have the following questions:
· Has anyone seen this, and if so, what workarounds or fixes are available?
· What additional data should I collect that may assist in identifying the
  root cause of the problem? Since it can take days for this to manifest
  itself, it seems like traces are out of the question, but perhaps there
  are other collection tools that can be used.
· Are there any program options or configuration options that can be used
  to debug this? The man pages seem to be a bit stale in this distribution.
· Does anyone have any other ideas on what I might do to get to the bottom
  of this?
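
On the data-collection question, one low-overhead option that could run for days is a periodic memory sampler. The following is only a sketch under stated assumptions, not a tested diagnostic: it reads the Name, VmSize and VmRSS fields from /proc/<pid>/status for any process named audisp-remote and appends them to a log, so the moment the growth starts can be lined up against the audit logs. The function names, the 60 second interval and the log path /var/tmp/audisp-remote-mem.log are all hypothetical, and it assumes a newer Python than the 2.4 that ships with EL 5.

```python
import os
import time


def parse_status(text):
    """Extract Name, VmSize and VmRSS (in kB) from /proc/<pid>/status text."""
    info = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            info["Name"] = value.strip()
        elif key in ("VmSize", "VmRSS"):
            # The value looks like "   12984 kB"; keep only the number.
            info[key] = int(value.split()[0])
    return info


def sample(name="audisp-remote", proc="/proc"):
    """Return a list of (pid, info) for every process whose Name matches."""
    hits = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "status")) as fh:
                info = parse_status(fh.read())
        except (IOError, OSError):
            continue  # the process exited between listdir() and open()
        if info.get("Name") == name:
            hits.append((int(entry), info))
    return hits


def main(interval=60, logfile="/var/tmp/audisp-remote-mem.log"):
    """Append one timestamped line per matching process every `interval` seconds."""
    # Both defaults are illustrative assumptions, not recommended values.
    while True:
        with open(logfile, "a") as log:
            for pid, info in sample():
                log.write("%s pid=%d VmSize=%skB VmRSS=%skB\n" % (
                    time.strftime("%Y-%m-%d %H:%M:%S"), pid,
                    info.get("VmSize", "?"), info.get("VmRSS", "?")))
        time.sleep(interval)
```

Calling main() starts the loop. Because the log is reopened and closed on every pass, the last sample should survive even if the machine becomes unresponsive shortly afterwards.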
I also have a separate issue that I’m curious about. Under RedHat EL 5.5
there don’t seem to be any limitations on the support for audisp-remote,
but I just noticed in the release notes for RedHat EL 6 Beta that this
component is flagged as a Technology Preview in EL 6. Does anyone know the
reason for the change in status? I was planning to use this as part of my
PCI-DSS compliance efforts next year, but this change may make that choice
problematic.
Attached please find my current auditd.conf, audispd.conf,
audisp-remote.conf and au-remote.conf files.
Beyond this query I also plan to open a support incident
with RedHat, but I thought that by using feedback from this group I might
be in a better position to provide support with useful information to aid in
problem diagnosis.
Please let me know anything else that may help to get to the
bottom of this.
Thanks in advance!
Jim