Hello All,
I have some systems that have just been updated from RHEL 5.2 to RHEL
5.3. The version of auditd is 1.7.7 and SnareLinux is 1.5.0.
Some time after the update ran, I noticed that the amount of free memory
on the systems had dropped dramatically. Running top, I saw that
auditd had sucked up lots of memory:
top - 09:25:59 up 2 days, 2 min, 1 user, load average: 0.00, 0.03, 0.00
Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 4043368k total, 3994488k used, 48880k free, 64656k buffers
Swap: 8385920k total, 352k used, 8385568k free, 449884k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31136 root 12 -3 6568m 3.1g 568 S 0.0 81.0 0:08.84 auditd
3861 root 15 0 26152 16m 364 S 0.0 0.4 0:00.01 restorecond
29474 root 16 0 97.9m 14m 2852 S 0.0 0.4 0:00.16 sshd
29382 root 17 0 124m 13m 1464 S 0.0 0.3 0:00.09 crond
29478 gsmith 15 0 97.9m 12m 976 S 0.0 0.3 0:00.01 sshd
5611 root 15 0 58912 9048 4208 S 0.0 0.2 3:50.38 dsmc
3937 root 15 0 96556 7588 916 S 0.0 0.2 3:54.00 sec.pl
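If you want to watch the growth without babysitting a top display, a
minimal sketch with stock ps works just as well (the 60-second interval
is arbitrary):

while true; do ps -o rss=,vsz= -p $(pidof auditd); sleep 60; done

The RES column in top corresponds to the rss field here.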
When I first started looking at the problem, I figured the easiest fix
was a reboot, since a new version of auditd had been installed along
with a new kernel and there was a mismatch between the auditd daemon and
the running kernel. After rebooting, I watched a top display and did not
see the auditd daemon percolate up. It turns out that it does percolate
up to the top of the memory usage; I just hadn't watched long enough.
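For anyone wanting to verify the same mismatch theory after an update,
comparing the running kernel against what got installed is enough; these
are stock commands:

uname -r               # kernel actually running
rpm -q kernel audit    # kernel and audit packages now installed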
I wondered if there was a difference between 32-bit and 64-bit systems.
As it turns out, there isn't one; the leak just seems slower to show up
on 32-bit systems, probably due to slower CPUs.
Next, I wondered if this was due to the version of Snare running on the
systems. Just to be sure, I rebuilt Snare for the 64-bit system,
de-installed the old Snare, installed the newly rebuilt one, and started
auditd. Again, the auditd daemon started to percolate to the top of the
memory usage.
So, does this have anything to do with the number of rules that auditd
has to deal with? I started trimming back the rules one at a time,
restarting auditd and watching a top display. The number of rules
doesn't have any effect until you get to zero rules; with no rules
loaded, auditd doesn't move up at all.
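For reference, stock auditctl is handy for confirming what the kernel
actually has loaded after each trim:

auditctl -l    # list the audit rules currently loaded in the kernel
auditctl -D    # delete all loaded rules (the zero-rule case above)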
Next, I ran valgrind on auditd: one run on a RHEL 5.3 system with no
rules, and another with the standard set of rules we put on our systems
(the invocation is sketched after the rule list):
criticality=0 event=execve uid=*,root
criticality=1 event=(mount,umount,umount2,settimeofday,swapon,swapoff,reboot,setdomainname,create_module,delete_module,quotactl) uid=*,root
criticality=2 event=execve exe=*passwd*
criticality=3 event=(login_auth,login_start,logout)
criticality=4 event=acct_mgmt
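The valgrind runs were along these lines (a sketch; the log path is
arbitrary, and auditd's -f flag keeps it in the foreground so valgrind
can watch it):

valgrind --leak-check=full /sbin/auditd -f 2> /tmp/auditd-vg.log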
After 2 hours of running, the former showed 0 malloc/free problems in
its leak summary. The latter, on the other hand:
==30021==
==30021== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==30021== malloc/free: in use at exit: 147,024,501 bytes in 16,262 blocks.
==30021== malloc/free: 27,612 allocs, 11,350 frees, 249,669,818 bytes allocated.
==30021== For counts of detected errors, rerun with: -v
==30021== searching for pointers to 16,262 not-freed blocks.
==30021== checked 10,651,128 bytes.
==30021==
==30021== LEAK SUMMARY:
==30021==    definitely lost: 146,976,704 bytes in 16,246 blocks.
==30021==      possibly lost: 36,464 bytes in 5 blocks.
==30021==    still reachable: 11,333 bytes in 11 blocks.
==30021==         suppressed: 0 bytes in 0 blocks.
Next, I ran a strace of auditd to see if I could spot where it was doing
all the malloc's. Starting strace on a fresh instance of auditd, I
watched it eventually settle into this loop:
clock_gettime(CLOCK_MONOTONIC, {432334, 891462998}) = 0
epoll_wait(0, {{EPOLLIN, {u32=3, u64=3}}}, 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {432334, 891681998}) = 0
recvfrom(3, "&\0\0\0\33\5\0\0\0\0\0\0\0\0\0\0audit(1233595754"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 54
writev(5, [{"\0\0\0\0\20\0\0\0\33\5\0\0&\0\0\0", 16}, {"audit(1233595754.282:836040): c"..., 38}], 2) = 54
And so it goes. Nothing in this tight loop leaps out at me as the place
where it's glomming memory.
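(For anyone reproducing this, the strace run was along these lines,
again a sketch with an arbitrary output path; -f on auditd keeps it in
the foreground under strace:

strace -o /tmp/auditd.strace /sbin/auditd -f
)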
I wondered if this odd behavior was exhibited back in RHEL 5.2 and we
just never noticed. So, I installed RHEL 5.2 in a virtual machine,
installed all the patches, and installed Snare. After several hours of
going at it, auditd has not moved up the chart at all.
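For a side-by-side of what actually changed between the two
environments, querying the packages on each box pins down the exact
versions:

rpm -q audit audit-libs kernel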
So, it appears to be an auditd/Snare interaction on RHEL 5.3. I have a
RHEL 5.3 system on which I'm experimenting with auditd, rulesets and
prelude (no Snare). That ruleset is much larger by comparison (> 200
rules), and everybody is well-behaved.
Any thoughts on what may be happening?
Best Regards,
Gary Smith