Hello All,

I have some systems that have just been updated from RHEL 5.2 to RHEL 5.3. The version of auditd is 1.7.7 and SnareLinux is 1.5.0.

Some time after the update ran, I noticed that the amount of free memory on the systems had dropped dramatically. Running top, I saw that auditd had sucked up a lot of memory:

top - 09:25:59 up 2 days, 2 min,  1 user,  load average: 0.00, 0.03, 0.00
Tasks: 111 total,   1 running, 110 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us,  0.1%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   4043368k total,  3994488k used,    48880k free,    64656k buffers
Swap:  8385920k total,      352k used,  8385568k free,   449884k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31136 root      12  -3 6568m 3.1g  568 S  0.0 81.0   0:08.84 auditd
 3861 root      15   0 26152  16m  364 S  0.0  0.4   0:00.01 restorecond
29474 root      16   0 97.9m  14m 2852 S  0.0  0.4   0:00.16 sshd
29382 root      17   0  124m  13m 1464 S  0.0  0.3   0:00.09 crond
29478 gsmith    15   0 97.9m  12m  976 S  0.0  0.3   0:00.01 sshd
 5611 root      15   0 58912 9048 4208 S  0.0  0.2   3:50.38 dsmc
 3937 root      15   0 96556 7588  916 S  0.0  0.2   3:54.00 sec.pl

When I first started looking at the problem, I figured the easiest fix was a reboot, since a new version of auditd had been installed along with a new kernel, and there was a mismatch between the auditd daemon and the running kernel. After rebooting, I watched a 'top' display and did not see the auditd daemon percolate up. It turns out that it does percolate up to the top of the memory usage; I just didn't watch it long enough.
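
In case it helps anyone watch this without babysitting top, something along these lines will log a daemon's resident set over time. This is just a hypothetical helper I'm sketching here (it's not part of auditd or Snare); it polls the VmRSS line out of /proc/<pid>/status:

/* rsswatch.c: hypothetical helper, not part of auditd or Snare.
 * Logs a process's resident set size over time by polling the
 * VmRSS line in /proc/<pid>/status.  Stop it with Ctrl-C.
 *
 *   cc -o rsswatch rsswatch.c
 *   ./rsswatch `pidof auditd` 60
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[64], line[256];
    FILE *f;
    unsigned interval;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid> [interval-seconds]\n", argv[0]);
        return 1;
    }
    interval = (argc > 2) ? (unsigned)atoi(argv[2]) : 60;
    snprintf(path, sizeof(path), "/proc/%s/status", argv[1]);

    for (;;) {
        f = fopen(path, "r");
        if (!f) {
            perror(path);           /* bad pid, or the process exited */
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, "VmRSS:", 6) == 0) {
                /* line already ends in '\n', e.g. "VmRSS:  3244056 kB" */
                printf("%ld %s", (long)time(NULL), line);
                fflush(stdout);
                break;
            }
        }
        fclose(f);
        sleep(interval);
    }
}

Pointed at a freshly restarted auditd, that should make the climb obvious in the log without anyone having to stare at a screen.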

I wondered if there was a difference between 32-bit and 64-bit systems. As it turns out, there isn't one; it just seems to happen more slowly on 32-bit systems, probably due to slower CPUs.

Next, I wondered if this was due to the version of Snare running on the systems. Just to be sure, I rebuilt Snare for the 64-bit system, de-installed Snare, installed the newly rebuilt Snare, and started auditd. Again, the auditd daemon started to percolate to the top of the memory usage.

So, does this have anything to do with the number of rules that auditd has to deal with? I started trimming back the rules one at a time, restarting auditd and watching a top display. The number of rules doesn't have any effect until you get down to zero rules, and then auditd doesn't move up.

Next, I ran valgrind on auditd: one run on a RHEL 5.3 system with no rules, and another with the standard set of rules we put on our systems:

criticality=0   event=execve    uid=*,root
criticality=1   event=(mount,umount,umount2,settimeofday,swapon,swapoff,reboot,setdomainname,create_module,delete_module,quotactl)      uid=*,root
criticality=2   event=execve    exe=*passwd*
criticality=3   event=(login_auth,login_start,logout)
criticality=4   event=acct_mgmt

After two hours of running, the former showed no malloc/free problems in the leak summary. The latter, on the other hand:

==30021==
==30021== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 4 from 1)
==30021== malloc/free: in use at exit: 147,024,501 bytes in 16,262 blocks.
==30021== malloc/free: 27,612 allocs, 11,350 frees, 249,669,818 bytes allocated.
==30021== For counts of detected errors, rerun with: -v
==30021== searching for pointers to 16,262 not-freed blocks.
==30021== checked 10,651,128 bytes.
==30021==
==30021== LEAK SUMMARY:
==30021==    definitely lost: 146,976,704 bytes in 16,246 blocks.
==30021==      possibly lost: 36,464 bytes in 5 blocks.
==30021==    still reachable: 11,333 bytes in 11 blocks.
==30021==         suppressed: 0 bytes in 0 blocks.

Next, I ran strace on auditd to see if I could spot where it was doing all the mallocs. After starting an strace of a fresh instance of auditd, it eventually settled into this loop:

clock_gettime(CLOCK_MONOTONIC, {432334, 891462998}) = 0
epoll_wait(0, {{EPOLLIN, {u32=3, u64=3}}}, 64, 59743) = 1
clock_gettime(CLOCK_MONOTONIC, {432334, 891681998}) = 0
recvfrom(3, "&\0\0\0\33\5\0\0\0\0\0\0\0\0\0\0audit(1233595754"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 54
writev(5, [{"\0\0\0\0\20\0\0\0\33\5\0\0&\0\0\0", 16}, {"audit(1233595754.282:836040):  c"..., 38}], 2) = 54

And so it goes. Nothing leaps out at me from this tight sequence as the place where it's glomming memory.
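
That's not too surprising in itself: malloc isn't a system call, so strace would only betray heap growth as an occasional brk or mmap when the allocator asks the kernel for more address space. The loop itself is auditd pulling a record off the kernel's audit netlink socket (the recvfrom on fd 3) and handing it to the dispatcher (the writev on fd 5, presumably the pipe to the Snare/audispd dispatcher). For what it's worth, the valgrind numbers above are internally consistent with one lost buffer per event: the alloc/free delta (27,612 - 11,350 = 16,262) exactly matches the not-freed block count, and the definitely-lost total works out to roughly 9 KB per block (146,976,704 / 16,246 is about 9,047 bytes), suspiciously close to the 8,988-byte receive buffer in the recvfrom. Purely as an illustration of how that signature could arise (this is a sketch of the pattern, not auditd's actual code), a per-event buffer that gets queued to the dispatcher and never freed looks like this:

/* leak_sketch.c: illustrative only; NOT auditd's actual code.
 * One never-freed ~9 KB buffer per "audit event" reproduces the
 * signature above: allocs far outnumbering frees under valgrind,
 * steadily growing RES in top, and nothing suspicious in strace
 * (malloc itself never shows up there).
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_MSG 8988            /* matches the recvfrom buffer size above */

struct event {
    size_t len;
    char   data[MAX_MSG];
};

/* Hand one record to the "dispatcher" (here just /dev/null). */
static void dispatch(int fd, const struct event *e)
{
    (void)write(fd, e->data, e->len);
}

int main(void)
{
    struct event *e;
    int fd, i;

    fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return 1;

    /* Pretend 10,000 audit events arrive (~90 MB leaked in total). */
    for (i = 0; i < 10000; i++) {
        e = malloc(sizeof(*e));             /* one allocation per event */
        if (!e)
            break;
        e->len = (size_t)snprintf(e->data, sizeof(e->data),
                                  "audit(0.%03d:%d): type=SYSCALL ...", i, i);
        dispatch(fd, e);
        /* BUG: e is never freed.  At a few hundred events a second,
         * ~9 KB per event adds up to gigabytes per day, the same shape
         * as the RES column in top and valgrind's definitely-lost total. */
    }
    close(fd);
    return 0;   /* valgrind --leak-check=full: 10,000 blocks definitely lost */
}

Run under valgrind, a toy like this reports the same "definitely lost: N bytes in N blocks" pattern as the real daemon; the question is which allocation in the auditd/Snare path on 5.3 plays the role of that buffer.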

I wondered if this odd behavior was exhibited back in RHEL 5.2 and we just never noticed. So, I installed RHEL 5.2 in a virtual machine, applied all the patches, and installed Snare. After several hours of going at it, auditd has not moved up the chart at all.

So, it appears to be an auditd/Snare interaction on RHEL 5.3. I have a RHEL 5.3 system on which I'm experimenting with auditd, rulesets, and prelude (no Snare). That ruleset has a lot more rules in it by comparison (> 200), and everybody is well-behaved.

Any thoughts on what may be happening?

Best Regards,

Gary Smith