Hello,
I've been looking into auditd's performance. The first thing I did was to
measure the rate at which it could log events with various settings. For
this test, I had two windows open: one to start auditd from the command line,
without systemd interference, and one to run the following script:
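auditctl -D                                     # delete all existing rules
auditctl -b 16440                               # kernel backlog limit (buffers)
auditctl -f 0                                   # failure mode 0: silent
auditctl --backlog_wait_time 100                # kernel wait when backlog is full
auditctl -a always,exit -F arch=x86_64 -S all   # audit every syscall
sleep 3                                         # let events flow for 3 seconds
service auditd stop                             # stop auditd
auditctl -D                                     # remove the test rule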
The results for various values of the flush and freq settings in auditd.conf
are as follows:
FLUSH         FREQ    Events/sec
--------------------------------
SYNC          -       45
DATA          -       105
INCREMENTAL   20      400
              50      1000
              100     1815
              200     3080
              400     5800
              1000    10100
              2000    15275
              4000    18650
              8000    24075
NONE          -       38300
Looking further, I found a lot of lock contention and scheduling issues
caused by the pthreads code. I mapped out the paths in the code to get a
picture of where events come from and where they go:
http://people.readhat.com/sgrubb/audit/auditd-data-flow.pdf
The blue boxes are where events come from, and the red boxes are where we
have contention. The gray boxes are the path on the logging thread, and the
white boxes are the main thread.
What I found is that if I make enqueue_event call write_to_log directly, the
throughput of the audit daemon doubles. In other words, going from
multi-threaded to single-threaded makes a huge difference. The audit daemon
has been multi-threaded since its very first public release back in 2004,
before I started working on it.
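To illustrate where the cost of the hand-off comes from, here is a minimal C sketch under simplifying assumptions: a bounded queue protected by a mutex and two condition variables feeds a logging thread, versus enqueue_event simply calling write_to_log. The struct event, the queue, and every function body here are hypothetical stand-ins, not auditd's actual code, and the sketch does not reproduce the measured numbers.

```c
/* Sketch: threaded hand-off vs. direct call. All names and bodies are
 * hypothetical simplifications, not auditd's actual code.
 * Build with -pthread. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define QUEUE_CAP 64

struct event { char text[128]; };      /* hypothetical event record */

static unsigned long events_logged;     /* events "written" so far */

/* Hypothetical stand-in for write_to_log(): only counts the event. */
static void write_to_log(const struct event *e)
{
    (void)e;  /* a real daemon would format and write e->text to the log */
    events_logged++;
}

/* Bounded queue shared by the main thread and the logging thread. */
static struct {
    struct event items[QUEUE_CAP];
    size_t head, count;
    int done;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} q = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
};

/* Multi-threaded model: every event takes the lock, may block when the
 * queue is full, and wakes the logging thread. */
static void enqueue_event_threaded(const struct event *e)
{
    pthread_mutex_lock(&q.lock);
    while (q.count == QUEUE_CAP)
        pthread_cond_wait(&q.not_full, &q.lock);
    assert(q.count < QUEUE_CAP);
    q.items[(q.head + q.count) % QUEUE_CAP] = *e;
    q.count++;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
}

/* Logging thread: pops events and writes them outside the lock. */
static void *logging_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&q.lock);
    for (;;) {
        while (q.count == 0 && !q.done)
            pthread_cond_wait(&q.not_empty, &q.lock);
        if (q.count == 0)               /* done and fully drained */
            break;
        struct event e = q.items[q.head];
        q.head = (q.head + 1) % QUEUE_CAP;
        q.count--;
        pthread_cond_signal(&q.not_full);
        pthread_mutex_unlock(&q.lock);
        write_to_log(&e);
        pthread_mutex_lock(&q.lock);
    }
    pthread_mutex_unlock(&q.lock);
    return NULL;
}

/* Tell the logging thread to drain the queue and exit, then join it. */
static void stop_logging_thread(pthread_t tid)
{
    pthread_mutex_lock(&q.lock);
    q.done = 1;
    pthread_cond_signal(&q.not_empty);
    pthread_mutex_unlock(&q.lock);
    pthread_join(tid, NULL);
}

/* Single-threaded model: no queue, no locks, no thread wakeups. */
static void enqueue_event_direct(const struct event *e)
{
    write_to_log(e);
}
```

In the threaded path each event pays for a lock, a condition-variable signal, and potentially a context switch to the logging thread; in the direct path it is one function call on the thread that received the event. Either function would be driven the same way, e.g. by calling it in a loop for each incoming event.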
So, what I think I am going to do is make it single-threaded, fix the signal
handlers to set a variable on error so that the main thread picks it up and
serializes it with other events, move the size check and rotate code, and
remove the pthreads code.
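The signal-handler change follows a common pattern: the handler only sets a flag, and the main loop checks that flag between events so the follow-up work is serialized with normal logging. A minimal C sketch, using SIGUSR1 (which auditd documents as its log-rotation request) as the example; the flag, handler, and rotate_logs() are hypothetical names, not auditd's code.

```c
/* Sketch: the signal handler only records the request; the main loop acts
 * on it. All names are hypothetical, not auditd's actual code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* volatile sig_atomic_t is the portable type for a flag written from a
 * signal handler. */
static volatile sig_atomic_t rotate_requested;
static unsigned rotations;              /* rotations performed so far */

static void usr1_handler(int sig)
{
    (void)sig;
    rotate_requested = 1;               /* no I/O or locking in here */
}

static int install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Hypothetical stand-in for the real size check / rotation code. */
static void rotate_logs(void)
{
    rotations++;
}

/* Called by the main loop between events, so the rotation is serialized
 * with ordinary event logging. Returns 1 if a request was handled. */
static int handle_pending_signals(void)
{
    if (!rotate_requested)
        return 0;
    rotate_requested = 0;
    rotate_logs();
    return 1;
}
```

Because the real work runs on the main thread, nothing in this path needs a lock, which fits the single-threaded design.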
That leaves the issue of dispatching events to other programs. I have been
thinking about perhaps using libevfibers to manage switching between logging
and dispatching.
One other tidbit I found during testing: if I generate so many events that
the kernel queue overflows, the default backlog_wait_time setting makes the
system unusable. It acts like it's live-locked. So, I would recommend that the
default setting in the kernel be changed to something more livable, and that
anyone concerned about this explicitly set the value to something low.
-Steve