Hi Everyone,
Had a couple of concerns that I wanted to discuss:
1. I have been seeing a few "auditd queue full" messages in syslog. I faced similar issues previously, after which I increased q_depth and trimmed my ruleset to reduce the number of events logged, which brought these errors down significantly.
However, when the same error started appearing again, I examined the audit logs using aureport and ausearch and, to my surprise, there were hardly any events during the given time period. To debug this, I generated the queue statistics, and the numbers strongly suggest that there is a bug somewhere in the code.
This seemed to be the case on multiple machines.
Output of /var/run/auditd.state:
sudo cat /run/auditd.state
current time = 03/02/22 18:30:47
process priority = -4
writing to logs = no
Number of active plugins = 1
current plugin queue depth = 4294967240
max plugin queue depth used = 4294967295
plugin queue size = 25000
plugin queue overflow detected = yes
plugin queueing suspended = no
listening for network connections = no
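To compare these numbers across machines, the dump can be parsed and sanity-checked with a short script. This is only a sketch: it assumes the file keeps the "name = value" layout shown above, and the helper names (parse_state, depth_is_plausible) are my own, not part of auditd.

```python
def parse_state(text):
    """Parse 'name = value' lines from an auditd.state dump into a dict."""
    state = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # skip lines that do not look like 'name = value'
            state[name.strip()] = value.strip()
    return state


def depth_is_plausible(state):
    """Both depth counters should stay within 0..plugin queue size."""
    size = int(state["plugin queue size"])
    current = int(state["current plugin queue depth"])
    peak = int(state["max plugin queue depth used"])
    return 0 <= current <= size and 0 <= peak <= size


# The dump from this mail, used as sample input.
SAMPLE = """\
current time = 03/02/22 18:30:47
process priority = -4
writing to logs = no
Number of active plugins = 1
current plugin queue depth = 4294967240
max plugin queue depth used = 4294967295
plugin queue size = 25000
plugin queue overflow detected = yes
plugin queueing suspended = no
listening for network connections = no
"""

if __name__ == "__main__":
    print(depth_is_plausible(parse_state(SAMPLE)))
```

For the dump above the check fails, since both depth values are far larger than the configured queue size of 25000.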
I am not sure, but the only way I can think of for max plugin queue depth used to reach 4294967295 (despite the limit being set to 25000) is if we dequeue an event before it has been enqueued. 4294967295 is 2^32 - 1, which is what an unsigned 32-bit counter would hold after being decremented from 0. Also, the current plugin queue depth suggests that dequeues kept outnumbering enqueues after that, taking the value from 4294967295 down to 4294967240?
Not really sure what is going on here, but my guess is that the queue variables were reset while the queue elements were not set to NULL?
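The arithmetic behind that reading can be checked quickly. The sketch below assumes the depth counters are unsigned 32-bit integers, which is what the printed values suggest but which I have not confirmed in the code; the function names are made up for illustration.

```python
UINT32_MODULUS = 2 ** 32  # assumption: the counters are unsigned 32-bit


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit one."""
    return value - UINT32_MODULUS if value >= 2 ** 31 else value


def decrement_u32(value, steps=1):
    """Subtract with 32-bit unsigned wraparound, as C unsigned arithmetic does."""
    return (value - steps) % UINT32_MODULUS


if __name__ == "__main__":
    print(decrement_u32(0))          # 4294967295
    print(as_signed32(4294967295))   # -1
    print(as_signed32(4294967240))   # -56
```

Read this way, the current depth is effectively -56, i.e. 56 more dequeues than enqueues have been counted. The max depth would have latched at 4294967295 the first time the counter went from 0 to -1, because an unsigned comparison treats that as the largest possible value.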
2. Another change I would like to make: currently, if we reload the auditd configuration instead of restarting, the configuration changes but some of the queue statistic variables are not reset, which I feel is incorrect.
Ex: if q_depth=400 and the queue overflows, the overflowed variable is set to 1. On changing q_depth to, say, 10000 and doing a reload, the queue size has changed and so, effectively, has the queue. I feel we should reset queue statistic variables such as overflowed at this point, since it is incorrect to say that the queue in its current form has overflowed.
If you agree that this is a reasonable change, would it be OK if I submit a PR for it?
Also, is it possible that point 2 is causing the errors in point 1?
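To make the proposal in point 2 concrete, here is a toy model of the behaviour I have in mind. It is not the auditd implementation: the class, its fields, and the reload method are hypothetical, and it only illustrates clearing the statistics that were tied to the old queue size.

```python
class PluginQueueStats:
    """Toy model of the queue statistics; all names here are hypothetical."""

    def __init__(self, q_depth):
        self.q_depth = q_depth
        self.current_depth = 0
        self.max_depth_used = 0
        self.overflowed = False

    def enqueue(self):
        if self.current_depth >= self.q_depth:
            self.overflowed = True  # queue is full, the event is dropped
            return False
        self.current_depth += 1
        self.max_depth_used = max(self.max_depth_used, self.current_depth)
        return True

    def dequeue(self):
        if self.current_depth == 0:
            return False  # never let the counter go below zero
        self.current_depth -= 1
        return True

    def reload(self, new_q_depth):
        """Apply a new q_depth and clear statistics tied to the old size."""
        if new_q_depth != self.q_depth:
            self.q_depth = new_q_depth
            self.overflowed = False
            # Restart the high-water mark from what is queued right now.
            self.max_depth_used = self.current_depth


if __name__ == "__main__":
    q = PluginQueueStats(q_depth=400)
    for _ in range(401):
        q.enqueue()
    print(q.overflowed)   # True
    q.reload(10000)
    print(q.overflowed)   # False
```

The guard in dequeue is there only to show how the depth counter could be kept from going below zero; whether the real code needs something similar depends on what is actually causing point 1.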
3. I would also like to improve the man page documentation for /var/run/auditd.state. Currently it states that it is a dump of the internal state. I would like to add a little more detail about what the internal state contains, such as queue statistics, priority, etc.
Apart from that, I feel we could also add an additional field to the auditd.state file recording when the queue overflowed, which may make it easier to run ausearch queries with a start time and end time.
If any of these changes are worth contributing, I would be happy to make them.
But yeah, I guess the priority right now should be point 1, and we can think about the others after that.