Hi Everyone,
I have a few concerns that I would like to discuss:
1.
I was getting a few "auditd queue full" messages in syslog. I had faced
a similar issue before, and at that time I increased q_depth and trimmed
my ruleset to reduce the number of events logged, which brought these
errors down significantly.
However, when the same error started appearing again, I examined the
auditd logs using aureport and ausearch and, to my surprise, there were
hardly any events during the given time period. To debug this I generated
the queue statistics, and the numbers strongly suggest that there is a
bug somewhere in the code.
I see the same behaviour on multiple machines.
Output of /var/run/auditd.state:
sudo cat /run/auditd.state
current time = 03/02/22 18:30:47
process priority = -4
writing to logs = no
Number of active plugins = 1
current plugin queue depth = 4294967240
max plugin queue depth used = 4294967295
plugin queue size = 25000
plugin queue overflow detected = yes
plugin queueing suspended = no
listening for network connections = no
I am not sure, but the only way I can see max plugin queue depth used
reaching 4294967295 (despite the limit being set to 25000) is if we
dequeue an event before it has been enqueued, so that an unsigned counter
wraps below zero. 4294967295 is 2^32 - 1 and the current plugin queue
depth of 4294967240 is 2^32 - 56, which suggests that events kept being
dequeued after that, taking the value down from 4294967295 to 4294967240?
I am not really sure what is going on here, but my guess is that the
queue variables were reset while the queue elements were not set to NULL?
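
The arithmetic behind that guess can be checked with a small sketch. It
assumes the depth counters behave like 32-bit unsigned integers, which is
inferred only from the values in the state file and not from the auditd
source. The helper names wrap_u32 and as_signed32 are made up for this
illustration.

```python
U32_MASK = 0xFFFFFFFF  # assumption: the counters are 32-bit unsigned


def wrap_u32(value: int) -> int:
    """Reduce an integer modulo 2**32, as a 32-bit unsigned counter would."""
    return value & U32_MASK


def as_signed32(value: int) -> int:
    """Reinterpret a 32-bit unsigned value as a signed 32-bit integer."""
    value = wrap_u32(value)
    return value - (1 << 32) if value >= (1 << 31) else value


# One decrement past zero produces the reported maximum.
underflowed = wrap_u32(0 - 1)

# The two values from auditd.state, read as signed numbers.
max_depth_signed = as_signed32(4294967295)
current_depth_signed = as_signed32(4294967240)

print(underflowed, max_depth_signed, current_depth_signed)
```

Read as signed numbers, the two depths are -1 and -56. That would be
consistent with the counter having been decremented 56 more times than it
was incremented since some reset point, and with the "max" staying at
4294967295 because no unsigned value compares larger.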
2.
Another change I would like to make: currently, if we reload the auditd
configuration instead of restarting, the configuration changes but some
of the queue statistics variables are not reset, which I feel is
incorrect.
https://github.com/linux-audit/audit-userspace/blob/770e4f538103f8a055f46...
For example, if q_depth=400 and the queue overflows, the overflowed
variable is set to 1. After changing q_depth to, say, 10000 and doing a
reload, the queue size has changed and so, effectively, has the queue. I
feel we should reset queue statistics such as overflowed at that point,
because it is incorrect to say that the queue in its current form has
overflowed.
If this seems like a reasonable change, would it be OK if I submit a PR
for it?
Also, is it possible that point 2 is what leads to the errors in
point 1?
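
To make the proposal in point 2 concrete, here is a rough sketch of the
behaviour I have in mind. QueueStats and its methods are hypothetical
stand-ins written for this mail, not the actual audit-userspace
structures, and whether max depth should also be reset on reload is my
own assumption.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    """Hypothetical stand-in for the plugin queue counters."""
    size: int
    depth: int = 0
    max_depth: int = 0
    overflowed: bool = False

    def enqueue(self) -> bool:
        """Count one event; record an overflow if the queue is full."""
        if self.depth >= self.size:
            self.overflowed = True
            return False
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        return True

    def resize(self, new_size: int) -> None:
        """Apply a reloaded q_depth and clear statistics tied to the old size."""
        self.size = new_size
        self.overflowed = False
        self.max_depth = self.depth


stats = QueueStats(size=400)
for _ in range(401):          # the 401st event does not fit
    stats.enqueue()
overflowed_before_reload = stats.overflowed

stats.resize(10000)           # reload with q_depth=10000
print(overflowed_before_reload, stats.overflowed)
```

With this behaviour, the overflow flag only ever describes the queue as
it is currently configured.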
3. I would also like to improve the manpage documentation for
/var/run/auditd.state. Currently it states that the file is a dump of the
internal state. I would like to add a little more detail about what the
internal state contains, such as the queue statistics, process priority
etc.
Apart from that, I feel we could also add an additional field to the
auditd.state file recording when the queue overflowed, which may make it
easier to run ausearch queries with a start time and end time.
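
As an illustration of what better-documented fields would allow, here is
a minimal sketch that reads the state dump and flags depth values that
cannot be valid. It assumes the file has one "key = value" pair per line
and uses the field names from my output above. parse_state and
depth_looks_wrapped are names I made up, not part of any audit tool.

```python
def parse_state(text: str) -> dict:
    """Split 'key = value' lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines that are not key/value pairs
            fields[key.strip()] = value.strip()
    return fields


def depth_looks_wrapped(fields: dict) -> bool:
    """True if a reported depth exceeds the configured queue size."""
    size = int(fields["plugin queue size"])
    current = int(fields["current plugin queue depth"])
    maximum = int(fields["max plugin queue depth used"])
    return current > size or maximum > size


SAMPLE = """\
current time = 03/02/22 18:30:47
process priority = -4
current plugin queue depth = 4294967240
max plugin queue depth used = 4294967295
plugin queue size = 25000
plugin queue overflow detected = yes
"""

state = parse_state(SAMPLE)
print(state["process priority"], depth_looks_wrapped(state))
```

On a live system the text would come from reading /run/auditd.state
instead of the SAMPLE string.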
If any of these changes are worth contributing, I would be happy to make
them.
That said, I guess the priority right now should be point 1, and we can
think about the others after that.