I would like to suggest providing a mechanism by which admins can query the state of the
backlog (wait time, sums, etc.). Maybe the intent is already to expand the status output
of auditd.
I believe further clarity would be beneficial on:
- How the 'backlog_wait_sum' (or whatever the name evolves to) is set initially
- How it evolves over time
- What conditions in the system, or in auditing, would change it
- What conditions admins should pay attention to for an informational understanding of
  status
- What conditions admins should recognize as requiring adjustments (and suggestions as to
  what those adjustments should be)
- What new guidance admins will have for building or adjusting audit.rules around this
Consider the scenario where auditing has been 'working fine' for days, with little to no
active admin monitoring. Events occur that spike auditing such that backlogging of audit
records dramatically increases. For some reason, admins now come looking to investigate.
Assuming they run 'systemctl status auditd', the newly presented 'state' of the
'backlog_wait_sum' will show some evidence.
Q: Is that just a moment in time?
Q: What information here will indicate that things are good/ok 'now', versus some action
needing to be taken?
Maybe that isn't a great scenario, or those aren't good questions; it is what occurs to me
at the moment.
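To illustrate the 'moment in time' question: if 'backlog_wait_sum' is a cumulative
counter, as the patch description suggests (initialized to zero, then only added to), a
single reading cannot tell past waiting from current waiting, but the difference between
two readings can. Below is a rough Python sketch of that idea. The StatusSample type and
both function names are hypothetical, the counter's units are left opaque, and the samples
are assumed to come from whatever status query ends up exposing the field.

```python
from dataclasses import dataclass


@dataclass
class StatusSample:
    """One reading of the audit status (hypothetical shape, for illustration)."""
    timestamp_s: float      # when the reading was taken, in seconds
    backlog_wait_sum: int   # cumulative wait counter; units assumed opaque


def wait_delta(earlier, later):
    """Return (elapsed seconds, counter growth) between two readings."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    growth = later.backlog_wait_sum - earlier.backlog_wait_sum
    if growth < 0:
        # Counter went backwards: assume it was reset (for example by a
        # reboot) and count everything accumulated since the reset.
        growth = later.backlog_wait_sum
    return elapsed, growth


def is_waiting_now(earlier, later):
    """True if any backlog waiting was recorded between the two readings."""
    _, growth = wait_delta(earlier, later)
    return growth > 0
```

With readings of 500 and then 500 a minute apart, is_waiting_now() returns False: the
waiting happened in the past. With 500 and then 740, it returns True and the growth of 240
over that minute gives a rate that could be compared against a threshold.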
Thank you.
R,
-Joe Wulf
On Wednesday, July 1, 2020, 5:33:14 PM EDT, Max Englander
<max.englander(a)gmail.com> wrote:
> In environments where the preservation of audit events and predictable
> usage of system memory are prioritized, admins may use a combination of
> --backlog_wait_time and -b options at the risk of degraded performance
> resulting from backlog waiting. In some cases, this risk may be
> preferred to lost events or unbounded memory usage. Ideally, this risk
> can be mitigated by making adjustments when backlog waiting is detected.
>
> However, detection can be difficult using the currently available metrics.
> For example, an admin attempting to debug degraded performance may
> falsely believe a full backlog indicates backlog waiting. It may turn
> out the backlog frequently fills up but drains quickly.
>
> To make it easier to reliably track degraded performance to backlog
> waiting, this patch makes the following changes:
>
> Add a new field backlog_wait_sum to the audit status reply. Initialize
> this field to zero. Add to this field the total time spent by the
> current task on scheduled timeouts while the backlog limit is exceeded.
>
> Tested on Ubuntu 18.04 using complementary changes to the audit
> userspace: https://github.com/linux-audit/audit-userspace/pull/134.
<snip>