I would like to suggest providing a mechanism by which admins can query the state of the backlog (wait time, sums, etc.).  Maybe the intent is to expand the status output of auditd.

I believe further clarity would be beneficial on how 'backlog_wait_sum' (or whatever the name evolves to) is initially set, and on:
-  How it evolves over time
-  What conditions in the system, or in auditing, would change it
-  What conditions admins should pay attention to for an informational understanding of status
-  What conditions admins should recognize as requiring adjustments
   (and suggestions as to what those adjustments should be)
-  What new guidance admins will have for building or adjusting audit.rules around this

Consider the scenario where auditing has been 'working fine' for days,
with little to no active admin monitoring.
Events then occur that spike the auditing, such that the backlog of audit records dramatically increases.
(For some reason) admins now come looking to investigate.
Assuming they run 'systemctl status auditd', the newly presented 'state' of 'backlog_wait_sum' will show some evidence.
Q:  Is that just a moment in time?
Q:  What information here will show whether things are good/ok 'now', versus some action needing to be taken?
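
On the 'moment in time' question: the patch description says the field starts at zero and is only added to, so a single reading would be a running total rather than a snapshot. Comparing two readings taken some time apart would show whether waiting is happening 'now'. A minimal sketch in Python, assuming the status is available as 'name value' text lines and that the new field is printed as 'backlog_wait_sum' (both are my assumptions; the sample values below are made up, and the units are whatever the kernel reports):

```python
def parse_status(text):
    """Parse 'name value' lines into a dict of ints; skip anything else."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            fields[parts[0]] = int(parts[1])
        except ValueError:
            continue
    return fields


def wait_delta(earlier, later, field="backlog_wait_sum"):
    """Growth of the cumulative counter between two status samples.

    Returns None if the field is missing from either sample or if the
    counter went backwards (for example, after a reboot reset it to zero).
    """
    a = parse_status(earlier).get(field)
    b = parse_status(later).get(field)
    if a is None or b is None or b < a:
        return None
    return b - a


# Made-up sample text for illustration only, not real tool output.
# The field name and line format are assumptions.
sample_1 = "backlog_limit 8192\nbacklog 12\nbacklog_wait_sum 150\n"
sample_2 = "backlog_limit 8192\nbacklog 8192\nbacklog_wait_sum 150\n"
sample_3 = "backlog_limit 8192\nbacklog 40\nbacklog_wait_sum 900\n"

print(wait_delta(sample_1, sample_2))  # backlog is full, but no new waiting
print(wait_delta(sample_2, sample_3))  # waiting occurred in this interval
```

A delta of zero alongside a full backlog would match the case Max describes below, where the backlog fills up but drains quickly; a growing delta would suggest tasks are actually being held up.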

Maybe that isn't a great scenario, or good questions, but it is what occurs to me at the moment.

Thank you.

R,
-Joe Wulf


On Wednesday, July 1, 2020, 5:33:14 PM EDT, Max Englander <max.englander@gmail.com> wrote:

>  In environments where the preservation of audit events and predictable
usage of system memory are prioritized, admins may use a combination of
--backlog_wait_time and -b options at the risk of degraded performance
resulting from backlog waiting. In some cases, this risk may be
preferred to lost events or unbounded memory usage. Ideally, this risk
can be mitigated by making adjustments when backlog waiting is detected.
However, detection can be difficult using the currently available metrics.
For example, an admin attempting to debug degraded performance may
falsely believe a full backlog indicates backlog waiting. It may turn
out the backlog frequently fills up but drains quickly.
To make it easier to reliably track degraded performance to backlog
waiting, this patch makes the following changes:
Add a new field backlog_wait_sum to the audit status reply. Initialize
this field to zero. Add to this field the total time spent by the
current task on scheduled timeouts while the backlog limit is exceeded.
Tested on Ubuntu 18.04 using complementary changes to the audit
userspace: https://github.com/linux-audit/audit-userspace/pull/134.

<snip>