Hi Paul,

I have tested this patch on my system. The test cases covered the abnormal
scenario, our CI test cases, and the audit-testsuite test cases, and
everything passed.
Thanks for your work.

Gaosheng.
On 2022/1/21 0:50, Paul Moore wrote:
On Thu, Jan 20, 2022 at 11:47 AM Paul Moore <paul@paul-moore.com> wrote:
When an admin enables audit at early boot via the "audit=1" kernel
command line the audit queue behavior is slightly different; the
audit subsystem goes to greater lengths to avoid dropping records,
which unfortunately can result in problems when the audit daemon is
forcibly stopped for an extended period of time.

This patch makes a number of changes designed to improve the audit
queuing behavior so that leaving the audit daemon in a stopped state
for an extended period does not cause a significant impact to the
system.

- kauditd_send_queue() is now limited to looping through the
  passed queue only once per call.  This not only prevents the
  function from looping indefinitely when records are returned
  to the current queue, it also allows any recovery handling in
  kauditd_thread() to take place when kauditd_send_queue()
  returns.

- Transient netlink send errors seen as -EAGAIN now cause the
  record to be returned to the retry queue instead of going to
  the hold queue.  The intention of the hold queue is to store,
  perhaps for an extended period of time, the events which led
  up to the audit daemon going offline.  The retry queue remains
  a temporary queue intended to protect against transient issues
  between the kernel and the audit daemon.

- The retry queue is now limited by the audit_backlog_limit
  setting, the same as the other queues.  This allows admins
  to bound the size of all of the audit queues on the system.

- kauditd_rehold_skb() now returns records to the end of the
  hold queue to ensure ordering is preserved in the face of
  recent changes to kauditd_send_queue().
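
The single-pass and return-to-tail behavior described in the list above
can be illustrated with a small user-space sketch. This is not the code
in kernel/audit.c: struct queue, q_push_tail(), q_pop_head(),
send_queue_once() and the two send callbacks are hypothetical names
invented for illustration, and a fixed-size ring of ints stands in for
an skb queue. It only aims to show why bounding a pass by the queue
length at entry prevents endless looping when failed records are put
back on the same queue, and why appending at the tail keeps ordering.

```c
#include <stddef.h>

#define QCAP 8	/* assumed bound, playing the role of a backlog limit */

/* Hypothetical FIFO of record ids; a stand-in for a kernel skb queue. */
struct queue {
	int items[QCAP];
	size_t head;
	size_t len;
};

/* Append a record at the tail; returns -1 if the queue is full. */
static int q_push_tail(struct queue *q, int rec)
{
	if (q->len == QCAP)
		return -1;
	q->items[(q->head + q->len) % QCAP] = rec;
	q->len++;
	return 0;
}

/* Remove the oldest record; returns -1 if the queue is empty. */
static int q_pop_head(struct queue *q, int *rec)
{
	if (q->len == 0)
		return -1;
	*rec = q->items[q->head];
	q->head = (q->head + 1) % QCAP;
	q->len--;
	return 0;
}

/* Send callback type: 0 on success, nonzero on failure. */
typedef int (*send_fn)(int rec);

/* Example callbacks for experimentation (hypothetical). */
static int send_always_fails(int rec) { (void)rec; return -1; }
static int send_fails_on_odd(int rec) { return (rec % 2) ? -1 : 0; }

/*
 * Visit each record that was queued at entry at most once.  Records
 * that fail to send go back to the tail of the same queue, so their
 * relative order is kept, and the fixed budget guarantees the loop
 * ends even if every send fails.  Returns the number of failures so
 * the caller can run its own recovery handling.
 */
static size_t send_queue_once(struct queue *q, send_fn send)
{
	size_t budget = q->len;
	size_t failed = 0;
	int rec;

	while (budget-- > 0 && q_pop_head(q, &rec) == 0) {
		if (send(rec) != 0) {
			/* cannot fail: the pop above freed a slot */
			q_push_tail(q, rec);
			failed++;
		}
	}
	return failed;
}
```

With send_always_fails, a loop that ran "until the queue is empty"
would never terminate, whereas send_queue_once() returns after one
pass with the queue contents and order unchanged.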

Cc: stable@vger.kernel.org
Fixes: 5b52330bbfe63 ("audit: fix auditd/kernel connection state tracking")
Fixes: f4b3ee3c85551 ("audit: improve robustness of the audit queue handling")
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

--
v2:
- incorporated feedback from Gaosheng Cui
- promoted to proper patch
v1:
- initial RFC
---
 kernel/audit.c |   62 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 19 deletions(-)

Hi Gaosheng Cui,

Everything tests okay on my system, but if you have the ability to
test this patch in your environment to verify that it fixes the
problem you are seeing, it would be greatly appreciated.

Thanks.

--
paul moore
paul-moore.com