-----Original Message-----
From: Paul Moore [mailto:paul@paul-moore.com]
Sent: September 18, 2019 20:23
To: Li,Rongqing <lirongqing(a)baidu.com>
Cc: Eric Paris <eparis(a)redhat.com>; linux-audit(a)redhat.com
Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
On Tue, Sep 17, 2019 at 9:07 PM Li,Rongqing <lirongqing(a)baidu.com> wrote:
> > -----Original Message-----
> > From: Paul Moore [mailto:paul@paul-moore.com]
> > Sent: September 18, 2019 3:17
> > To: Li,Rongqing <lirongqing(a)baidu.com>
> > Cc: Eric Paris <eparis(a)redhat.com>; linux-audit(a)redhat.com
> > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
> >
> > On Mon, Sep 16, 2019 at 9:08 PM Li,Rongqing <lirongqing(a)baidu.com> wrote:
> > > > -----Original Message-----
> > > > From: Paul Moore [mailto:paul@paul-moore.com]
> > > > Sent: September 17, 2019 6:52
> > > > To: Li,Rongqing <lirongqing(a)baidu.com>
> > > > Cc: Eric Paris <eparis(a)redhat.com>; linux-audit(a)redhat.com
> > > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
...
> > > I just want it to behave as it did before 3197542482df ("audit:
> > > rework audit_log_start()"): wait 60 seconds once if
> > > auditd/readahead-collector has trouble draining the audit backlog.
> >
> > The patch you mention fixed what was deemed to be buggy behavior; as
> > mentioned previously in this thread I see no good reason to go back
> > to the old behavior.
> >
> > > > If you are not using audit, you can always disable it via the
> > > > kernel command line, or at runtime (look at what Fedora does).
> > > >
> > > > > > You might also want to investigate what is generating so
> > > > > > many audit records prior to starting the audit daemon.
> > > > >
> > > > > > It is /sbin/readahead-collector; in fact, we stop auditd.
> > > > > > We are doing a reboot test, which reboots the machine
> > > > > > repeatedly to test hardware/software.
> > > > >
> > > > > it is same as below:
> > > > > auditctl -a always,exit -S all -F pid='xxx'
> > > > > kill -s 19 `pidof auditd`
> > > > >
> > > > > then the audited task will be hung
> > > >
> > > > So you are seeing this problem only when you run a test, or did
> > > > you provide this as a reproducer?
> > >
> > > auditctl -a always,exit -S all -F ppid=`pidof sshd`
> > > kill -s 19 `pidof auditd`
> > > ssh root@127.0.0.1
> > >
> > > then ssh will be hung forever
> >
> > That is expected behavior. You are putting a massive audit load on
> > the system by telling the kernel to audit every syscall that sshd
> > makes, then you are intentionally killing the audit daemon and
> > attempting to ssh into the system.
> > The proper fix(es) here would be to 1) set reasonable audit rules
> > and/or 2) use an init system that monitors and restarts auditd when
> > it fails (systemd has this capability, I believe some others do as well).
>
> Neither works.
> auditd is not dead; it is stopped (kill -s 19), so systemd/init will
> not restart it.
> Even with only a few audit rules, after multiple accesses the backlog
> will fill up because there is no receiver.
Fair point, however I still stand by my previous comments that there are
runtime configuration knobs which can mitigate this problem if it is something
you are concerned about. Depending on the situation, you can either increase
the backlog to deal with transient problems, or decrease the backlog wait time
(possibly to zero) to prevent blocking entirely.
No new knobs are needed; auditctl can already change the backlog length and
wait time. But increasing the backlog length does not help if auditd is hung
forever, and a task can hang forever because of disk or filesystem faults, etc.

I am talking about the default audit behavior, which has changed. I really did
hit the issue described in the commit below; if we change the default, others
can avoid it.
commit ac4cec443a80bfde829516e7a7db10f7325aa528
Author: David Woodhouse <dwmw2(a)shinybook.infradead.org>
Date:   Sat Jul 2 14:08:48 2005 +0100

    AUDIT: Stop waiting for backlog after audit_panic() happens

    We force a rate-limit on auditable events by making them wait for space
    on the backlog queue. However, if auditd really is AWOL then this could
    potentially bring the entire system to a halt, depending on the audit
    rules in effect.
Another way to avoid this issue is to make audit_backlog_wait_time default to 0:
diff --git a/kernel/audit.c b/kernel/audit.c
index da8dc0db5bd3..0a7f7c290644 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,7 +119,7 @@ static u32 audit_rate_limit;
  * When set to zero, this means unlimited. */
 static u32 audit_backlog_limit = 64;
 #define AUDIT_BACKLOG_WAIT_TIME (60 * HZ)
-static u32 audit_backlog_wait_time = AUDIT_BACKLOG_WAIT_TIME;
+static u32 audit_backlog_wait_time = 0;
 
 /* The identity of the user shutting down the audit system. */
 kuid_t audit_sig_uid = INVALID_UID;
-RongQing