Re: [PATCH][RFC] audit: set wait time to zero when audit failed

Wednesday, 18 September 2019

...
 -----Original Message-----
 From: Paul Moore [mailto:paul@paul-moore.com]
 Sent: 18 September 2019 20:23
 To: Li,Rongqing <lirongqing(a)baidu.com>
 Cc: Eric Paris <eparis(a)redhat.com>; linux-audit(a)redhat.com
 Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
 
 On Tue, Sep 17, 2019 at 9:07 PM Li,Rongqing <lirongqing(a)baidu.com> wrote:
 > > -----Original Message-----
 > > From: Paul Moore [mailto:paul@paul-moore.com]
 > > Sent: 18 September 2019 3:17
 > > To: Li,Rongqing <lirongqing(a)baidu.com>
 > > Cc: Eric Paris <eparis(a)redhat.com>; linux-audit(a)redhat.com
 > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit failed
 > >
 > > On Mon, Sep 16, 2019 at 9:08 PM Li,Rongqing <lirongqing(a)baidu.com> wrote:
 > > > > -----Original Message-----
 > > > > From: Paul Moore [mailto:paul@paul-moore.com]
 > > > > Sent: 17 September 2019 6:52
 > > > > To: Li,Rongqing <lirongqing(a)baidu.com>
 > > > > Cc: Eric Paris <eparis(a)redhat.com>; linux-audit(a)redhat.com
 > > > > Subject: Re: [PATCH][RFC] audit: set wait time to zero when audit
 > > > > failed
 
 ...
 
 > > > I just want it to behave as it did before 3197542482df ("audit: rework
 > > > audit_log_start()"): wait 60 seconds once if
 > > > auditd/readahead-collector have a problem draining the audit
 > > > backlog.
 > >
 > > The patch you mention fixed what was deemed to be buggy behavior; as
 > > mentioned previously in this thread I see no good reason to go back
 > > to the old behavior.
 > >
 > > > > If you are not using audit, you can always disable it via the
 > > > > kernel command line, or at runtime (look at what Fedora does).
 > > > >
 > > > > > > You might also want to investigate what is generating some
 > > > > > > many audit records prior to starting the audit daemon.
 > > > > >
 > > > > > It is /sbin/readahead-collector; in fact, we stop auditd.
 > > > > > We are doing a reboot test, which reboots the machine
 > > > > > continuously to test hardware/software.
 > > > > >
 > > > > > It is the same as the following:
 > > > > > auditctl -a always,exit -S all -F pid='xxx'
 > > > > > kill -s 19 `pidof auditd`
 > > > > >
 > > > > > then the audited task hangs
 > > > >
 > > > > So you are seeing this problem only when you run a test, or did
 > > > > you provide this as a reproducer?
 > > >
 > > > auditctl -a always,exit -S all -F ppid=`pidof sshd`
 > > > kill -s 19 `pidof auditd`
 > > > ssh root@127.0.0.1
 > > >
 > > > then ssh will be hung forever
 > >
 > > That is expected behavior.  You are putting a massive audit load on
 > > the system by telling the kernel to audit every syscall that sshd
 > > makes, then you are intentionally killing the audit daemon and attempting
 > > to ssh into the system.
 > > The proper fix(es) here would be to 1) set reasonable audit rules
 > > and/or 2) use an init system that monitors and restarts auditd when
 > > it fails (systemd has this capability, I believe some others do as well).
 >
 > Neither works.
 > auditd is not dead; it is in the stopped state (kill -s 19), so systemd/init
 > will not restart it.
 > Even with only a few audit rules, after multiple accesses the backlog
 > will fill up because there is no receiver.
 
 Fair point, however I still stand by my previous comments that there are
 runtime configuration knobs which can mitigate this problem if it is something
 you are concerned about.  Depending on the situation, you can either increase
 the backlog to deal with transient problems, or decrease the backlog wait time
 (possibly to zero) to prevent blocking entirely.
  
I am aware of the knobs: auditctl can change both the backlog length and the wait time.
But changing the backlog length does not help if auditd hangs forever, and a task can hang
forever because of a disk or filesystem fault, etc.

My point is about the default audit behavior, which has changed. I really did hit the issue
described in the commit below; if we change the default, others can avoid it.

commit ac4cec443a80bfde829516e7a7db10f7325aa528
Author: David Woodhouse <dwmw2(a)shinybook.infradead.org>
Date:   Sat Jul 2 14:08:48 2005 +0100

    AUDIT: Stop waiting for backlog after audit_panic() happens
    
    We force a rate-limit on auditable events by making them wait for space
    on the backlog queue. However, if auditd really is AWOL then this could
    potentially bring the entire system to a halt, depending on the audit
    rules in effect.


Another way to avoid this issue is to make audit_backlog_wait_time default to 0:

diff --git a/kernel/audit.c b/kernel/audit.c
index da8dc0db5bd3..0a7f7c290644 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -119,7 +119,7 @@ static u32  audit_rate_limit;
  * When set to zero, this means unlimited. */
 static u32     audit_backlog_limit = 64;
 #define AUDIT_BACKLOG_WAIT_TIME (60 * HZ)
-static u32     audit_backlog_wait_time = AUDIT_BACKLOG_WAIT_TIME;
+static u32     audit_backlog_wait_time = 0;
 
 /* The identity of the user shutting down the audit system. */
 kuid_t         audit_sig_uid = INVALID_UID;


-RongQing


...
 --
 paul moore
 www.paul-moore.com 

    
