Re: data-race in audit_log_start / audit_receive

Friday, 19 August 2022

On Thu, Aug 18, 2022 at 9:59 PM Paul Moore <paul(a)paul-moore.com&gt; wrote:
...
 On Thu, Aug 18, 2022 at 6:23 PM Abhishek Shah
 <abhishek.shah(a)columbia.edu&gt; wrote:
 > Hi all,
 >
 > We found a data race involving the audit_cmd_mutex.owner variable. We think this bug
is concerning because audit_ctl_owner_current is used at a location that controls the
scheduling of tasks shown here. Please let us know what you think.
 >
 > Thanks!
 >
 > -----------------Report----------------------
 >
 > write to 0xffffffff881d0710 of 8 bytes by task 6541 on cpu 0:
 >  audit_ctl_lock kernel/audit.c:237 [inline]

 ...

 > read to 0xffffffff881d0710 of 8 bytes by task 6542 on cpu 1:
 >  audit_ctl_owner_current kernel/audit.c:258 [inline]

 Yes, technically there is a race condition if/when an auditd instance
 is registering itself the exact same time as another task is
 attempting to log an audit record via audit_log_start(). 
I realized after I sent this and turned off my computer last night
that I typed the wrong thing - the race isn't between auditd and
audit_log_start(), it's between the code which changes the audit
subsystem state (see audit_receive() and the audit watch/tree code)
and audit_log_start().

...
 The risk
 being that a *very* limited number of audit records could be
 mis-handled with respect to their queue priority and that is it; no
 records would be lost or misplaced.  Correcting this would likely
 involve a more complex locking scheme[1] or a rather severe
 performance penalty due to an additional lock in the audit_log_start()
 code path.  There may be some value in modifying
 audit_ctl_owner_current() to use READ_ONCE(), but it isn't clear to me
 that this would significantly improve things or have no impact on
 performance. 
Another thing I thought of last night - I don't believe READ_ONCE()
adds a memory barrier, which would probably be needed; although my
original statement still stands, I'm not sure the performance hit
would justify the marginal impact on the audit queue.

...
 Have you noticed any serious problems on your system due to this? 
If
 you have a reproducer which shows actual harm on the system could you
 please share that?

 [1] The obvious choice would be to move to a RCU based scheme, but
 even that doesn't totally solve the problem as there would still be a
 window where some tasks would have an "old" value.  It might actually
 end up extending the race window on large multi-core systems due to
 the time needed for all of the critical sections to complete. 
-- 
paul-moore.com

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: data-race in audit_log_start / audit_receive