Re: [RFC PATCH] audit-testsuite: improve our chances of losing records in lost_reset

Friday, 14 December 2018

On 2018-12-14 15:35, Paul Moore wrote:
...
 On Fri, Dec 14, 2018 at 11:12 AM Richard Guy Briggs
<rgb(a)redhat.com&gt; wrote:
 > On 2018-12-14 10:53, Paul Moore wrote:
 > > On Thu, Dec 13, 2018 at 8:59 PM Richard Guy Briggs <rgb(a)redhat.com&gt;
wrote:
 > > > On 2018-12-13 18:23, Paul Moore wrote:
 > > > > On Thu, Dec 13, 2018 at 6:17 PM Paul Moore
<paul(a)paul-moore.com&gt; wrote:

 ...

 > > As an aside, have you spent any time trying to debug that wrong PID
 > > problem?  While not strictly audit related, that seems like a pretty
 > > serious Bash bug.  Or maybe it's a problem with the test?  I vaguely
 > > remember a discussion between you and Ondrej and some confusion around
 > > which Bash variable to use to fetch PIDs, but I may be mistaken.
 >
 > I haven't spent much time trying to debug that bash PID increment issue,
 > but it perplexed me since it was the identical technique used in
 > multiple tests in the audit-testsuite that has never caused an issue
 > previously on any of the same set of test machines.  This was for the
 > missing mount umount2 hang bug test
 > https://github.com/linux-audit/audit-testsuite/pull/76

 Ah, I think I see the problem.  Unless I'm mistaken, you are talking
 about the shell/Bash command where the tests print the PID using "echo
 $$" or similar, yes?  As you likely already know, in Bash (and likely
 other shells as well), $$ is the PID of the running Bash process; in
 the two places where I saw it used in the existing audit-testsuite $$
 is being used to reference the Bash instance itself or something it
 exec's (which ends up inheriting the PID).  It looks like you are
 using $$ as a way to capture the PID of a child process being spawned
 by the shell, yes?  This may explain why you sometimes get lucky and
 $$+1 works for the PID. 
I was always getting lucky with $$+1, but understandably uncomfortable
with it since others weren't so fortunate.

In the code that's there, that process is backgrounded, but I'm fairly
certain I tested without that and carefully checked the options (-f and
-s) to ensure it wasn't daemonizing or multithreading.  I was pretty
careful to set up exactly the same conditions for running that process
as other tests use, but that looks like the first thing to check next
time I try it.  It wouldn't be the first time I've missed something
obvious.

...
 > > This brings up to the next step: how do we want to address
this?
 > >
 > > Prior to the queue rework that started in v4.10 things were a bit
 > > simpler and it looks like we always registered a lost record
 > > independent of the "audit=?" setting on the kernel command line and
 > > the current queue backlog.  While this would have made this test
 > > easier, it could result in some over counting problems in the cases
 > > where an auditd instance came along and read the "lost" records from
 > > the queue.  I don't think reverting to this behavior is ideal.
 > >
 > > I'm also not certain that recording lost records in the *not*
 > > "audit=1" case is a good solution either.  In the case where the
 > > system is not running an audit daemon they are almost guaranteed to
 > > hit the backlog at some point and then their system will start spewing
 > > scary looking audit log lost messages; we would surely win lots of
 > > friends this way.
 > >
 > > We could move the "audit=1" check (really it's an audit_default
check)
 > > into audit_log_lost() and use it to squelch the printk() and then call
 > > audit_log_lost in kauditd_hold_skb() if we can't queue the record.
 > > This should preserve an accurate lost record count (well, until it
 > > rolls over, but that's not a new concern), prevent unnecessary scary
 > > lost record messages, and ensure a consistent audit_log_lost()
 > > operation (all the other callers I didn't mention in this mail).
 > >
 > > Or the simplest option is to just ignore this and require that the
 > > audit-testsuite be run on a system booted with "audit=1" :)
 > >
 > > I'm currently leaning towards moving the "audit=1" check into
 > > audit_log_lost(), what do you guys think?
 >
 > I'll need some time to digest all this.  My first reaction is that
 > requiring "audit=1" for the audit-testsuite is not the right answer and
 > as we've discussed previously, it makes sense to run the entire
 > testsuite both with and without "audit=1".

 Yes, requiring "audit=1" is not an answer.  I put it there mostly as
 an attempt at humor, hence the smiley at the end, but clearly my
 abilities as a comic are lacking ;) 
Sorry, email is limited...  I read :) and ;) differently.

...
 Anyway, no rush on looking it over; this isn't a -stable fix in
my
 eyes so it's not a candidate for merging until after the upcoming
 merge window.  I think it will make a lot more sense once you go back
 and look at the code, reading about the problem in email probably
 makes is sound more complicated than it is.  I'm fairly certain the
 right solution is to move the "audit=1" check into audit_log_lost(),
 but I sent this out in case you or anyone else could think of a better
 solution.  In the meantime I'll throw together a quick patch (it
 should be pretty small) and add it to my testing queue. 
I'll look at the code and your patch carefully when I can.

...
 paul moore 
- RGB

--
Richard Guy Briggs <rgb(a)redhat.com&gt;
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: [RFC PATCH] audit-testsuite: improve our chances of losing records in lost_reset