Re: [RFC PATCH] audit-testsuite: improve our chances of losing records in lost_reset

Saturday, 15 December 2018

On Fri, Dec 14, 2018 at 10:00 PM Richard Guy Briggs <rgb(a)redhat.com&gt; wrote:
...
 On 2018-12-14 15:35, Paul Moore wrote:
 > On Fri, Dec 14, 2018 at 11:12 AM Richard Guy Briggs <rgb(a)redhat.com&gt; wrote:
 > > On 2018-12-14 10:53, Paul Moore wrote:
 > > > On Thu, Dec 13, 2018 at 8:59 PM Richard Guy Briggs <rgb(a)redhat.com&gt;
wrote:
 > > > > On 2018-12-13 18:23, Paul Moore wrote:
 > > > > > On Thu, Dec 13, 2018 at 6:17 PM Paul Moore
<paul(a)paul-moore.com&gt; wrote:
 >
 > ...
 >
 > > > As an aside, have you spent any time trying to debug that wrong PID
 > > > problem?  While not strictly audit related, that seems like a pretty
 > > > serious Bash bug.  Or maybe it's a problem with the test?  I vaguely
 > > > remember a discussion between you and Ondrej and some confusion around
 > > > which Bash variable to use to fetch PIDs, but I may be mistaken.
 > >
 > > I haven't spent much time trying to debug that bash PID increment issue,
 > > but it perplexed me since it was the identical technique used in
 > > multiple tests in the audit-testsuite that has never caused an issue
 > > previously on any of the same set of test machines.  This was for the
 > > missing mount umount2 hang bug test
 > > https://github.com/linux-audit/audit-testsuite/pull/76
 >
 > Ah, I think I see the problem.  Unless I'm mistaken, you are talking
 > about the shell/Bash command where the tests print the PID using "echo
 > $$" or similar, yes?  As you likely already know, in Bash (and likely
 > other shells as well), $$ is the PID of the running Bash process; in
 > the two places where I saw it used in the existing audit-testsuite $$
 > is being used to reference the Bash instance itself or something it
 > exec's (which ends up inheriting the PID).  It looks like you are
 > using $$ as a way to capture the PID of a child process being spawned
 > by the shell, yes?  This may explain why you sometimes get lucky and
 > $$+1 works for the PID.

 I was always getting lucky with $$+1, but understandably uncomfortable
 with it since others weren't so fortunate.

 In the code that's there, that process is backgrounded, but I'm fairly
 certain I tested without that and carefully checked the options (-f and
 -s) to ensure it wasn't daemonizing or multithreading.  I was pretty
 careful to set up exactly the same conditions for running that process
 as other tests use, but that looks like the first thing to check next
 time I try it.  It wouldn't be the first time I've missed something
 obvious. 
Right, let me chime into the PID issue again :) I am pretty sure the
code to get the child PID in Richard's umount2 test is wrong. I didn't
get back to that thread eventually because I tried to simplify the
fixed code to something better (and still 100% reliable), but I
couldn't get it to a state I'd like and eventually I forgot about it
and switched to other things...

So, let me try to explain the problem again. This is the relevant
snippet of code:

system("cd $basedir/$clientdir; echo \$\$ > $stdout; exec ./$client -f
-s $tmpdir &");

Let's rewrite this into pseudocode so it is clear what's actually going on:
1. run "cd $basedir/$clientdir"
2. write the PID of this bash process into "$stdout"
3. fork a child and in that child, execve() this command: "./$client
-f -s $tmpdir"

So the problem is that you log the PID of the parent bash process and
not the child that exec's into the "$client" program. Since the child
gets forked away very quickly after the parent bash process is
created, in 99.9% of the cases it gets a PID exactly one greater -
that's why your +1 trick works. The reason why a similar pattern
("echo $$; exec ...") works elsewhere in the testsuite is that there
is no forking going on in those cases (IIRC).

I just looked at the code in the lost_reset test that actually deals
with the forking scenario, and it uses the "$!" variable, which is the
correct way to get the PID of the last process forked by bash. I admit
I didn't know about this variable when I was pointing out the problem
in Richard's patch, but now I realize this is what I should have
suggested back then...

So, in the umount2 test this:

system("cd $basedir/$clientdir; echo \$\$ > $stdout; exec ./$client -f
-s $tmpdir &");

should be replaced with this (along with dropping the "$pid_fuse += 1;" line):

system("cd $basedir/$clientdir; exec ./$client -f -s $tmpdir & echo
\$! > $stdout;");

That said, I think the code in the lost_reset test is doing the right
thing and I wouldn't expect it to get the ping PID wrong.

Hope that helps,

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: [RFC PATCH] audit-testsuite: improve our chances of losing records in lost_reset