On Fri, Dec 14, 2018 at 10:00 PM Richard Guy Briggs <rgb(a)redhat.com> wrote:
On 2018-12-14 15:35, Paul Moore wrote:
> On Fri, Dec 14, 2018 at 11:12 AM Richard Guy Briggs <rgb(a)redhat.com> wrote:
> > On 2018-12-14 10:53, Paul Moore wrote:
> > > On Thu, Dec 13, 2018 at 8:59 PM Richard Guy Briggs <rgb(a)redhat.com>
wrote:
> > > > On 2018-12-13 18:23, Paul Moore wrote:
> > > > > On Thu, Dec 13, 2018 at 6:17 PM Paul Moore
<paul(a)paul-moore.com> wrote:
>
> ...
>
> > > As an aside, have you spent any time trying to debug that wrong PID
> > > problem? While not strictly audit related, that seems like a pretty
> > > serious Bash bug. Or maybe it's a problem with the test? I vaguely
> > > remember a discussion between you and Ondrej and some confusion around
> > > which Bash variable to use to fetch PIDs, but I may be mistaken.
> >
> > I haven't spent much time trying to debug that bash PID increment issue,
> > but it perplexed me since it was the identical technique used in
> > multiple tests in the audit-testsuite that has never caused an issue
> > previously on any of the same set of test machines. This was for the
> > missing mount umount2 hang bug test
> >
https://github.com/linux-audit/audit-testsuite/pull/76
>
> Ah, I think I see the problem. Unless I'm mistaken, you are talking
> about the shell/Bash command where the tests print the PID using "echo
> $$" or similar, yes? As you likely already know, in Bash (and likely
> other shells as well), $$ is the PID of the running Bash process; in
> the two places where I saw it used in the existing audit-testsuite $$
> is being used to reference the Bash instance itself or something it
> exec's (which ends up inheriting the PID). It looks like you are
> using $$ as a way to capture the PID of a child process being spawned
> by the shell, yes? This may explain why you sometimes get lucky and
> $$+1 works for the PID.
I was always getting lucky with $$+1, but understandably uncomfortable
with it since others weren't so fortunate.
In the code that's there, that process is backgrounded, but I'm fairly
certain I tested without that and carefully checked the options (-f and
-s) to ensure it wasn't daemonizing or multithreading. I was pretty
careful to set up exactly the same conditions for running that process
as other tests use, but that looks like the first thing to check next
time I try it. It wouldn't be the first time I've missed something
obvious.
Right, let me chime into the PID issue again :) I am pretty sure the
code to get the child PID in Richard's umount2 test is wrong. I didn't
get back to that thread eventually because I tried to simplify the
fixed code to something better (and still 100% reliable), but I
couldn't get it to a state I'd like and eventually I forgot about it
and switched to other things...
So, let me try to explain the problem again. This is the relevant
snippet of code:
system("cd $basedir/$clientdir; echo \$\$ > $stdout; exec ./$client -f
-s $tmpdir &");
Let's rewrite this into pseudocode so it is clear what's actually going on:
1. run "cd $basedir/$clientdir"
2. write the PID of this bash process into "$stdout"
3. fork a child and in that child, execve() this command: "./$client
-f -s $tmpdir"
So the problem is that you log the PID of the parent bash process and
not the child that exec's into the "$client" program. Since the child
gets forked away very quickly after the parent bash process is
created, in 99.9% of the cases it gets a PID exactly one greater -
that's why your +1 trick works. The reason why a similar pattern
("echo $$; exec ...") works elsewhere in the testsuite is that there
is no forking going on in those cases (IIRC).
I just looked at the code in the lost_reset test that actually deals
with the forking scenario, and it uses the "$!" variable, which is the
correct way to get the PID of the last process forked by bash. I admit
I didn't know about this variable when I was pointing out the problem
in Richard's patch, but now I realize this is what I should have
suggested back then...
So, in the umount2 test this:
system("cd $basedir/$clientdir; echo \$\$ > $stdout; exec ./$client -f
-s $tmpdir &");
should be replaced with this (along with dropping the "$pid_fuse += 1;" line):
system("cd $basedir/$clientdir; exec ./$client -f -s $tmpdir & echo
\$! > $stdout;");
That said, I think the code in the lost_reset test is doing the right
thing and I wouldn't expect it to get the ping PID wrong.
Hope that helps,
--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.