On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs <rgb(a)redhat.com> wrote:
It is actually the audit_pid and audit_nlk_portid that I care about
more. The audit daemon could vanish or close the socket while the
kernel sock to which it was attached is still quite valid. What I want
is to access the set of three atomically. I wonder if it makes more
sense to test for the presence of auditd using audit_sock rather than
audit_pid, but still keep audit_pid for our reporting and replacement
strategy. Another idea would be to put the three in one struct.
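
A minimal userspace sketch of the "three in one struct" idea, not kernel code. The struct name, the helper names, and the pthread mutex are all assumptions made for illustration; in the kernel the protection would more likely be a spinlock or RCU. It also tests for the presence of auditd via the sock field rather than the pid, as suggested above.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the three kernel variables. */
struct auditd_conn {
    int pid;              /* stands in for audit_pid */
    unsigned int portid;  /* stands in for audit_nlk_portid */
    void *sock;           /* stands in for audit_sock */
};

static struct auditd_conn conn;
static pthread_mutex_t conn_lock = PTHREAD_MUTEX_INITIALIZER;

/* Replace all three fields inside one critical section. */
void auditd_conn_set(int pid, unsigned int portid, void *sock)
{
    pthread_mutex_lock(&conn_lock);
    conn.pid = pid;
    conn.portid = portid;
    conn.sock = sock;
    pthread_mutex_unlock(&conn_lock);
}

/*
 * Copy a consistent snapshot of the three fields into *out.
 * Returns 1 if an auditd is registered (sock is non-NULL), else 0.
 */
int auditd_conn_get(struct auditd_conn *out)
{
    pthread_mutex_lock(&conn_lock);
    *out = conn;
    pthread_mutex_unlock(&conn_lock);
    return out->sock != NULL;
}
```

A reader working from the snapshot never sees a pid from one auditd paired with the sock of another, which is the property the three separate globals cannot give.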
Note, the process that has audit_pid should hold a refcnt to the netns
too, so the netns can't go away until that process is gone.
Can someone explain how they think the original test was able to trigger
this GPF? Network namespace shutdown while something pretended to set
up a new auditd? That's impressive for a fuzzer if that's the case...
Is there an strace? I guess it is all in test().
I am surprised you still don't get the race condition even when you
are now working on v2...
The race happens in this scenario:
1) Create a new netns
2) In the new netns, communicate with kauditd to set audit_sock
3) Generate some audit messages, so kauditd will keep sending them
via audit_sock
4) Exit the netns
5) The previous audit_sock is now going away, but kauditd could still
access it in this small window.
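
The steps above can be modelled with a toy reference count. This is a single-threaded userspace sketch, not the actual patch: struct fake_sock, audit_sock_ptr, and every helper name are hypothetical, and real code would need an atomic refcount plus locking or RCU around the pointer. It only shows how the window in step 5 closes if kauditd holds its own reference to the sock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel netlink sock. */
struct fake_sock {
    int refcnt;
    bool released;  /* set at the point where real code would free the sock */
};

/* Hypothetical global playing the role of audit_sock. */
static struct fake_sock *audit_sock_ptr;

void sock_hold_ref(struct fake_sock *sk)
{
    sk->refcnt++;
}

void sock_put_ref(struct fake_sock *sk)
{
    if (--sk->refcnt == 0)
        sk->released = true;
}

/* Step 2: auditd registers; kauditd takes its own reference. */
void register_auditd(struct fake_sock *sk)
{
    sock_hold_ref(sk);
    audit_sock_ptr = sk;
}

/* Step 4: the netns exits and drops the reference it owned. */
void netns_exit(struct fake_sock *sk)
{
    sock_put_ref(sk);
}

/* Steps 3 and 5: kauditd may send only while the sock is still live. */
bool kauditd_can_send(void)
{
    return audit_sock_ptr != NULL && !audit_sock_ptr->released;
}

/* kauditd is done with the sock: clear the pointer, then drop the ref. */
void unregister_auditd(void)
{
    struct fake_sock *sk = audit_sock_ptr;

    audit_sock_ptr = NULL;
    if (sk)
        sock_put_ref(sk);
}
```

Without the sock_hold_ref() call in register_auditd(), netns_exit() would drop the count to zero while audit_sock_ptr still points at the sock, which is the window described in step 5.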