On 2016-12-13 16:17, Cong Wang wrote:
 On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs <rgb(a)redhat.com> wrote:
 > It is actually the audit_pid and audit_nlk_portid that I care about
 > more.  The audit daemon could vanish or close the socket while the
 > kernel sock to which it was attached is still quite valid.  Accessing
 > the set of three atomically is the urge.  I wonder if it makes more
 > sense to test for the presence of auditd using audit_sock rather than
 > audit_pid, but still keep audit_pid for our reporting and replacement
 > strategy.  Another idea would be to put the three in one struct.
 
 Note, the process that has audit_pid should hold a refcnt to the netns
 too, so the netns can't go away until that process is gone.
I noted that.  I did wonder if there might be a problem if all the
processes were moved to another netns with the struct sock stuck in the
now process-void netns.
This is alluded to in 6f285b19d09f ("audit: Send replies in the proper
network namespace.").
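
On the idea of putting the three in one struct: a minimal userspace
sketch of what that could look like, assuming the bundle is published
through a single atomic pointer so a reader never sees a pid from one
daemon paired with a sock from another.  The names auditd_conn,
auditd_conn_swap() and auditd_conn_snapshot() are made up for
illustration and are not existing kernel symbols.  Reclaiming the old
bundle safely (RCU or similar in the kernel) is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bundle of the three values that should change together. */
struct auditd_conn {
    int      pid;     /* stands in for audit_pid */
    uint32_t portid;  /* stands in for audit_nlk_portid */
    void    *sock;    /* stands in for audit_sock */
};

/* NULL means "no auditd registered". */
static _Atomic(struct auditd_conn *) auditd_conn_cur = NULL;

/*
 * Publish a fully initialised bundle (or NULL to unregister).
 * Returns the previous bundle; the caller must not free it while a
 * reader could still be dereferencing it (not handled in this sketch).
 */
static struct auditd_conn *auditd_conn_swap(struct auditd_conn *next)
{
    return atomic_exchange(&auditd_conn_cur, next);
}

/*
 * Copy out one consistent view of all three fields.
 * Returns 1 if a daemon is registered, 0 otherwise.
 */
static int auditd_conn_snapshot(struct auditd_conn *out)
{
    struct auditd_conn *c;

    assert(out != NULL);
    c = atomic_load(&auditd_conn_cur);
    if (c == NULL)
        return 0;
    *out = *c;
    return 1;
}
```

With this shape the "is auditd present?" test becomes a NULL check on
one pointer, while pid and portid remain available for reporting.
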
 > Can someone explain how they think the original test was able to
 > trigger this GPF?  Network namespace shutdown while something
 > pretended to set up a new auditd?  That's impressive for a fuzzer if
 > that's the case...  Is there an strace?  I guess it is all in test().
 
 I am surprised you still don't get the race condition even when you
 are now working on v2...
 
 The race happens in this scenario:
 
 1) Create a new netns
 
 2) In the new netns, communicate with kauditd to set audit_sock
 
 3) Generate some audit messages, so kauditd will keep sending them
 via audit_sock
 
 4) exit the netns
 
 5) the previous audit_sock is now going away, but kauditd could still
 access it in this small window.
Ah ok that fits...
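
To check my understanding, the window in steps 4 and 5 can be modelled
in userspace roughly as below.  This is a single-threaded toy, not
kernel code and not a proposed patch: model_sock, model_sock_hold(),
model_sock_put() and model_race() are hypothetical names, and the
"freed" flag only stands in for the memory actually being released.
The pinning is in the spirit of sock_hold()/sock_put(), on the
assumption that the sender takes its own reference before using the
sock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a struct sock with a reference count. */
struct model_sock {
    int  refcnt;  /* number of current holders */
    bool freed;   /* set when the last reference is dropped */
};

static void model_sock_hold(struct model_sock *sk)
{
    sk->refcnt++;
}

static void model_sock_put(struct model_sock *sk)
{
    assert(sk->refcnt > 0);
    if (--sk->refcnt == 0)
        sk->freed = true;  /* real code would free the object here */
}

/* Returns true if the sender found the sock still valid. */
static bool model_send(const struct model_sock *sk)
{
    return !sk->freed;
}

/*
 * Replays the scenario in order.  If sender_holds_ref is false the
 * sender relies only on the netns's reference, as in the racy case.
 */
static bool model_race(bool sender_holds_ref)
{
    /* steps 1-2: new netns owns the sock that audit_sock points to */
    struct model_sock sk = { .refcnt = 1, .freed = false };
    bool ok;

    /* step 3: kauditd starts sending; optionally pins the sock first */
    if (sender_holds_ref)
        model_sock_hold(&sk);

    /* step 4: netns exits and drops its reference */
    model_sock_put(&sk);

    /* step 5: kauditd touches the sock inside the window */
    ok = model_send(&sk);

    if (sender_holds_ref)
        model_sock_put(&sk);
    return ok;
}
```

model_race(false) reports the sock as already gone at step 5, while
model_race(true) keeps it valid until the sender drops its reference.
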
- RGB
--
Richard Guy Briggs <rgb(a)redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635