On 2016-12-13 16:17, Cong Wang wrote:
On Tue, Dec 13, 2016 at 2:52 AM, Richard Guy Briggs
<rgb(a)redhat.com> wrote:
> It is actually the audit_pid and audit_nlk_portid that I care about
> more. The audit daemon could vanish or close the socket while the
> kernel sock to which it was attached is still quite valid. Accessing
> the set of three atomically is the goal. I wonder if it makes more
> sense to test for the presence of auditd using audit_sock rather than
> audit_pid, but still keep audit_pid for our reporting and replacement
> strategy. Another idea would be to put the three in one struct.
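A minimal userspace sketch of the "three in one struct" idea, assuming C11 atomics as a stand-in for whatever kernel primitive would actually be used. All names here (struct auditd_conn, conn_swap, conn_snapshot, auditd_present) are hypothetical and are not kernel API. The point is only that pid, portid and sock get replaced together through one pointer, so a reader never sees a mix of old and new values.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical grouping of the three values discussed above. */
struct auditd_conn {
    int pid;          /* stand-in for audit_pid */
    uint32_t portid;  /* stand-in for audit_nlk_portid */
    void *sock;       /* stand-in for audit_sock */
};

/* One pointer covers all three fields; NULL means "no auditd". */
static _Atomic(struct auditd_conn *) current_conn;

/*
 * Install a new connection (or NULL) in a single atomic step and
 * return the previous one. The caller owns the returned pointer and
 * must not free it while readers may still be using it; in the kernel
 * that would need something like RCU or a refcount (assumption).
 */
struct auditd_conn *conn_swap(struct auditd_conn *new_conn)
{
    return atomic_exchange(&current_conn, new_conn);
}

/* Test for the presence of auditd via the pointer, not via the pid. */
int auditd_present(void)
{
    return atomic_load(&current_conn) != NULL;
}

/* Copy a consistent triple into *out; returns 1 on success, 0 if none. */
int conn_snapshot(struct auditd_conn *out)
{
    struct auditd_conn *c = atomic_load(&current_conn);

    if (c == NULL)
        return 0;
    *out = *c;
    return 1;
}
```

With this shape the pid is still available for reporting and replacement decisions, while presence is decided by the pointer itself. The sketch deliberately leaves the lifetime of the old struct to the caller.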
Note, the process that has audit_pid should hold a refcnt to the netns too,
so the netns can't be gone until that process is gone.
I noted that. I did wonder if there might be a problem if all the
processes were moved to another netns with the struct sock stuck in the
now process-void netns.
This is alluded to in 6f285b19d09f ("audit: Send replies in the proper
network namespace.").
> Can someone explain how they think the original test was able to trigger
> this GPF? Network namespace shutdown while something pretended to set
> up a new auditd? That's impressive for a fuzzer if that's the case...
> Is there an strace? I guess it is all in test().
I am surprised you still don't get the race condition even when you
are now working on v2...
The race happens in this scenario:
1) Create a new netns
2) In the new netns, communicate with kauditd to set audit_sock
3) Generate some audit messages, so kauditd will keep sending them
via audit_sock
4) Exit the netns
5) The previous audit_sock is now going away, but kauditd could still
access it in this small window.
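The window in step 5 can be illustrated with a small userspace sketch, assuming a plain reference count on the socket. This is one common way to close such a window and is not necessarily the fix adopted in the patch under discussion. The names (struct fake_sock, fake_sock_hold, fake_sock_put) are hypothetical stand-ins, not kernel functions.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the socket that audit_sock points to. */
struct fake_sock {
    atomic_int refcnt;
};

/* Counts how many sockets were really freed, for demonstration only. */
static int fake_sock_freed;

/* Create a socket with one reference, owned by the netns. */
struct fake_sock *fake_sock_create(void)
{
    struct fake_sock *sk = malloc(sizeof(*sk));

    if (sk != NULL)
        atomic_init(&sk->refcnt, 1);
    return sk;
}

/* The sender takes its own reference before touching the socket. */
void fake_sock_hold(struct fake_sock *sk)
{
    atomic_fetch_add(&sk->refcnt, 1);
}

/*
 * Drop one reference. Returns 1 if this call dropped the last
 * reference and freed the socket, 0 if someone else still holds it.
 */
int fake_sock_put(struct fake_sock *sk)
{
    if (atomic_fetch_sub(&sk->refcnt, 1) == 1) {
        free(sk);
        fake_sock_freed++;
        return 1;
    }
    return 0;
}
```

Mapped onto the steps above: the netns owns the initial reference, the sending thread calls fake_sock_hold() before using the socket, the netns exit in step 4 becomes a fake_sock_put() that does not free anything, and the memory goes away only when the sender drops its own reference.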
Ah ok that fits...
- RGB
--
Richard Guy Briggs <rgb(a)redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635