On 2016-12-12 15:18, Paul Moore wrote:
 > On Mon, Dec 12, 2016 at 5:03 AM, Richard Guy Briggs <rgb(a)redhat.com> wrote:
 > > Resetting audit_sock appears to be racy.
 > >
 > > audit_sock was being copied and dereferenced without using a refcount on
 > > the source sock.
 > >
 > > Bump the refcount on the underlying sock when we store a refrence in
 > > audit_sock and release it when we reset audit_sock.  audit_sock
 > > modification needs the audit_cmd_mutex.
 > >
 > > See: 
https://lkml.org/lkml/2016/11/26/232
 > >
 > > Thanks to Eric Dumazet <edumazet(a)google.com> and Cong Wang
 > > <xiyou.wangcong(a)gmail.com> on ideas how to fix it.
 > >
 > > Signed-off-by: Richard Guy Briggs <rgb(a)redhat.com>
 > > ---
 > > There has been a lot of change in the audit code that is about to go
 > > upstream to address audit queue issues.  This patch is based on the
 > > source tree: 
git://git.infradead.org/users/pcmoore/audit#next
 > > ---
 > >  kernel/audit.c |   34 ++++++++++++++++++++++++++++------
 > >  1 files changed, 28 insertions(+), 6 deletions(-)
 > > diff --git a/kernel/audit.c b/kernel/audit.c
 > > index f20eee0..439f7f3 100644
 > > --- a/kernel/audit.c
 > > +++ b/kernel/audit.c
 > > @@ -452,7 +452,9 @@ static void auditd_reset(void)
 > >         struct sk_buff *skb;
 > >
 > >         /* break the connection */
 > > +       sock_put(audit_sock);
 > >         audit_pid = 0;
 > > +       audit_nlk_portid = 0;
 > >         audit_sock = NULL;
 > >
 > >         /* flush all of the retry queue to the hold queue */
 > > @@ -478,6 +480,12 @@ static int kauditd_send_unicast_skb(struct sk_buff *skb)
 > >         if (rc >= 0) {
 > >                 consume_skb(skb);
 > >                 rc = 0;
 > > +       } else {
 > > +               if (rc & (-ENOMEM|-EPERM|-ECONNREFUSED)) {
 > 
 > I dislike the way you wrote this because instead of simply looking at
 > this to see if it correct I need to sort out all the bits and find out
 > if there are other error codes that could run afoul of this check ...
 > make it simple, e.g. (rc == -ENOMEM || rc == -EPERM || ...).
 > Actually, since EPERM is 1, -EPERM (-1 in two's compliment is
 > 0xffffffff) is going to cause this to be true for pretty much any
 > value of rc, yes?
 
 Yes, you are correct.  We need there a logical or on the results of each
 comparison to the return code rather than bit-wise or-ing the result
 codes together first to save a step.
 
 > > +                       mutex_lock(&audit_cmd_mutex);
 > > +                       auditd_reset();
 > > +                       mutex_unlock(&audit_cmd_mutex);
 > > +               }
 > 
 > The code in audit#next handles netlink_unicast() errors in
 > kauditd_thread() and you are adding error handling code here in
 > kauditd_send_unicast_skb() ... that's messy.  I don't care too much
 > where the auditd_reset() call is made, but let's only do it in one
 > function; FWIW, I originally put the error handling code in
 > kauditd_thread() because there was other error handling code that
 > needed to done in that scope so it resulted in cleaner code.
 
 Hmmm, I seem to remember it not returning the return code and I thought
 I had changed it to do so, but I see now that it was already there.
 Agreed, I needlessly duplicated that error handling.
 
 > Related, I see you are now considering ENOMEM to be a fatal condition,
 > that differs from the AUDITD_BAD macro in kauditd_thread(); this
 > difference needs to be reconciled.
 
 Also correct about -EPERM now that I check back to the intent of commit
 32a1dbaece7e ("audit: try harder to send to auditd upon netlink
 failure")
 
 > Finally, you should update the comment header block for auditd_reset()
 > that it needs to be called with the audit_cmd_mutex held.
 
 Yup.
 
 > > @@ -1004,17 +1018,22 @@ static int audit_receive_msg(struct sk_buff *skb,
struct nlmsghdr *nlh)
 > >                                 return -EACCES;
 > >                         }
 > >                         if (audit_pid && new_pid &&
 > > -                           audit_replace(requesting_pid) != -ECONNREFUSED) {
 > > +                           (audit_replace(requesting_pid) &
(-ECONNREFUSED|-EPERM|-ENOMEM))) {
 > 
 > Do we simply want to treat any error here as fatal, and not just
 > ECONN/EPERM/ENOMEM?  If not, let's come up with a single macro to
 > handle the fatal netlink_unicast() return codes so we have some chance
 > to keep things consistent in the future.
 
 I'll work through this before I post another patch... 
Ok, I've gone back to look at the reasoning in commit 133e1e5acd4a
("audit: stop an old auditd being starved out by a new auditd") which
suggests only ECONNREFUSED can cause an audit_pid replace, so I've
returned it to its original state.
I'll post another tested patch, but I'm still not that happy that it
does not proactively reset audit_pid, audit_nlk_portid and audit_sock
when auditd's socket has a problem.  I'll leave the test run overnight.
- RGB
--
Richard Guy Briggs <rgb(a)redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635