Re: [PATCH V1] audit: try harder to send to auditd upon netlink failure

Monday, 7 September 2015

On 15/09/04, Paul Moore wrote:
...
 On Friday, September 04, 2015 05:14:54 AM Richard Guy Briggs wrote:
 > There are several reports of the kernel losing contact with auditd ...

 Even if this doesn't completely solve the problem, I like the extra reporting 
 and robustness of this change.  Some comments inline ...

 > diff --git a/kernel/audit.c b/kernel/audit.c
 > index 1c13e42..4ee114a 100644
 > --- a/kernel/audit.c
 > +++ b/kernel/audit.c
 > @@ -404,19 +404,54 @@ static void audit_printk_skb(struct sk_buff *skb)
 >  	audit_hold_skb(skb);
 >  }
 > 
 > +static char *audit_strerror(int err)
 > +{
 > +	switch (err) {
 > +	case -ECONNREFUSED:
 > +		return "ECONNREFUSED";
 > +	case -EPERM:
 > +		return "EPERM";
 > +	case -ENOMEM:
 > +		return "ENOMEM";
 > +	case -EAGAIN:
 > +		return "EAGAIN";
 > +	case -ERESTARTSYS:
 > +		return "ERESTARTSYS";
 > +	case -EINTR:
 > +		return "EINTR";
 > +	default:
 > +		return "(other)";
 > +	}
 > +}

 See comments below.

 >  static void kauditd_send_skb(struct sk_buff *skb)
 >  {
 >  	int err;
 > +	int attempts = 0;
 > +#define AUDITD_RETRIES 5
 > +
 > +restart:
 >  	/* take a reference in case we can't send it and we want to hold it */
 >  	skb_get(skb);

 Should the restart label go after the skb_get() call?  It seems like if we 
 ever jump to restart we could end up needlessly bumping the skb's refcnt. 
I checked specifically for this and all the paths through
netlink_unicast() that return an error consume the skb.  In fact, all
paths through netlink_unicast() consume the skb.

...
 >  	err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
 >  	if (err < 0) {
 >  		BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
 > +		pr_err("netlink_unicast sending to audit_pid=%d returned error: %d, 
 %s\n"
 > +		       , audit_pid, err, audit_strerror(err));

 This is a style nit, but please put the comma on the preceding line.  I know 
 why you did it, but it bothers me. 
I blame Hugh Redelmeier with whom I worked on the FreeS/WAN project, who
has been heavily involved with the ANSI C standards committee.  ;-)    I
should be sticking with the prevailing standard of the existing code.

...
 I'm also debating if audit_strerror() is worth it.  I agree with
you that it 
 is a good idea to indicate the specific error code, I'm just not sure we need 
 to bother translating that into a proper errno name.  Can you think of some 
 reason why we would need the errno name as opposed to the integer? 
Other than convenience, not really.  The only reason other than that
would be if an unexpected error code is returned it will be mover
obvious to me when reviewing a user report.  Other than that, it is
complete without the text.

...
 >  		if (audit_pid) {
 > -			pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
 > -			audit_log_lost("auditd disappeared");
 > -			audit_pid = 0;
 > -			audit_sock = NULL;
 > +			if (err == -ECONNREFUSED || err == -EPERM
 > +			    || ++attempts >= AUDITD_RETRIES) {
 > +				audit_log_lost("audit_pid=%d reset");
 > +				audit_pid = 0;
 > +				audit_sock = NULL;
 > +			} else {
 > +				pr_warn("re-scheduling(#%d) write to audit_pid=%d\n"
 > +					, attempts, audit_pid);

 Same thing with the comma.

 > +				set_current_state(TASK_INTERRUPTIBLE);
 > +				schedule();
 > +				__set_current_state(TASK_RUNNING);
 > +				goto restart;

 See my comment above about the skb's reference count.

 > +			}
 >  		}
 >  		/* we might get lucky and get this in the next auditd */
 >  		audit_hold_skb(skb);

 -- 
 paul moore
 www.paul-moore.com

- RGB

--
Richard Guy Briggs <rbriggs(a)redhat.com&gt;
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

Re: [PATCH V1] audit: try harder to send to auditd upon netlink failure