On 15/09/04, Paul Moore wrote:
On Friday, September 04, 2015 05:14:54 AM Richard Guy Briggs wrote:
> There are several reports of the kernel losing contact with auditd ...
Even if this doesn't completely solve the problem, I like the extra reporting
and robustness of this change. Some comments inline ...
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 1c13e42..4ee114a 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -404,19 +404,54 @@ static void audit_printk_skb(struct sk_buff *skb)
> audit_hold_skb(skb);
> }
>
> +static char *audit_strerror(int err)
> +{
> + switch (err) {
> + case -ECONNREFUSED:
> + return "ECONNREFUSED";
> + case -EPERM:
> + return "EPERM";
> + case -ENOMEM:
> + return "ENOMEM";
> + case -EAGAIN:
> + return "EAGAIN";
> + case -ERESTARTSYS:
> + return "ERESTARTSYS";
> + case -EINTR:
> + return "EINTR";
> + default:
> + return "(other)";
> + }
> +}
See comments below.
> static void kauditd_send_skb(struct sk_buff *skb)
> {
> int err;
> + int attempts = 0;
> +#define AUDITD_RETRIES 5
> +
> +restart:
> /* take a reference in case we can't send it and we want to hold it */
> skb_get(skb);
Should the restart label go after the skb_get() call? It seems like if we
ever jump to restart we could end up needlessly bumping the skb's refcnt.
I checked specifically for this and all the paths through
netlink_unicast() that return an error consume the skb. In fact, all
paths through netlink_unicast() consume the skb.
> err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
> if (err < 0) {
> BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
> + pr_err("netlink_unicast sending to audit_pid=%d returned error: %d,
%s\n"
> + , audit_pid, err, audit_strerror(err));
This is a style nit, but please put the comma on the preceding line. I know
why you did it, but it bothers me.
I blame Hugh Redelmeier with whom I worked on the FreeS/WAN project, who
has been heavily involved with the ANSI C standards committee. ;-) I
should be sticking with the prevailing standard of the existing code.
I'm also debating if audit_strerror() is worth it. I agree with
you that it
is a good idea to indicate the specific error code, I'm just not sure we need
to bother translating that into a proper errno name. Can you think of some
reason why we would need the errno name as opposed to the integer?
Other than convenience, not really. The only reason other than that
would be if an unexpected error code is returned it will be mover
obvious to me when reviewing a user report. Other than that, it is
complete without the text.
> if (audit_pid) {
> - pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
> - audit_log_lost("auditd disappeared");
> - audit_pid = 0;
> - audit_sock = NULL;
> + if (err == -ECONNREFUSED || err == -EPERM
> + || ++attempts >= AUDITD_RETRIES) {
> + audit_log_lost("audit_pid=%d reset");
> + audit_pid = 0;
> + audit_sock = NULL;
> + } else {
> + pr_warn("re-scheduling(#%d) write to audit_pid=%d\n"
> + , attempts, audit_pid);
Same thing with the comma.
> + set_current_state(TASK_INTERRUPTIBLE);
> + schedule();
> + __set_current_state(TASK_RUNNING);
> + goto restart;
See my comment above about the skb's reference count.
> + }
> }
> /* we might get lucky and get this in the next auditd */
> audit_hold_skb(skb);
--
paul moore
www.paul-moore.com
- RGB
--
Richard Guy Briggs <rbriggs(a)redhat.com>
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545