On Friday, September 04, 2015 05:14:54 AM Richard Guy Briggs wrote:
There are several reports of the kernel losing contact with auditd
...
Even if this doesn't completely solve the problem, I like the extra reporting
and robustness of this change. Some comments inline ...
diff --git a/kernel/audit.c b/kernel/audit.c
index 1c13e42..4ee114a 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -404,19 +404,54 @@ static void audit_printk_skb(struct sk_buff *skb)
audit_hold_skb(skb);
}
+static char *audit_strerror(int err)
+{
+ switch (err) {
+ case -ECONNREFUSED:
+ return "ECONNREFUSED";
+ case -EPERM:
+ return "EPERM";
+ case -ENOMEM:
+ return "ENOMEM";
+ case -EAGAIN:
+ return "EAGAIN";
+ case -ERESTARTSYS:
+ return "ERESTARTSYS";
+ case -EINTR:
+ return "EINTR";
+ default:
+ return "(other)";
+ }
+}
See comments below.
static void kauditd_send_skb(struct sk_buff *skb)
{
int err;
+ int attempts = 0;
+#define AUDITD_RETRIES 5
+
+restart:
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
Should the restart label go after the skb_get() call? It seems like if we
ever jump to restart we could end up needlessly bumping the skb's refcnt.
err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
if (err < 0) {
BUG_ON(err != -ECONNREFUSED); /* Shouldn't happen */
+ pr_err("netlink_unicast sending to audit_pid=%d returned error: %d,
%s\n"
+ , audit_pid, err, audit_strerror(err));
This is a style nit, but please put the comma on the preceding line. I know
why you did it, but it bothers me.
I'm also debating if audit_strerror() is worth it. I agree with you that it
is a good idea to indicate the specific error code, I'm just not sure we need
to bother translating that into a proper errno name. Can you think of some
reason why we would need the errno name as opposed to the integer?
if (audit_pid) {
- pr_err("*NO* daemon at audit_pid=%d\n", audit_pid);
- audit_log_lost("auditd disappeared");
- audit_pid = 0;
- audit_sock = NULL;
+ if (err == -ECONNREFUSED || err == -EPERM
+ || ++attempts >= AUDITD_RETRIES) {
+ audit_log_lost("audit_pid=%d reset");
+ audit_pid = 0;
+ audit_sock = NULL;
+ } else {
+ pr_warn("re-scheduling(#%d) write to audit_pid=%d\n"
+ , attempts, audit_pid);
Same thing with the comma.
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule();
+ __set_current_state(TASK_RUNNING);
+ goto restart;
See my comment above about the skb's reference count.
+ }
}
/* we might get lucky and get this in the next auditd */
audit_hold_skb(skb);
--
paul moore
www.paul-moore.com