Linux-audit November 2023

linux-audit@lists.linux-audit.osci.io

3 participants
2 discussions

by Chris Riches

We are experiencing strange failures where the audit daemon fails to start on boot, hitting an ENOBUFS error on its audit_set_pid() call. This can be reproduced by repeatedly restarting the audit daemon while the system is under heavy audit load. This also seems to be dependent on the number of CPUs: we can reproduce this with 2 CPUs but not with 48.

Tracing showed a race between the kernel enabling audit messages to be sent to the daemon and actually sending the ACK, wherein the socket buffer could get filled by audit messages before the ACK could be sent, leading to the ACK being dropped and ENOBUFS set on the socket by netlink code.

A patch to mitigate this race from the kernel side is separately under discussion on the audit subsystem mailing list: https://lore.kernel.org/audit/20230922152749.244197-1-chris.riches@nutani...

It's worth noting that this is almost certainly the same issue observed in this thread from last month (participants CCed): https://listman.redhat.com/archives/linux-audit/2023-September/020087.html

Here, I am hoping to discuss ACK handling from the userspace side. The current implementation is a little odd: check_ack() will happily return success without seeing an ACK if a non-ACK message is top of the socket queue, but will fail if no message arrives within the timeout. It also of course fails if ENOBUFS is set on the socket, but this failure only seems to matter when doing audit_set_pid(); similar failures during main-loop message processing are logged but otherwise ignored, as far as I can tell.

I'm not sure I quite understand the intentions of the code, but it seems odd to let ENOBUFS be a fatal error here, given that it likely means the socket buffer got flooded with audit messages, and thus audit_set_pid() succeeded. Perhaps we should just ignore ENOBUFS or even set NETLINK_NO_ENOBUFS?

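
To make the ENOBUFS suggestion concrete, here is a minimal sketch of how the reply to a request such as audit_set_pid() could be classified so that a receive-queue overrun is distinguishable from a genuine rejection. The names classify_reply(), disable_enobufs() and enum ack_result are hypothetical and are not libaudit's check_ack(); treating ACK_OVERRUN as "probably succeeded" is an assumption that follows the reasoning above, not established behavior. NETLINK_NO_ENOBUFS and SOL_NETLINK are the real Linux socket option names.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270 /* value used by the Linux socket headers */
#endif

/* Hypothetical outcomes when waiting for the reply to a request like AUDIT_SET. */
enum ack_result {
    ACK_OK,       /* NLMSG_ERROR carrying error == 0: the kernel ACKed */
    ACK_FAILED,   /* NLMSG_ERROR carrying a negative errno: request rejected */
    ACK_OVERRUN,  /* recv() failed with ENOBUFS: the receive queue overflowed */
    ACK_NOT_ACK,  /* another message (e.g. an audit record) was at the head */
    ACK_IO_ERROR  /* any other recv() failure, or a malformed reply */
};

/*
 * Classify one recv() result from the audit netlink socket.
 * buf/len are what recv() returned; recv_errno is errno when len < 0.
 * A caller could treat ACK_OVERRUN as "request probably took effect"
 * instead of failing outright (an assumption, see the text above).
 */
enum ack_result classify_reply(const void *buf, ssize_t len, int recv_errno)
{
    if (len < 0)
        return recv_errno == ENOBUFS ? ACK_OVERRUN : ACK_IO_ERROR;

    const struct nlmsghdr *nh = buf;
    int n = (int)len;
    if (!NLMSG_OK(nh, n))
        return ACK_IO_ERROR;             /* truncated or inconsistent header */
    if (nh->nlmsg_type != NLMSG_ERROR)
        return ACK_NOT_ACK;              /* not an ACK at all */
    if (nh->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
        return ACK_IO_ERROR;             /* ACK too short to hold nlmsgerr */

    const struct nlmsgerr *err = NLMSG_DATA(nh);
    return err->error == 0 ? ACK_OK : ACK_FAILED;
}

/* Ask the kernel to stop reporting ENOBUFS on this netlink socket. */
int disable_enobufs(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS, &one, sizeof(one));
}
```

With NETLINK_NO_ENOBUFS set, the kernel still drops messages that do not fit in the receive buffer; it only stops reporting the overrun. If the dropped message is the ACK itself, a caller that insists on seeing an ACK would time out instead, so the wait logic would need to tolerate that case as well.
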
It may also be worth increasing the netlink socket buffer size, though this could only make the issue less likely and would not be sufficient under arbitrarily heavy audit loads.

Finally, there is another oddity in audit_set_pid() that is tangential to this discussion but worth highlighting: if the wmode parameter is WAIT_YES, then there is some additional ACK-handling which waits for 100 milliseconds and eats the top message of the socket queue if one arrives, without inspecting it. This seems completely wrong as the ACK will have already been consumed by check_ack() if there was one, and so the best this code can do is nothing, and at worst (quite likely) it will swallow a genuine audit message without ever recording it.

- Chris
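
The WAIT_YES problem could be avoided by peeking before consuming. The sketch below, again hypothetical rather than libaudit's code, waits up to a timeout (100 ms in the description above) with poll(), inspects the head of the queue with MSG_PEEK, and dequeues the message only if it is an NLMSG_ERROR/ACK carrying the request's sequence number; anything else, such as a genuine audit record, is left for the main loop. It assumes the caller knows the sequence number it used for the request; drain_stale_ack() and is_ack_for() are illustrative names.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/netlink.h>

/* Does this buffer hold an NLMSG_ERROR (ACK) answering request number seq? */
int is_ack_for(const void *buf, ssize_t len, uint32_t seq)
{
    const struct nlmsghdr *nh = buf;
    int n = (int)len;

    if (len < 0 || !NLMSG_OK(nh, n))
        return 0;
    return nh->nlmsg_type == NLMSG_ERROR && nh->nlmsg_seq == seq;
}

/*
 * Hypothetical replacement for the WAIT_YES step: wait up to timeout_ms for a
 * message, but consume it only if it is a late ACK for our own request.
 * Returns 1 if a stale ACK was consumed, 0 if nothing was consumed,
 * -1 on error.
 */
int drain_stale_ack(int fd, uint32_t seq, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc;                      /* 0: timed out, -1: poll() failed */

    /* Copy the head of the queue without removing it. */
    union { struct nlmsghdr h; char raw[8192]; } buf;
    ssize_t len = recv(fd, &buf, sizeof(buf), MSG_PEEK | MSG_DONTWAIT);
    if (len < 0)
        return errno == EAGAIN ? 0 : -1;

    if (!is_ack_for(&buf, len, seq))
        return 0;                       /* real audit record: leave it queued */

    /* It really is our ACK, so dequeue it now. */
    if (recv(fd, &buf, sizeof(buf), MSG_DONTWAIT) < 0)
        return -1;
    return 1;
}
```

Because the decision is made on a peeked copy, the worst case becomes a harmless no-op instead of a lost audit record.
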

Welcome to the new linux-audit mail list

by Steve Grubb

Hello Everyone,

We're still alive. :-)

I was able to work out a migration from redhat.com to osci.io, which is a home for Open Source Community Infrastructure. Maybe that's a better fit. All member addresses were transferred and resubscribed. I have updated the mail link on people.redhat.com/sgrubb/audit to point to the new signup page.

Everyone should take a moment to update their address book to have linux-audit@lists.linux-audit.osci.io as the new list address. The old archives have been migrated to https://lists.linux-audit.osci.io/archives/, which is run by HyperKitty; if you are familiar with the Fedora archives, it's the same.

Hopefully our new home suits everyone and we can get back to doing the 4.0 release.

Thanks,
-Steve
