On Tue, May 25, 2021 at 9:11 PM Jens Axboe <axboe(a)kernel.dk> wrote:
> On 5/24/21 1:59 PM, Paul Moore wrote:
>> That said, audit is not for everyone, and we have build time and
>> runtime options to help make life easier. Beyond simply disabling
>> audit at compile time a number of Linux distributions effectively
>> shortcut audit at runtime by adding a "never" rule to the audit
>> filter, for example:
>>
>> % auditctl -a task,never
>
> As has been brought up, the issue we're facing is that distros have
> CONFIG_AUDIT=y and hence the above is the best real world case outside
> of people doing custom kernels. My question would then be how much
> overhead the above will add, considering it's an entry/exit call per op.
> If auditctl is turned off, what is the expectation in terms of overhead?

I commented on that case in my last email to Pavel, but I'll try to go
over it again in a little more detail.
As we discussed earlier in this thread, we can skip the req->opcode
check before both the _entry and _exit calls, so we are left with just
the bare audit calls in the io_uring code. As the _entry and _exit
functions are small, I've copied them and their supporting functions
below and I'll try to explain what would happen in the CONFIG_AUDIT=y,
"task,never" case.
+ static inline struct audit_context *audit_context(void)
+ {
+         return current->audit_context;
+ }
+
+ static inline bool audit_dummy_context(void)
+ {
+         void *p = audit_context();
+         return !p || *(int *)p;
+ }
+
+ static inline void audit_uring_entry(u8 op)
+ {
+         if (unlikely(audit_enabled && audit_context()))
+                 __audit_uring_entry(op);
+ }
We have a single if statement that tests two separate conditions.
The first (audit_enabled) is simply a check to see if anyone has
"turned on" auditing at runtime; historically this worked rather well,
and still does in a number of places, but ever since systemd has taken
to forcing audit on regardless of the admin's audit configuration, it
is less useful. The second (audit_context())
is a check to see if an audit_context has been allocated for the
current task. In the case of "task,never" current->audit_context will
be NULL (see audit_alloc()) and the __audit_uring_entry() slowpath
will never be called.
The worst case here is reading the values of audit_enabled and
current->audit_context. If you think current->audit_context is more
likely to be NULL than audit_enabled is to be false (it may be that
way now), we can swap the order so that the current->audit_context
check comes first.
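To make the fast path concrete, here is a minimal userspace sketch of the
entry check in both orderings. It is only a model: every name prefixed
with model_ is hypothetical and stands in for a kernel object
(audit_enabled, current, __audit_uring_entry()), and the unlikely() hint
is left out because it does not change the logic.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
struct model_ctx { int dummy; };
struct model_task { struct model_ctx *audit_context; };

static bool model_audit_enabled;          /* stands in for the global audit_enabled */
static struct model_task *model_current;  /* stands in for "current" */
static int model_slowpath_calls;          /* how often the slowpath was reached */

/* Stands in for __audit_uring_entry(); it only counts invocations. */
static void model_slow_entry(unsigned char op)
{
        (void)op;
        model_slowpath_calls++;
}

/* Same check order as the audit_uring_entry() quoted above. */
static inline void model_uring_entry(unsigned char op)
{
        if (model_audit_enabled && model_current->audit_context)
                model_slow_entry(op);
}

/* Alternative order: test the per-task pointer first. */
static inline void model_uring_entry_ctx_first(unsigned char op)
{
        if (model_current->audit_context && model_audit_enabled)
                model_slow_entry(op);
}

/* Run one modeled op and report how many slowpath calls it caused. */
int model_count_slowpath(bool enabled, bool has_ctx, bool ctx_first)
{
        struct model_ctx ctx = { .dummy = 0 };
        struct model_task task = { .audit_context = has_ctx ? &ctx : NULL };

        model_current = &task;
        model_audit_enabled = enabled;
        model_slowpath_calls = 0;
        if (ctx_first)
                model_uring_entry_ctx_first(1);
        else
                model_uring_entry(1);
        model_current = NULL;
        return model_slowpath_calls;
}
```

With has_ctx set to false (the "task,never" situation) the slowpath count
stays at zero in either ordering; the order only decides which of the two
loads is skipped when the first test already fails.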
+ static inline void audit_uring_exit(int success, long code)
+ {
+         if (unlikely(!audit_dummy_context()))
+                 __audit_uring_exit(success, code);
+ }
The exit call is very similar to the entry call, but in the
"task,never" case it is even simpler: the first check performed is on
current->audit_context, which we know to be NULL, so the
__audit_uring_exit() slowpath will never be called.
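The exit-side check can be modeled the same way. The sketch below is a
userspace approximation, not kernel code: the exit_model_ names are
hypothetical, and it assumes (as the (int *) cast in
audit_dummy_context() suggests) that the context begins with an int
"dummy" flag.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the audit context. Assumption: the first
 * member is an int flag, which is what the (int *) cast relies on.
 */
struct exit_model_ctx { int dummy; };

static int exit_model_slowpath_calls;  /* how often the slowpath was reached */

/* Mirrors audit_dummy_context(): no context, or a context marked dummy. */
static bool exit_model_dummy_context(const struct exit_model_ctx *ctx)
{
        const void *p = ctx;

        return !p || *(const int *)p;
}

/* Mirrors audit_uring_exit(); the slowpath is modeled as a counter bump. */
static void exit_model_uring_exit(const struct exit_model_ctx *ctx)
{
        if (!exit_model_dummy_context(ctx))
                exit_model_slowpath_calls++;
}

/* Run one modeled exit and report how many slowpath calls it caused. */
int exit_model_count(bool has_ctx, int dummy)
{
        struct exit_model_ctx ctx = { .dummy = dummy };

        exit_model_slowpath_calls = 0;
        exit_model_uring_exit(has_ctx ? &ctx : NULL);
        return exit_model_slowpath_calls;
}
```

In this model a NULL context short-circuits on the !p test, so the flag
is never dereferenced; only a real, non-dummy context reaches the
slowpath.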
> aio never had any audit logging as far as I can tell. I think it'd make
> a lot more sense to selectively enable audit logging only for opcodes
> that we care about. File open/create/unlink/mkdir etc, that kind of
> thing. File level operations that people would care about logging. Would
> they care about logging a buffer registration or a polled read from a
> device/file? I highly doubt it, and we don't do that for alternative
> methods either. Doesn't really make sense for a lot of the other
> operations, imho.

We would need to check with the current security requirements (there
are distro people on the linux-audit list that keep track of that
stuff), but looking at the opcodes right now my gut feeling is that
most of the opcodes would be considered "security relevant" so
selective auditing might not be that useful in practice. It would
definitely clutter the code and increase the chances that new opcodes
would not be properly audited when they are merged.
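As a rough illustration of that maintenance concern, here is a
hypothetical sketch of what per-opcode selection could look like. The
demo_ opcode names and the table are invented for this example and do
not correspond to the real io_uring opcode enum or to any proposed
patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode list for illustration; not the real io_uring enum. */
enum demo_op {
        DEMO_OP_NOP,
        DEMO_OP_READ,
        DEMO_OP_OPENAT,
        DEMO_OP_UNLINKAT,
        DEMO_OP_NEW,    /* an opcode merged later, with no table entry added */
        DEMO_OP_LAST
};

/* Opcodes someone remembered to mark; everything else defaults to false. */
static const bool demo_op_audited[DEMO_OP_LAST] = {
        [DEMO_OP_OPENAT]   = true,
        [DEMO_OP_UNLINKAT] = true,
};

/* Decide whether an op should reach the audit entry/exit calls. */
bool demo_should_audit(enum demo_op op)
{
        return op < DEMO_OP_LAST && demo_op_audited[op];
}

/* Small self-check showing the silent default for the new opcode. */
void demo_self_check(void)
{
        assert(demo_should_audit(DEMO_OP_OPENAT));
        assert(!demo_should_audit(DEMO_OP_NEW));
}
```

Because unlisted entries default to false, a newly added opcode is
silently left unaudited until someone updates the table, which is the
failure mode described above.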
--
paul moore
www.paul-moore.com