On Tue, Feb 4, 2014 at 11:11 AM, Oleg Nesterov <oleg(a)redhat.com> wrote:
On 02/04, Andy Lutomirski wrote:
> On Tue, Feb 4, 2014 at 8:50 AM, Oleg Nesterov <oleg(a)redhat.com> wrote:
> > On 02/03, Andy Lutomirski wrote:
> Sorry, forgot to mention: where is this mythical
> for_each_process_thread?
In Linus's tree, please see 0c740d0afc3bff.
> or you
> just hate do_each_thread so much that you imagined up an alternative
> :)
sort of ;)
Aha -- it probably got merged just after I pulled Linus' tree and
looked for it. Or something.
> I think I'll wait for Eric to chime in. I suspect that the real
> solution is to simplify all this stuff by relying on the fact that the
> syscall nr and args are saved by the (fast path and slow path) entry
> code, so the audit entry hook may be entirely unnecessary.
Perhaps... but even in this case we need to do something with, say,
__audit_log_bprm_fcaps().
At least this list should not grow indefinitely if the task skips
__audit_syscall_exit(). Although at first glance this can be probably
cleanuped too.
OK, here's a thought: let's change the semantics of TIF_SYSCALL_AUDIT.
New semantics:
TIF_SYSCALL_AUDIT is set if (the task is eligible for syscall auditing
and n_rules != 0 *or* something has started writing an audit record).
TIF_SYSCALL_AUDIT is *never* cleared by anything other than
copy_process or __audit_syscall_exit.
This means that, if there's a pending audit record, then
TIF_SYSCALL_AUDIT will be set. That, in turn, means that
__audit_syscall_exit will be called, which avoids the BUGs you pointed
out.
Now we get rid of __audit_syscall_entry. (This speeds up even the
auditing-is-on case.) Instead we have __audit_start_record, which
does more or less the same thing, except that (a) it doesn't BUG if
in_syscall and (b) it *sets* TIF_SYSCALL_AUDIT. This relies on the
fact that syscall_get_nr and syscall_get_arguments are reliable on
x86_64. I suspect that they're reliable everywhere else, too. The
idea is that there's nothing wrong with calling __audit_start_record
more than once. (Maybe it should be called
__audit_record_this_syscall.)
To finish the job, we change __audit_syscall_exit to clear
TIF_SYSCALL_AUDIT if n_rules==0. inc_n_rules can set
TIF_SYSCALL_AUDIT (without even needing to worry about races, I
think). dec_n_rules doesn't need to do anything special.
Benefits:
- Removing the syscall entry hook speeds everyone up. Even silly
people who use exit rules :)
- There's no need to think about mismatched entry/exit hook calls,
since there is no entry hook.
- The same mechanism could be reused for non-audit purposes. Any
code could say, at any point "hey, this syscall is interesting. let's
record it."
Disadvantages:
- Need to check other architectures to make sure that syscall_get_xyz
works reliably for fast path syscalls.
- For the full performance boost, all architectures need to avoid
checking TIF_SYSCALL_AUDIT in the entry path. I prefer not to mess
with non-x86 assembly, so I won't do that part, since it's not
required for correctness.
Eric, any thoughts?
--Andy