Paul Moore <paul(a)paul-moore.com> writes:
On Tue, Sep 27, 2022 at 7:00 PM Ankur Arora
<ankur.a.arora(a)oracle.com> wrote:
>
> With sane audit rules, audit logging would only be triggered
> infrequently. Keeping this in mind, annotate audit_in_mask() as
> unlikely() to allow the compiler to pessimize the call to
> audit_filter_rules().
>
> This allows GCC to invert the branch direction for the audit_filter_rules()
> basic block in this loop:
>
> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list)
{
> if (audit_in_mask(&e->rule, major) &&
> audit_filter_rules(tsk, &e->rule, ctx, NULL,
> &state, false)) {
> ...
>
> such that it executes the common case in a straight line fashion.
>
> On a Skylakex system change in getpid() latency (all results
> aggregated across 12 boot cycles):
>
> Min Mean Median Max pstdev
> (ns) (ns) (ns) (ns)
>
> - 196.63 207.86 206.60 230.98 (+- 3.92%)
> + 173.11 182.51 179.65 202.09 (+- 4.34%)
>
> Performance counter stats for 'bin/getpid' (3 runs) go from:
> cycles 805.58 ( +- 4.11% )
> instructions 1654.11 ( +- .05% )
> IPC 2.06 ( +- 3.39% )
> branches 430.02 ( +- .05% )
> branch-misses 1.55 ( +- 7.09% )
> L1-dcache-loads 440.01 ( +- .09% )
> L1-dcache-load-misses 9.05 ( +- 74.03% )
>
> to:
> cycles 706.13 ( +- 4.13% )
> instructions 1654.70 ( +- .06% )
> IPC 2.35 ( +- 4.25% )
> branches 430.99 ( +- .06% )
> branch-misses 0.50 ( +- 2.00% )
> L1-dcache-loads 440.02 ( +- .07% )
> L1-dcache-load-misses 5.22 ( +- 82.75% )
>
> (Both aggregated over 12 boot cycles.)
>
> cycles: performance improves on average by ~100 cycles/call. IPC
> improves commensurately. Two reasons for this improvement:
>
> * one fewer branch mispred: no obvious reason for this
> branch-miss reduction. There is no significant change in
> basic-block structure (apart from the branch inversion.)
>
> * the direction of the branch for the call is now inverted, so it
> chooses the not-taken direction more often. The issue-latency
> for not-taken branches is often cheaper.
>
> Signed-off-by: Ankur Arora <ankur.a.arora(a)oracle.com>
> ---
> kernel/auditsc.c | 15 ++++++++-------
> 1 file changed, 8 insertions(+), 7 deletions(-)
I generally dislike merging likely()/unlikely() additions to code
paths that can have varying levels of performance depending on runtime
configuration.
I think that's fair, and in this particular case the benchmark is quite
contrived.
But, just to elaborate a bit more on why that unlikely() clause made
sense to me: it seems to me that audit typically would be triggered for
control syscalls and the ratio between control and non-control ones
would be fairly lopsided.
Let me see if I can rewrite the conditional in a different way to get a
similar effect but I suspect that might be even more compiler dependent.
Also, let me run the audit-testsuite this time. Is there a good test
there that you would recommend that might serve as a more representative
workload?
Thanks
Ankur
While I appreciate the work you are doing to improve
audit performance, I don't think this is something I want to merge,
I'm sorry.
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index 533b087c3c02..bf26f47b5226 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -789,7 +789,7 @@ static enum audit_state audit_filter_task(struct task_struct
*tsk, char **key)
> return AUDIT_STATE_BUILD;
> }
>
> -static int audit_in_mask(const struct audit_krule *rule, unsigned long val)
> +static bool audit_in_mask(const struct audit_krule *rule, unsigned long val)
> {
> int word, bit;
>
> @@ -850,12 +850,13 @@ static void audit_filter_syscall(struct task_struct *tsk,
>
> rcu_read_lock();
> list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_EXIT], list)
{
> - if (audit_in_mask(&e->rule, major) &&
> - audit_filter_rules(tsk, &e->rule, ctx, NULL,
> - &state, false)) {
> - rcu_read_unlock();
> - ctx->current_state = state;
> - return;
> + if (unlikely(audit_in_mask(&e->rule, major))) {
> + if (audit_filter_rules(tsk, &e->rule, ctx, NULL,
> + &state, false)) {
> + rcu_read_unlock();
> + ctx->current_state = state;
> + return;
> + }
> }
> }
> rcu_read_unlock();
> --
> 2.31.1
--
ankur