On Tue, Nov 23, 2021 at 2:50 AM Zixuan Zhao
<zhaozixuan2(a)huawei.com> wrote:
> We used lat_syscall of lmbench3 to test the performance impact of this
> patch. We changed the number of rules and ran lat_syscall with 1000
> repetitions at each test. The syscalls measured by lat_syscall are not
> monitored by the rules.
>
> Before this optimization:
>
>            null      read      write     stat      fstat     open
> 0 rules    1.87ms    2.74ms    2.56ms    26.31ms   4.13ms    69.66ms
> 10 rules   2.15ms    3.13ms    3.32ms    26.99ms   4.16ms    74.70ms
> 20 rules   2.45ms    3.97ms    3.82ms    27.05ms   4.60ms    76.35ms
> 30 rules   2.64ms    4.52ms    3.95ms    30.30ms   4.94ms    78.94ms
> 40 rules   2.83ms    4.97ms    4.23ms    32.16ms   5.40ms    81.88ms
> 50 rules   3.00ms    5.30ms    4.84ms    33.49ms   5.79ms    83.20ms
> 100 rules  4.24ms    9.75ms    7.42ms    37.68ms   6.55ms    93.70ms
> 160 rules  5.50ms    16.89ms   12.18ms   51.53ms   17.45ms   155.40ms
>
> After this optimization:
>
>            null      read      write     stat      fstat     open
> 0 rules    1.81ms    2.84ms    2.42ms    27.70ms   4.15ms    69.10ms
> 10 rules   1.97ms    2.83ms    2.69ms    27.70ms   4.15ms    69.30ms
> 20 rules   1.72ms    2.91ms    2.41ms    26.49ms   3.91ms    71.19ms
> 30 rules   1.85ms    2.94ms    2.48ms    26.27ms   3.97ms    71.43ms
> 40 rules   1.88ms    2.94ms    2.78ms    26.85ms   4.08ms    69.79ms
> 50 rules   1.86ms    3.17ms    3.08ms    26.25ms   4.03ms    72.32ms
> 100 rules  1.84ms    3.00ms    2.81ms    26.25ms   3.98ms    70.25ms
> 160 rules  1.92ms    3.32ms    3.06ms    26.81ms   4.57ms    71.41ms
>
> As the results above show, without the patch the syscall latencies
> increase as the number of rules increases, while with the patch the
> latencies remain stable. This could help a user who adds many audit
> rules (for purposes such as attack tracing or process behavior
> recording) but suffers from low performance.
I have general concerns about trading memory and complexity for performance gains, but
beyond that the numbers you posted above don't yet make sense to me.
Thanks for your reply.
The memory cost of this patch is less than 4 KB (1820 bytes on x64 and
3640 bytes on compatible x86_64), which is trivial in many cases.
Besides, syscalls are called frequently on a system, so even a small
optimization could bring a worthwhile gain.
Why are the latency increases due to rule count not similar across the
different syscalls? For example, I would think that if the increase in
syscall latency was directly attributed to the audit rule processing
then the increase on the "open" syscall should be similar to that of the
"null" syscall. In other words, if we can process 160 rules in ~4ms in
the "null" case, why does it take us ~86ms in the "open" case?
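The ~4ms and ~86ms figures in this question can be reproduced from the
"Before this optimization" lat_syscall table by subtracting the 0-rule row
from the 160-rule row. A minimal sketch of that arithmetic (the values are
copied from the table; the names SYSCALLS, RULES_0, RULES_160 and
latency_increase are only illustrative):

```python
# Latencies in ms, copied from the "Before this optimization" lat_syscall table.
SYSCALLS = ["null", "read", "write", "stat", "fstat", "open"]
RULES_0 = [1.87, 2.74, 2.56, 26.31, 4.13, 69.66]
RULES_160 = [5.50, 16.89, 12.18, 51.53, 17.45, 155.40]


def latency_increase(base, loaded):
    """Return the per-syscall latency increase in ms, rounded to 2 decimals."""
    return {
        name: round(after - before, 2)
        for name, before, after in zip(SYSCALLS, base, loaded)
    }


if __name__ == "__main__":
    for name, delta in latency_increase(RULES_0, RULES_160).items():
        print(f"{name:>5}: +{delta:.2f} ms")
```

If rule processing were the only cost, the six differences would be expected
to be close to each other; in the table they are not, which is the
discrepancy being asked about.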
As for the test results, we investigated and found two causes:
1. The chosen rule sets were not very suitable. Although they were not hit
by the syscalls being measured, some of them were hit by other processes,
which reduced system performance and affected the test results;
2. The lat_syscall routine is much more complicated than we thought. It
calls many other syscalls during the test, which may make the results
non-linear.
For these reasons, we ran another test. We modified the audit rule sets
and made sure they would not be hit at runtime. Then we added
ktime_get_real_ts64 calls to auditsc.c to record the time spent executing
__audit_syscall_exit. We ran the "stat" syscall 10000 times for each rule
set and recorded the time interval. The results are shown below:
Before this optimization:
rule set   time
0 rules    3843.96ns
1 rules    13119.08ns
10 rules   14003.13ns
20 rules   15420.18ns
30 rules   17284.84ns
40 rules   19010.67ns
50 rules   21112.63ns
100 rules  25815.02ns
130 rules  29447.09ns
After this optimization:
rule set   time
0 rules    3597.78ns
1 rules    13498.73ns
10 rules   13122.57ns
20 rules   12874.88ns
30 rules   14351.99ns
40 rules   14181.07ns
50 rules   13806.45ns
100 rules  13890.85ns
130 rules  14441.45ns
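The per-rule cost implied by the two tables above can be estimated with an
ordinary least-squares fit of time against rule count. This is only a
sketch for reading the posted numbers, not part of the patch; the names
BEFORE, AFTER and slope_ns_per_rule are illustrative, and the 0-rule row
is left out on the assumption that the no-rules case takes a different
path.

```python
# (rule count, time in ns) copied from the two __audit_syscall_exit tables.
# The 0-rule rows are deliberately excluded (assumption: different code path).
BEFORE = [
    (1, 13119.08), (10, 14003.13), (20, 15420.18), (30, 17284.84),
    (40, 19010.67), (50, 21112.63), (100, 25815.02), (130, 29447.09),
]
AFTER = [
    (1, 13498.73), (10, 13122.57), (20, 12874.88), (30, 14351.99),
    (40, 14181.07), (50, 13806.45), (100, 13890.85), (130, 14441.45),
]


def slope_ns_per_rule(points):
    """Least-squares slope of time (ns) versus number of rules."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


if __name__ == "__main__":
    print(f"before: {slope_ns_per_rule(BEFORE):.1f} ns per rule")
    print(f"after:  {slope_ns_per_rule(AFTER):.1f} ns per rule")
```

On the posted numbers the fitted slope is on the order of a hundred
nanoseconds per rule before the optimization and an order of magnitude
smaller after it.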
As the results show, before the optimization the interval increases
roughly linearly with the number of rules, while after the optimization it
remains stable. Note that audit skips some operations if there are no
rules, so there is a gap between the 0-rule and 1-rule sets.
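The numbers above were collected inside the kernel. For anyone who wants a
rough cross-check without patching auditsc.c, a userspace approximation is
sketched below. It is a different technique: it times the whole stat()
round trip (including Python call overhead), not __audit_syscall_exit
alone, so only the trend across rule sets is comparable. The function name
and the default path are hypothetical.

```python
import os
import time


def mean_stat_latency_ns(path=".", iterations=10000):
    """Average wall-clock time of one os.stat() call, in nanoseconds.

    Userspace approximation only: the result includes interpreter and
    syscall entry/exit overhead, not just the audit exit path.
    """
    start = time.perf_counter_ns()
    for _ in range(iterations):
        os.stat(path)
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Run once per audit rule set and compare the averages.
    print(f"{mean_stat_latency_ns():.2f} ns per stat() call")
```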