On Wed, Dec 1, 2021 at 9:25 PM zhaozixuan (C) <zhaozixuan2(a)huawei.com> wrote:
> On Mon, Nov 29, 2021 at 2:35 AM zhaozixuan (C)
> <zhaozixuan2(a)huawei.com> wrote:
> > > On Tue, Nov 23, 2021 at 2:50 AM Zixuan Zhao
> > > <zhaozixuan2(a)huawei.com> wrote:
> > >> We used lat_syscall from lmbench3 to test the performance impact
> > >> of this patch. We changed the number of rules and ran lat_syscall
> > >> with 1000 repetitions in each test. The syscalls measured by
> > >> lat_syscall are not monitored by any of the rules.
> > >>
> > >> Before this optimization:
> > >>
> > >> null read write stat fstat open
> > >> 0 rules 1.87ms 2.74ms 2.56ms 26.31ms 4.13ms 69.66ms
> > >> 10 rules 2.15ms 3.13ms 3.32ms 26.99ms 4.16ms 74.70ms
> > >> 20 rules 2.45ms 3.97ms 3.82ms 27.05ms 4.60ms 76.35ms
> > >> 30 rules 2.64ms 4.52ms 3.95ms 30.30ms 4.94ms 78.94ms
> > >> 40 rules 2.83ms 4.97ms 4.23ms 32.16ms 5.40ms 81.88ms
> > >> 50 rules 3.00ms 5.30ms 4.84ms 33.49ms 5.79ms 83.20ms
> > >> 100 rules 4.24ms 9.75ms 7.42ms 37.68ms 6.55ms 93.70ms
> > >> 160 rules 5.50ms 16.89ms 12.18ms 51.53ms 17.45ms 155.40ms
> > >>
> > >> After this optimization:
> > >>
> > >> null read write stat fstat open
> > >> 0 rules 1.81ms 2.84ms 2.42ms 27.70ms 4.15ms 69.10ms
> > >> 10 rules 1.97ms 2.83ms 2.69ms 27.70ms 4.15ms 69.30ms
> > >> 20 rules 1.72ms 2.91ms 2.41ms 26.49ms 3.91ms 71.19ms
> > >> 30 rules 1.85ms 2.94ms 2.48ms 26.27ms 3.97ms 71.43ms
> > >> 40 rules 1.88ms 2.94ms 2.78ms 26.85ms 4.08ms 69.79ms
> > >> 50 rules 1.86ms 3.17ms 3.08ms 26.25ms 4.03ms 72.32ms
> > >> 100 rules 1.84ms 3.00ms 2.81ms 26.25ms 3.98ms 70.25ms
> > >> 160 rules 1.92ms 3.32ms 3.06ms 26.81ms 4.57ms 71.41ms
> > >>
> > >> As the results above show, the syscall latencies increase as the
> > >> number of rules increases, while with the patch the latencies
> > >> remain stable. This could help when a user adds many audit rules
> > >> for purposes such as attack tracing or process behavior recording
> > >> but suffers from degraded performance.
> > >
> > > I have general concerns about trading memory and complexity for
> > > performance gains, but beyond that the numbers you posted above
> > > don't yet make sense to me.
> >
> > Thanks for your reply.
> >
> > The memory cost of this patch is less than 4 KB (1820 bytes on x86_64
> > and 3640 bytes on x86_64 with compat syscalls), which is trivial in
> > most cases. Besides, syscalls are invoked very frequently, so even a
> > small per-syscall optimization can add up to a worthwhile gain.
>
> The tradeoff still exists, even though you feel it is worthwhile.
>
> > > Why are the latency increases due to rule count not similar across
> > > the different syscalls? For example, I would think that if the
> > > increase in syscall latency was directly attributed to the audit
> > > rule processing then the increase on the "open" syscall should be
> > > similar to that of the "null" syscall. In other phrasing, if we
> > > can process 160 rules in ~4ms in the "null" case, why does it take
> > > us ~86ms in the "open" case?
> >
> > As to the test results, we investigated and found two causes:
> >
> > 1. The chosen rule sets were not very suitable. Although they were not
> >    hit by the syscalls being measured, some of them were hit by other
> >    processes, which reduced overall system performance and skewed the
> >    results.
> > 2. The lat_syscall routine is more complicated than we expected. It
> >    calls many other syscalls during the test, which can make the
> >    results nonlinear.
> >
> > Because of this, we ran another test. We modified the audit rule sets
> > and made sure they would not be hit at runtime. Then we added
> > ktime_get_real_ts64() calls to auditsc.c to measure the execution time
> > of __audit_syscall_exit() (a simplified sketch of the instrumentation
> > follows the results below). We ran the "stat" syscall 10000 times for
> > each rule set and recorded the elapsed time. The results are shown
> > below:
> >
> > Before this optimization:
> >
> > rule set time
> > 0 rules 3843.96ns
> > 1 rules 13119.08ns
> > 10 rules 14003.13ns
> > 20 rules 15420.18ns
> > 30 rules 17284.84ns
> > 40 rules 19010.67ns
> > 50 rules 21112.63ns
> > 100 rules 25815.02ns
> > 130 rules 29447.09ns
> >
> > After this optimization:
> >
> > rule set time
> > 0 rules 3597.78ns
> > 1 rules 13498.73ns
> > 10 rules 13122.57ns
> > 20 rules 12874.88ns
> > 30 rules 14351.99ns
> > 40 rules 14181.07ns
> > 50 rules 13806.45ns
> > 100 rules 13890.85ns
> > 130 rules 14441.45ns
> >
> > As the results show, the interval increases roughly linearly with the
> > rule count before the optimization, while it remains stable after the
> > optimization. Note that audit skips some operations when there are no
> > rules at all, which is why there is a gap between the 0-rule and
> > 1-rule cases.
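> >
> > For reference, the instrumentation was roughly along these lines (a
> > simplified sketch, not the exact code we used):
> >
> > void __audit_syscall_exit(int success, long return_code)
> > {
> > 	struct timespec64 ts_start, ts_end, delta;
> >
> > 	ktime_get_real_ts64(&ts_start);
> >
> > 	/* ... original __audit_syscall_exit() body ... */
> >
> > 	ktime_get_real_ts64(&ts_end);
> > 	delta = timespec64_sub(ts_end, ts_start);
> > 	/* the reported numbers average this over 10000 stat() calls */
> > 	pr_info("__audit_syscall_exit: %lld ns\n", timespec64_to_ns(&delta));
> > }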
>
> It looks like a single rule like the one below could effectively
> disable this optimization, is that correct?
>
> % auditctl -a exit,always -F uid=1001
> % auditctl -l
> -a always,exit -S all -F uid=1001
> Yes, rules like this one, which monitor all syscalls, would disable the
> optimization. The size of the global array could also grow
> exponentially if we wanted to handle more audit fields. However, we
> don't think that kind of rule is practical, because it could generate a
> huge number of logs and even lead to log loss.
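>
> The idea is to keep a global per-syscall record of how many rules could
> match each syscall, so the exit-time filtering can be skipped when no
> rule watches the syscall being audited. Roughly (an illustrative sketch
> of the concept only, not the patch as posted):
>
> /*
>  * One counter per syscall number: bumped when a rule that can match
>  * that syscall is added, decremented when such a rule is removed.
>  */
> static atomic_t audit_syscall_rules[NR_syscalls];
>
> static inline bool audit_syscall_has_rules(int major)
> {
> 	if (major < 0 || major >= NR_syscalls)
> 		return true;	/* be conservative for unknown syscalls */
> 	return atomic_read(&audit_syscall_rules[major]) > 0;
> }
>
> A rule with "-S all" (like the one above) has to bump every entry, so
> the check never skips anything and the cost falls back to what we have
> today.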
Before we merge something like this I think we need a better
understanding of the typical audit filter rules used across the
different audit use cases. This patch is too much of a band-aid to
merge without a really good promise that it will help most real-world
audit deployments.
--
paul moore
www.paul-moore.com