On Sun, 2021-12-05 at 21:49 -0500, Paul Moore wrote:
On Wed, Dec 1, 2021 at 9:25 PM zhaozixuan (C) <zhaozixuan2@huawei.com> wrote:
On Mon, Nov 29, 2021 at 2:35 AM zhaozixuan (C) <zhaozixuan2@huawei.com> wrote:
On Tue, Nov 23, 2021 at 2:50 AM Zixuan Zhao <zhaozixuan2@huawei.com> wrote:
We used lat_syscall from lmbench3 to test the performance impact of
this patch. We varied the number of rules and ran lat_syscall with
1000 repetitions for each test. The syscalls measured by lat_syscall
are not monitored by any of the rules.

Before this optimization:

             null     read    write     stat    fstat      open
  0 rules  1.87ms   2.74ms   2.56ms   26.31ms  4.13ms   69.66ms
 10 rules  2.15ms   3.13ms   3.32ms   26.99ms  4.16ms   74.70ms
 20 rules  2.45ms   3.97ms   3.82ms   27.05ms  4.60ms   76.35ms
 30 rules  2.64ms   4.52ms   3.95ms   30.30ms  4.94ms   78.94ms
 40 rules  2.83ms   4.97ms   4.23ms   32.16ms  5.40ms   81.88ms
 50 rules  3.00ms   5.30ms   4.84ms   33.49ms  5.79ms   83.20ms
100 rules  4.24ms   9.75ms   7.42ms   37.68ms  6.55ms   93.70ms
160 rules  5.50ms   16.89ms  12.18ms  51.53ms  17.45ms  155.40ms

After this optimization:

             null     read    write     stat    fstat      open
  0 rules  1.81ms   2.84ms   2.42ms  27.70ms   4.15ms   69.10ms
 10 rules  1.97ms   2.83ms   2.69ms  27.70ms   4.15ms   69.30ms
 20 rules  1.72ms   2.91ms   2.41ms  26.49ms   3.91ms   71.19ms
 30 rules  1.85ms   2.94ms   2.48ms  26.27ms   3.97ms   71.43ms
 40 rules  1.88ms   2.94ms   2.78ms  26.85ms   4.08ms   69.79ms
 50 rules  1.86ms   3.17ms   3.08ms  26.25ms   4.03ms   72.32ms
100 rules  1.84ms   3.00ms   2.81ms  26.25ms   3.98ms   70.25ms
160 rules  1.92ms   3.32ms   3.06ms  26.81ms   4.57ms   71.41ms

As the results above show, syscall latencies increase with the
number of rules, while with the patch the latencies remain stable.
This could help when a user adds many audit rules for purposes
such as attack tracing or process behavior recording but suffers
from poor performance.

I have general concerns about trading memory and complexity for performance gains, but beyond that the numbers you posted above don't yet make sense to me.

Thanks for your reply.

The memory cost of this patch is less than 4KB (1820 bytes for native
x86_64 and 3640 bytes when compat syscalls are also covered), which is
trivial in many cases. Besides, syscalls are invoked very frequently,
so even a small per-call optimization can bring a noticeable benefit.
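
For what it is worth, the bookkeeping behind a footprint like that could
look roughly like the sketch below. The names (audit_rule_count_per_syscall,
audit_syscall_has_rules) and the table layout are hypothetical and only
illustrate why the cost stays in the low-kilobyte range; they are not the
actual patch.

  #include <linux/atomic.h>
  #include <linux/unistd.h>   /* NR_syscalls */

  /*
   * Hypothetical fast-path table: one counter per syscall number,
   * incremented when a rule that can match that syscall is loaded and
   * decremented when it is removed. With roughly 450 native syscalls
   * and 4 bytes per counter this is on the order of 1.8 KB; a second
   * table for compat syscalls roughly doubles it.
   */
  static atomic_t audit_rule_count_per_syscall[NR_syscalls];

  static inline bool audit_syscall_has_rules(int major)
  {
          if (major < 0 || major >= NR_syscalls)
                  return true;    /* be conservative about unknown numbers */
          return atomic_read(&audit_rule_count_per_syscall[major]) != 0;
  }

The syscall-exit path could then return early whenever the counter for the
current syscall is zero, instead of walking the whole rule list.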

The tradeoff still exists, even though you feel it is worthwhile.

Why are the latency increases due to rule count not similar across the different syscalls? For example, I would think that if the increase in syscall latency was directly attributed to the audit rule processing then the increase on the "open" syscall should be similar to that of the "null" syscall. In other words, if we can process 160 rules in ~4ms in the "null" case, why does it take us ~86ms in the "open" case?

As to the test results, we did some investigation and identified two
causes:
1. The chosen rule sets were not very suitable. Though they were not
hit by the syscalls being measured, some of them were hit by other
processes, which reduced overall system performance and skewed the
test results.
2. The routine of lat_syscall is more complicated than we thought. It
calls many other syscalls during the test, which may keep the results
from scaling linearly.

Given the reasons above, we ran another test. We modified the audit
rule sets and made sure they would not be hit at runtime. Then we added
ktime_get_real_ts64 calls to auditsc.c to measure the time spent
executing __audit_syscall_exit. We ran the "stat" syscall 10000 times
for each rule set and recorded the time interval. The results are shown
below:

Before this optimization:

rule set          time
  0 rules     3843.96ns
  1 rules    13119.08ns
 10 rules    14003.13ns
 20 rules    15420.18ns
 30 rules    17284.84ns
 40 rules    19010.67ns
 50 rules    21112.63ns
100 rules    25815.02ns
130 rules    29447.09ns

After this optimization:

 rule set          time
  0 rules     3597.78ns
  1 rules    13498.73ns
 10 rules    13122.57ns
 20 rules    12874.88ns
 30 rules    14351.99ns
 40 rules    14181.07ns
 50 rules    13806.45ns
100 rules    13890.85ns
130 rules    14441.45ns

As the results show, the interval increases roughly linearly with the
number of rules before the optimization, while it remains stable after
the optimization. Note that audit skips some operations when there are
no rules at all, so there is a gap between the 0-rule and 1-rule sets.

It looks like a single rule like the one below could effectively disable this optimization, is that correct?

  % auditctl -a exit,always -F uid=1001
  % auditctl -l
  -a always,exit -S all -F uid=1001

Yes, rules like this one, which monitor all syscalls, would disable the
optimization. The size of the global array could increase exponentially
if we wanted to handle more audit fields. However, we don't think that
kind of rule is practical, because it might generate an enormous number
of logs and even lead to log loss.
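
To put a rough number on that growth (purely back-of-envelope; the actual
table layout may differ), a table indexed only by syscall number stays at
O(NR_syscalls) entries, but pre-answering "could any rule match?" for every
combination of k additional boolean field conditions multiplies that by 2^k:

  /* Hypothetical sizing, user-space illustration only. */
  #include <stdio.h>

  int main(void)
  {
          unsigned int nr_syscalls = 450;  /* rough native syscall count */

          for (unsigned int k = 0; k <= 5; k++)
                  printf("%u extra fields -> %u table entries\n",
                         k, nr_syscalls * (1u << k));
          return 0;
  }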

Before we merge something like this I think we need a better
understanding of the typical audit filter rules used across the
different audit use cases.  This patch is too much of a band-aid to
merge without a really good promise that it will help most real-world
audit deployments.

For a 'real world' deployment, I suggest:
cd /usr/share/audit/sample-rules
cp 10-base-config.rules 11-loginuid.rules 12-ignore-error.rules 30-stig.rules 41-containers.rules 43-module-load.rules 71-networking.rules /etc/audit/rules.d/
rm -f /etc/audit/rules.d/audit.rules # Remove default ruleset if not applicable
echo '-b 32768' > /etc/audit/rules.d/zzexecve.rules
echo '-a exit,always -F arch=b32 -F auid!=2147483647 -S execve -k cmds' >> /etc/audit/rules.d/zzexecve.rules
echo '-a exit,always -F arch=b64 -F auid!=4294967295 -S execve -k cmds' >> /etc/audit/rules.d/zzexecve.rules