On Sun, 2021-12-05 at 21:49 -0500, Paul Moore wrote:
On Wed, Dec 1, 2021 at 9:25 PM zhaozixuan (C) <zhaozixuan2@huawei.com> wrote:
On Mon, Nov 29, 2021 at 2:35 AM zhaozixuan (C) <zhaozixuan2@huawei.com> wrote:
On Tue, Nov 23, 2021 at 2:50 AM Zixuan Zhao <zhaozixuan2@huawei.com> wrote:
We used lat_syscall from lmbench3 to test the performance impact of
this patch. We varied the number of rules and ran lat_syscall with
1000 repetitions for each test. The syscalls measured by lat_syscall
are not monitored by any of the rules.

Before this optimization:

             null     read    write     stat    fstat      open
  0 rules  1.87ms   2.74ms   2.56ms   26.31ms  4.13ms   69.66ms
 10 rules  2.15ms   3.13ms   3.32ms   26.99ms  4.16ms   74.70ms
 20 rules  2.45ms   3.97ms   3.82ms   27.05ms  4.60ms   76.35ms
 30 rules  2.64ms   4.52ms   3.95ms   30.30ms  4.94ms   78.94ms
 40 rules  2.83ms   4.97ms   4.23ms   32.16ms  5.40ms   81.88ms
 50 rules  3.00ms   5.30ms   4.84ms   33.49ms  5.79ms   83.20ms
100 rules  4.24ms   9.75ms   7.42ms   37.68ms  6.55ms   93.70ms
160 rules  5.50ms   16.89ms  12.18ms  51.53ms  17.45ms  155.40ms

After this optimization:

             null     read    write     stat    fstat      open
  0 rules  1.81ms   2.84ms   2.42ms  27.70ms   4.15ms   69.10ms
 10 rules  1.97ms   2.83ms   2.69ms  27.70ms   4.15ms   69.30ms
 20 rules  1.72ms   2.91ms   2.41ms  26.49ms   3.91ms   71.19ms
 30 rules  1.85ms   2.94ms   2.48ms  26.27ms   3.97ms   71.43ms
 40 rules  1.88ms   2.94ms   2.78ms  26.85ms   4.08ms   69.79ms
 50 rules  1.86ms   3.17ms   3.08ms  26.25ms   4.03ms   72.32ms
100 rules  1.84ms   3.00ms   2.81ms  26.25ms   3.98ms   70.25ms
160 rules  1.92ms   3.32ms   3.06ms  26.81ms   4.57ms   71.41ms

As the results above show, syscall latencies increase with the
number of rules, while with the patch the latencies remain stable.
This could help when a user adds many audit rules for purposes
such as attack tracing or process behavior recording but suffers
from poor performance.

I have general concerns about trading memory and complexity for performance gains, but beyond that the numbers you posted above don't yet make sense to me.

Thanks for your reply.

The memory cost of this patch is less than 4KB (1820 bytes for native
x86_64 and 3640 bytes when compat syscalls are also covered), which is
trivial in many cases. Besides, syscalls are invoked very frequently,
so even a small per-call optimization can bring a noticeable benefit.
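
For what it is worth, the bookkeeping behind a footprint like that could
look roughly like the sketch below. The names (audit_rule_count_per_syscall,
audit_syscall_has_rules) and the table layout are hypothetical and only
illustrate why the cost stays in the low-kilobyte range; they are not the
actual patch.

  #include <linux/atomic.h>
  #include <linux/unistd.h>   /* NR_syscalls */

  /*
   * Hypothetical fast-path table: one counter per syscall number,
   * incremented when a rule that can match that syscall is loaded and
   * decremented when it is removed. With roughly 450 native syscalls
   * and 4 bytes per counter this is on the order of 1.8 KB; a second
   * table for compat syscalls roughly doubles it.
   */
  static atomic_t audit_rule_count_per_syscall[NR_syscalls];

  static inline bool audit_syscall_has_rules(int major)
  {
          if (major < 0 || major >= NR_syscalls)
                  return true;    /* be conservative about unknown numbers */
          return atomic_read(&audit_rule_count_per_syscall[major]) != 0;
  }

The syscall-exit path could then return early whenever the counter for the
current syscall is zero, instead of walking the whole rule list.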

The tradeoff still exists, even though you feel it is worthwhile.

Why are the latency increases due to rule count not similar across the different syscalls? For example, I would think that if the increase in syscall latency was directly attributed to the audit rule processing then the increase on the "open" syscall should be similar to that of the "null" syscall. In other words, if we can process 160 rules in ~4ms in the "null" case, why does it take us ~86ms in the "open" case?

As to the test results, we did some investigation and identified two
causes:
1. The chosen rule sets were not very suitable. Though they were not
hit by the syscalls being measured, some of them were hit by other
processes, which reduced overall system performance and skewed the
test results.
2. The routine of lat_syscall is more complicated than we thought. It
calls many other syscalls during the test, which may keep the results
from scaling linearly.

Given the reasons above, we ran another test. We modified the audit
rule sets and made sure they would not be hit at runtime. Then we added
ktime_get_real_ts64 calls to auditsc.c to measure the time spent
executing __audit_syscall_exit. We ran the "stat" syscall 10000 times
for each rule set and recorded the time interval. The results are shown
below:

Before this optimization:

rule set          time
  0 rules     3843.96ns
  1 rules    13119.08ns
 10 rules    14003.13ns
 20 rules    15420.18ns
 30 rules    17284.84ns
 40 rules    19010.67ns
 50 rules    21112.63ns
100 rules    25815.02ns
130 rules    29447.09ns

After this optimization:

 rule set          time
  0 rules     3597.78ns
  1 rules    13498.73ns
 10 rules    13122.57ns
 20 rules    12874.88ns
 30 rules    14351.99ns
 40 rules    14181.07ns
 50 rules    13806.45ns
100 rules    13890.85ns
130 rules    14441.45ns

As the results show, the interval increases roughly linearly with the
number of rules before the optimization, while it remains stable after
the optimization. Note that audit skips some operations when there are
no rules at all, so there is a gap between the 0-rule and 1-rule sets.

It looks like a single rule like the one below could effectively disable this optimization, is that correct?

  % auditctl -a exit,always -F uid=1001
  % auditctl -l
  -a always,exit -S all -F uid=1001

Yes, rules like this one, which monitor all syscalls, would disable the
optimization. The size of the global array could increase exponentially
if we wanted to handle more audit fields. However, we don't think that
kind of rule is practical, because it might generate an enormous number
of logs and even lead to log loss.
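
To put a rough number on that growth (purely back-of-envelope; the actual
table layout may differ), a table indexed only by syscall number stays at
O(NR_syscalls) entries, but pre-answering "could any rule match?" for every
combination of k additional boolean field conditions multiplies that by 2^k:

  /* Hypothetical sizing, user-space illustration only. */
  #include <stdio.h>

  int main(void)
  {
          unsigned int nr_syscalls = 450;  /* rough native syscall count */

          for (unsigned int k = 0; k <= 5; k++)
                  printf("%u extra fields -> %u table entries\n",
                         k, nr_syscalls * (1u << k));
          return 0;
  }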

Before we merge something like this I think we need a better
understanding of the typical audit filter rules used across the
different audit use cases.  This patch is too much of a band-aid to
merge without a really good promise that it will help most real-world
audit deployments.

For a 'real world' deployment, I suggest:
cd /usr/share/audit/sample-rules
cp 10-base-config.rules 11-loginuid.rules 12-ignore-error.rules 30-stig.rules 41-containers.rules 43-module-load.rules 71-networking.rules /etc/audit/rules.d/
rm -f /etc/audit/rules.d/audit.rules # Remove default ruleset if not applicable
echo '-b 32768' > /etc/audit/rules.d/zzexecve.rules
echo '-a exit,always -F arch=b32 -F auid!=2147483647 -S execve -k cmds' >> /etc/audit/rules.d/zzexecve.rules
echo '-a exit,always -F arch=b64 -F auid!=4294967295 -S execve -k cmds' >> /etc/audit/rules.d/zzexecve.rules