On Friday, May 19, 2017 4:22:24 PM EDT Klaus Lichtenwalder wrote:
(Note to moderator: I sent this before from the wrong address; I hope it
doesn't get duplicated.)
Hi,
we have a few SAP systems on RHEV (so virtualized on KVM) with >= 74
CPUs and >= 400G RAM.
When the system is busy with large SAP jobs, it is brought to its knees,
with CPU %system up to 80%, making the SAP jobs run twice as long. As
soon as you stop auditd, everything returns to normal...
Facts:
RHEL6 instances on RHEL7 hosts.
The rule set (see below) runs fine on any other system with fewer CPUs
(<64, maybe this is the cutoff?). We have smaller systems with this
rule set that rotate the audit file nearly every minute without any
noticeable performance hit; these SAP systems rotate once every
20-24 hours....
Does anyone have an idea?
Here's an excerpt from "perf top":
with auditd running:
Samples: 28M of event 'cpu-clock', Event count (approx.): 236747914918
Overhead Shared Object Symbol
23.13% [kernel] [k] get_task_cred
10.05% [kernel] [k] audit_filter_rules
4.21% [kernel] [k] _spin_unlock_irqrestore
3.30% libdb2e.so.1 [.] sqlbfix
2.92% [kernel] [k] finish_task_switch
1.69% disp+work [.] rrol_in
1.69% disp+work [.] rrol_out
0.98% [kernel] [k] run_timer_softirq
0.96% [kernel] [k] rcu_process_gp_end
auditd stopped:
Samples: 3M of event 'cpu-clock', Event count (approx.): 526535382557
Overhead Shared Object Symbol
2.41% disp+work [.] memcmpU16
2.32% disp+work [.] MmxMalloc2
2.25% disp+work [.] ab_Rudi
2.07% disp+work [.] rrol_out
1.98% disp+work [.] rrol_in
1.95% disp+work [.] ab_CompByCmpCntx
1.88% libdb2e.so.1 [.] sqlbfix
1.73% disp+work [.] MmxFree2
1.62% [kernel] [k] run_timer_softirq
1.56% [kernel] [k] __do_softirq
1.39% disp+work [.] ab_InitRcDecompress
These are the audit rules:
auditctl -l
-a always,exit -S all -F path=/etc/environment -F perm=wa -F auid>=400 -F key=CRIT_CONF
Clipped all the other rules. Out of curiosity, why do you include -S all in
every rule? That automatically sends the rule into the syscall rules,
which affects the performance of every single syscall in every single
application. The majority of your rules are file watches, which generally take
a different route that is more efficient.
To fix this, just remove "-S all" in every rule. I bet it works much better
after that.
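
As a rough sketch of that change, applied to the one rule quoted above: the
helper name strip_s_all is made up for illustration, and it only handles the
literal " -S all " spelling shown in that rule. It filters rule text and does
not load anything into the kernel.

```shell
# Hypothetical helper: drop the "-S all" field from audit rules read on stdin.
# Only the literal " -S all " form seen in the quoted rule is handled.
strip_s_all() {
    sed 's/ -S all / /'
}

# The rule from the listing above, single-quoted so the shell does not
# treat ">=" as a redirection.
rule='-a always,exit -S all -F path=/etc/environment -F perm=wa -F auid>=400 -F key=CRIT_CONF'

# Prints the same rule without the "-S all" field.
printf '%s\n' "$rule" | strip_s_all
```

The same filter could be run over a copy of the rules file first, so the
result can be reviewed before it replaces the live rule set.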
-Steve