On Thursday, July 16, 2015 08:38:22 AM Kangkook Jee wrote:
I'm writing a custom user-land auditd client that subscribes to kauditd to
monitor a number of system calls that we are interested in. My auditd client
seems to work fine overall, but I found unexpected behavior in the audit
framework: it slows down (or sometimes freezes) the entire system when the
audit client's consumption rate can't keep up with the rate of audit message
generation.
This is by design. Auditing is so important that we cannot let even one event
escape the audit trail. People who count on auditing would normally rather
have access denied than lose the ability to track who's accessing something.
This leads to a couple of questions. First, have you done anything about
priority? Did you give your daemon a healthy boost over the other processes so
it gets more runtime than normal processes? How about cgroups? Have you checked
disk synchronization techniques (some yield worse performance but guarantee
the data is written)? What about gprof traces to see where the "hotspots" are
in your daemon?
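To make the priority question concrete, here is a minimal sketch of how a client could adjust its own nice value with setpriority(). The helper names set_own_nice and get_own_nice are made up for illustration, and lowering the nice value (for example to -4) normally needs root or CAP_SYS_NICE.

```cpp
#include <sys/resource.h>  // setpriority, getpriority
#include <cerrno>

// Hypothetical helper: try to set this process's nice value.
// Returns 0 on success, or the errno value on failure.
int set_own_nice(int nice_value) {
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0) {
        return errno;
    }
    return 0;
}

// Hypothetical helper: read back the current nice value.
// getpriority() can legitimately return -1, so errno has to be cleared
// first and checked afterwards to tell a real error apart.
bool get_own_nice(int* out) {
    errno = 0;
    int value = getpriority(PRIO_PROCESS, 0);
    if (value == -1 && errno != 0) {
        return false;
    }
    *out = value;
    return true;
}
```

A client would call something like set_own_nice(-4) once at startup, before it starts reading events, and log the returned errno if the call is refused.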
Here's the simple code snippet used to reproduce the problem.
//
// To build:
// g++ -o simple_audit -std=c++11 -L/usr/lib/x86_64-linux-gnu/ main.cpp -laudit
//
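#include <libaudit.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstdint>   // int32_t, uint32_t, uint64_t
#include <cstdio>    // fprintf, printf
#include <cstdlib>   // atoi, exit
#include <iostream>
#include <string>    // std::to_string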
static int32_t fd = -1;
static bool au_listen_flag = true;

int main(int argc, char* argv[]) {
    struct audit_reply rep;
    uint64_t cnt = 0;

    if (argc != 2) {
        fprintf(stderr, "Invalid usage: %s <sleep_interval>\n", argv[0]);
        exit(1);
    }
    // Passed to usleep() below, so the unit is microseconds.
    uint32_t sleep_time = atoi(argv[1]);

    fd = audit_open();
    if (fd < 0) {
        // error handling.
        std::cerr << "Invalid fd returned: " + std::to_string(fd) << std::endl;
        exit(-1);
    }

    // Register this process as the audit daemon.
    int32_t ret = audit_set_pid(fd, getpid(), WAIT_YES);
    if (ret < 0) {
        std::cerr << "audit_set_pid failed: " + std::to_string(ret) << std::endl;
        exit(-1);
    }

    while (au_listen_flag) {
        int32_t rc = audit_get_reply(fd, &rep, GET_REPLY_BLOCKING, 0);
        if (rc > 0) {
            cnt++;
        }
        usleep(sleep_time);
Why would you do this? You ought to be using epoll or something like that to
wait on next event.
        if (cnt % 10000 == 0) {
            printf("messages %lu\n", cnt);
        }
    }
    close(fd);
}
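As a rough illustration of the epoll comment in the loop, the sketch below waits for a descriptor to become readable instead of sleeping for a fixed interval. The helper names make_watcher and wait_ready are made up for this example, and it works on any readable fd, so it can be tried on a pipe without touching the audit socket.

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// Hypothetical helper: create an epoll instance that watches fd for input.
// Returns the epoll fd, or -1 on error.
int make_watcher(int fd) {
    int ep = epoll_create1(0);
    if (ep < 0) {
        return -1;
    }
    struct epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) != 0) {
        close(ep);
        return -1;
    }
    return ep;
}

// Hypothetical helper: wait until the watched fd is readable.
// timeout_ms of -1 waits forever. Returns 1 if readable, 0 on timeout,
// -1 on error (including EINTR, which the caller may want to retry).
int wait_ready(int ep, int timeout_ms) {
    struct epoll_event ready = {};
    int n = epoll_wait(ep, &ready, 1, timeout_ms);
    if (n < 0) {
        return -1;
    }
    return n > 0 ? 1 : 0;
}
```

In the reproducer, the idea would be to call make_watcher(fd) once after audit_open(), then call wait_ready(ep, -1) at the top of the loop in place of usleep() and read with audit_get_reply(fd, &rep, GET_REPLY_NONBLOCKING, 0) when it returns 1.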
The problem becomes more apparent as we increase the sleep time given as the
first command-line argument (say, 1000, which usleep() interprets as
microseconds, not milliseconds) while simultaneously running heavy-load tasks
(e.g., a kernel build).
sudo ./simple_audit 1000
Here are the command lines that we used to add the system calls to be
monitored and to enable auditing.
# Adding events.
/sbin/auditctl -a exit,always -F arch=b64 -S clone -S close -S creat -S dup \
    -S dup2 -S dup3 -S execve -S exit -S exit_group -S fork -S open -S openat \
    -S unlink -S unlinkat -S vfork -S 288 -S accept -S bind -S connect \
    -S listen -S socket -S socketpair
Next question...why would you want all those syscalls? Do you want them for
daemons and users? Normally daemons are considered normal system function and
are not of interest. What is of interest is what users do. So, to weed out
daemons, you don't want anything with auid=-1. Because the kernel uses unsigned
numbers, -1 is a very large value that auid>=1000 alone would still match, so
you would add
-F auid>=1000 -F auid!=-1
to the rule. That might make a big difference.
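To spell out the unsigned point, here is a small sketch of why both filters are needed. The names kUnsetAuid and is_user_event are made up for illustration, and it assumes auid is held as an unsigned 32-bit value.

```cpp
#include <cstdint>

// Assumption: the login UID is an unsigned 32-bit value, so an unset
// auid of "-1" is really the largest value that type can hold.
constexpr std::uint32_t kUnsetAuid = static_cast<std::uint32_t>(-1);

static_assert(kUnsetAuid == 4294967295u, "-1 wraps to 2^32 - 1");
static_assert(kUnsetAuid >= 1000u, "auid>=1000 alone would still match it");

// Mirrors the two filters in the rule: keep only events from real logins.
constexpr bool is_user_event(std::uint32_t auid) {
    return auid >= 1000u && auid != kUnsetAuid;
}
```

With this check, a daemon with an unset auid is rejected by the second comparison even though it passes the first.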
# Enabling events.
/sbin/auditctl -e1 -b 102400
At this point, "auditctl -s" indicates that the kernel buffer is full, but it
is not throwing away audit messages ('lost' is not increasing).
# auditctl -s
AUDIT_STATUS: enabled=1 flag=1 pid=29887 rate_limit=0 backlog_limit=102400 lost=270878600 backlog=102402
# auditctl -s
AUDIT_STATUS: enabled=1 flag=1 pid=29887 rate_limit=0 backlog_limit=102400 lost=270878600 backlog=102402
Could anyone tell me how to configure kauditd's buffer settings so that it
drops audit messages when the buffer fills up and the user-land consumer
can't keep up with the rate at which audit messages are produced?
If you don't mind losing events, you can also listen on the netlink socket
without setting the pid, the same way that journald does it. That is a
best-effort connection and not guaranteed to be lossless.
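A rough sketch of what that could look like is below. It assumes kernel headers that define AUDIT_NLGRP_READLOG (3.16 or later, as far as I know) and a process that holds CAP_AUDIT_READ. The function name open_audit_multicast is made up for this example.

```cpp
#include <sys/socket.h>
#include <linux/netlink.h>  // sockaddr_nl, NETLINK_AUDIT
#include <linux/audit.h>    // AUDIT_NLGRP_READLOG
#include <unistd.h>
#include <cerrno>
#include <cstring>

// Hypothetical helper: open a netlink socket and join the audit read-only
// multicast group. No audit_set_pid() call is made, so this process is
// not registered as the audit daemon.
// Returns the fd, or -1 with errno set.
int open_audit_multicast() {
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_AUDIT);
    if (fd < 0) {
        return -1;
    }
    struct sockaddr_nl addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    // nl_groups is a bitmask: group number N corresponds to bit N-1.
    addr.nl_groups = 1u << (AUDIT_NLGRP_READLOG - 1);
    if (bind(fd, reinterpret_cast<struct sockaddr*>(&addr), sizeof(addr)) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Records then arrive as ordinary netlink messages that can be read with recv() on the returned fd, and anything the reader does not pick up in time may simply be lost.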
-Steve