On Thursday, October 06, 2011 04:27:03 PM larry.erdahl(a)usbank.com wrote:
> I have a Red Hat 5.4 server on which I'm using Snare to control the
> audit rules. Recently this server hung on me, and the evidence pointed
> to the SnareDispatcher as the cause. You can see from the samples below
> that the dispatcher was running at 99-100%. The morning of the hang,
> auditd peaked at ~200,000 events/hour, up from ~50,000 events/hour. Is
> there a way to protect the server from hanging during unexpected loads
> like this?
> I'm assuming from what I've read that I'll need to increase the
> audit_backlog limit to something higher. Before increasing the number
> of buffers, I'd like to get a clearer understanding of their size and
> how increasing these buffers may impact my overall system performance.
> Are there any recommendations on what the settings should be, or a
> formula that I could use to determine the proper setting?
What the kernel sends to user space is a data structure like this:

#define MAX_AUDIT_MESSAGE_LENGTH 8970 // PATH_MAX*2+CONTEXT_SIZE*2+11+256+1
struct audit_message {
        struct nlmsghdr nlh;
        char data[MAX_AUDIT_MESSAGE_LENGTH];
};
This is in an skb, so there is probably some more memory used for skb
bookkeeping. You might just round that off to 9000 bytes and be close
enough for practical purposes. Increasing the backlog limit means that the
kernel allocates this memory and it's no longer available for user space.
With the amount of memory in current hardware, I don't think you have to
worry too much as long as the setting is sane. A backlog limit of 8192
means it occupies a little over 70 MB of memory. But if you need to do
this, you need to do this.
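A quick back-of-the-envelope check of that figure, using the rounded
9000-byte-per-event estimate above (the exact skb overhead will vary by
kernel):

```shell
# Approximate kernel memory consumed by a completely full audit backlog,
# assuming ~9000 bytes per queued event (struct above, rounded up).
backlog_limit=8192
bytes_per_event=9000
echo "$(( backlog_limit * bytes_per_event / 1024 / 1024 )) MB"
# prints: 70 MB
```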
> I am looking into what may have caused the spike, but I'd like to know
> what my options are to keep from having another system hang.
Do you use keys for your audit rules? If so, run the key report to get an
idea of what was happening. From that you can zero in on what it was. You
may also have a rule that is too aggressive in logging. For example,
perhaps you record file deletions in /usr/* and then a yum update comes
along, overwriting and deleting thousands of files in a few seconds.
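If your rules do carry keys, something along these lines would show which
keys were firing during the spike (the time window and the key name here
are illustrative; substitute your own):

```
# Summarize audit events by rule key around the time of the hang
aureport -k --summary --start 09/30/2011 00:00:00 --end 09/30/2011 01:30:00

# Then drill into the noisiest key to see the actual records
ausearch -k some-key --start 09/30/2011 01:00:00 | less
```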
> Any help would be appreciated.
Another possibility is increasing the audit daemon's priority a little and
making sure its disk performance is tuned.
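In auditd.conf terms, that tuning might look something like this
(priority_boost, flush, and freq are standard auditd.conf settings; the
values shown are illustrative, not recommendations for your workload):

```
# /etc/audit/auditd.conf (excerpt)
priority_boost = 8      # raise auditd's scheduling priority (default is 4)
flush = incremental     # flush records to disk periodically...
freq = 50               # ...every 50 records, rather than on each one
```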
> Sep 30 01:29:16 <servername> kernel: audit: audit_backlog=321 > audit_backlog_limit=320
This is the default setting. It's a bit low for production use. I'd bump
that up a lot. Make it at least 4096 if not 8192.
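The backlog limit is set with auditctl's -b option, typically near the top
of /etc/audit/audit.rules so it takes effect before the rules are loaded:

```
# /etc/audit/audit.rules (excerpt)
-b 8192
```

You can also change it on a running system with "auditctl -b 8192", and
check the current queue pressure with "auditctl -s".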
-Steve