Hello,
On Tuesday, November 05, 2013 10:07:08 PM Burn Alting wrote:
I did a little experimentation today.
On a system that generates around 7500 audit events every five minutes I
changed, without success, the following:
In auditd.conf
- changed num_logs from 9 to 5 although I didn't expect a change as I
move out the rolled over (audit.log.?) log files as part of the
processing so there shouldn't be a big file rename impost
This should have helped a little since you dropped 4 syscalls.
- changed priority_boost from 4 to 8
In audit.rules
- changed backlog from 32K to 64K to 96K to 128K
This should only help to the extent of your constant fill rate. What happens is
your events are coming in and auditd is unable to attend to them during the
rotation because it has to start with audit.log.9 and delete it, then move all
logs up one number leaving no audit.log. At that point it can open a new one.
So, the backlog needs to be big enough to handle the overflow during that brief
time.
I would expect rotation takes 10 milliseconds at the most. But just for the
sake of argument, let's say it took 1 whole second. At your fill rate, you
should be receiving 25 events. Some of these events may be compound, meaning
they have support records besides syscall such as PATH or CWD. Let's assume
you have 4 supporting records per event. You now have about 100 incoming
supporting records during that one second (125 records once each event's own
syscall record is counted). It would sound like setting the backlog to 32k
should be sufficient...unless the system is about to fall over anyway.
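For what it's worth, that estimate can be checked with a few lines of shell.
This is only a sketch using the figures from this thread: the one second
rotation time and the 4 supporting records per event are assumptions made for
the sake of argument, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Figures quoted in the thread; the last three are assumptions, not measurements.
events_per_interval=7500   # audit events seen per interval
interval_secs=300          # five minutes
support_per_event=4        # assumed PATH/CWD/etc. records per event
rotation_secs=1            # deliberately pessimistic rotation time
backlog_limit=32768        # the "32K" backlog setting

# Events arriving per second at a constant fill rate.
events_per_sec=$(( events_per_interval / interval_secs ))
# Supporting records only, then all records including the syscall record.
support_per_sec=$(( events_per_sec * support_per_event ))
records_per_sec=$(( events_per_sec * (1 + support_per_event) ))
# Records that would queue up while auditd is busy rotating.
queued=$(( records_per_sec * rotation_secs ))

echo "events/s:              $events_per_sec"
echo "supporting records/s:  $support_per_sec"
echo "total records/s:       $records_per_sec"
echo "queued during rotation: $queued of $backlog_limit backlog slots"
```

Either way you count it, a one second stall at this rate should use well under
1% of a 32768 entry backlog, which is why the fill rate alone does not seem to
explain the errors.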
You might try running:
while true; do auditctl -s; sleep 5; done
and see if your system is never able to catch up. If that's the case, you need
to do something about the audit daemon's priority or scheduling. You can boost
the priority way up, to 20 for instance. You might even add the 'chrt' command
to the initscript to see if you can put auditd on a different scheduler.
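A slightly more readable variant of that loop is sketched below. It timestamps
each sample and prints only the backlog and lost counters, so a backlog that
never drains is easy to spot. The helper names status_field and watch_backlog
are made up for this example, and the parsing assumes 'auditctl -s' prints
either a single line of key=value pairs (older releases) or one "key value"
pair per line (newer ones); check the output on your system before relying on
it.

```shell
#!/bin/sh
# Print the value of one field from `auditctl -s` output read on stdin.
# Accepts both "key=value" tokens on one line and "key value" lines.
status_field() {
    awk -v want="$1" '
        {
            # Older format: AUDIT_STATUS: enabled=1 ... lost=0 backlog=0
            for (i = 1; i <= NF; i++) {
                n = split($i, kv, "=")
                if (n == 2 && kv[1] == want) { print kv[2]; exit }
            }
            # Newer format: one "key value" pair per line
            if (NF == 2 && $1 == want) { print $2; exit }
        }
    '
}

# Sample the kernel audit status every N seconds (default 5) until interrupted.
watch_backlog() {
    interval="${1:-5}"
    while true; do
        status=$(auditctl -s)
        printf '%s backlog=%s lost=%s\n' "$(date '+%H:%M:%S')" \
            "$(printf '%s\n' "$status" | status_field backlog)" \
            "$(printf '%s\n' "$status" | status_field lost)"
        sleep "$interval"
    done
}
```

Run 'watch_backlog 5' as root while triggering a rotation. If backlog keeps
climbing between samples instead of dropping back toward zero, the daemon is
not keeping up, and the priority or scheduler changes are worth trying.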
- changed rules to reduce the recorded events per 5 minute interval from
7500 to 500-600 for the same period.
That should help both the backlog before rotation as well as the fill rate
during rotation.
This particular system is running audit-1.8.2-el5 but I see a similar
problem on a RHEL 6.4 box which I believe is running audit-2.2-2.el6.
I think there was one change to normal processing that replaced a syscall to
stat the disk with simple arithmetic. I don't know if that one patch would
help or not. It would allow auditd to keep the backlog lower prior to
rotation.
I did note that if I executed the sync(1) command before signaling
auditd to roll over (i.e. execute /bin/kill -s USR1 pid) the error
SOMETIMES did not appear.
So I am a little bit lost.
You might also experiment with the disk flushing in auditd.conf.
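The settings I have in mind are flush and freq. As a starting point, the
sketch below prints whatever a given auditd.conf currently uses for them. The
function name show_flush_settings is invented for this example, and it assumes
the usual "key = value" layout with the file at /etc/audit/auditd.conf unless
told otherwise.

```shell
#!/bin/sh
# Print the flush-related settings from an auditd.conf style file,
# normalised to lower-case "key=value" lines. Comment lines are skipped.
show_flush_settings() {
    conf="${1:-/etc/audit/auditd.conf}"
    awk -F= '
        /^[ \t]*#/ { next }
        {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)   # strip padding around the key
            gsub(/[ \t]/, "", val)   # strip padding around the value
            key = tolower(key)
            if (key == "flush" || key == "freq") print key "=" val
        }
    ' "$conf"
}
```

Calling 'show_flush_settings' with no argument reads the default file. As far
as I recall, the documented flush values on these releases are none,
incremental, data and sync, and freq is only used when flush is incremental,
so those are the knobs to vary while watching whether the error still shows up
at rotation.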
I believe that the actual effect is just
- the cost of two additional lines in /var/log/messages
- the loss of a few logs
My actual process is to
a. roll over the log file
b. run an ausearch --interpret like command
Running the command shouldn't interfere.
Perhaps my alternative is to modify my ausearch-like command to be
stateful and have it process only new events as per a patch I made to
ausearch some time back:
Subject: [PATCH] ausearch: Add checkpoint capability and have
incomplete logs carry forward when processing multiple audit.log
files
Date: 05/11/2013 03:59:34 PM
Am open to any suggestions ... I think the key issue is that I reduced
the events generated into audit.log from 7500 to 600 per five minute
interval but I still see the error.
I think it's several things. Dropping the fill rate will help. But something
else is going on. Maybe some of these hints can help you investigate the
problem.
-Steve
> On Monday, November 04, 2013 07:46:18 PM Burn Alting wrote:
> > Hi,
> >
> > I have some quite busy hosts that emit the following errors when I
> > request that the audit log file be rolled over (via a kill -s USR1
> > auditdpid).
> >
> > Error receiving audit netlink packet(No buffer space available)
> > Error sending signal_info request (No buffer space available)
> >
> > From reading earlier posts (circa 2009) it would appear my options are
> >
> > a. Increase backlog buffer (currently 32768)
> > b. Increase priority_boost (currently 4)
> > c. Reduce the number of log files (currently 9)
>
> Another corollary to this is that you can increase the file size and
> decrease the total files which would help on rotation.
>
> > Does anyone have a feel for which of the above should offer the best
> > return?
>
> There are 2 more options:
>
> 1) Review the rules to make sure you are not getting events that you
> really do not need. If you have a lot of false positives, then you might
> add some arguments that better narrow the results. For example, perhaps
> you have this rule:
>
> -a always,exit -F arch=b64 -S clock_settime -k time-change
>
> This can give a lot of false positives. The one that really matters is
> when a program sets CLOCK_REALTIME (the wall clock). So, the rule can be
> re-written as:
>
> -a always,exit -F arch=b64 -S clock_settime -F a0=0 -k time-change
>
> which narrows its scope.
>
> 2) You might experiment with cgroups.
>
> > Are there other configuration parameters I could adjust (aside from
> > changing my ruleset in audit.rules)?
>
> There might be general disk tuning parameters in sysctl that could help as
> well. Choice of file system also has performance impacts. I haven't done
> any experimenting on the performance side, but I know there are people
> here that also have very busy systems.
>
> -Steve