Hi Lenny,
On Friday, November 12, 2010 12:24:43 pm LC Bruzenak wrote:
In our systems there are occasionally AVC "storms" which happen as a
result of some unforeseen (and often unknown) issue tickled by various
reasons.
At fielded sites, we are unable to fix this easily. Since we have to
keep all the audit data, this leads to many problems on a system running
over a weekend, for example, with no administrators around.
I probably need to add in either some rate-limiting code or possibly
kill off the process generating the AVCs. Rate-limiting I'd guess could
go into the auditd. If I wanted to be more brutal and kill the process,
I'd think maybe a modification to the setroubleshoot code would be
workable.
I didn't answer right away because I didn't have a good answer for you. If the storm
is large enough to overrun the kernel queue, the rate limiting needs to be in the
kernel. If auditd is able to handle the load, then perhaps you need an analysis plugin
that performs whatever action you deem best.
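To make the plugin idea a bit more concrete, here is a rough sketch in Python, not a tested implementation. It assumes the plugin is started by the dispatcher and receives events as text records on stdin, one per line. The names StormDetector, feed and run, and the window and threshold values, are made up for illustration. It counts AVC records per (pid, comm) over a sliding window and reports a source once it goes over the threshold; what to do at that point (alert, throttle, kill) is left as a placeholder.

```python
import re
import sys
from collections import defaultdict, deque

# Illustrative values only; a real site would tune these.
WINDOW_SECONDS = 10.0
THRESHOLD = 100

# Pulls the timestamp, pid and comm out of an AVC record such as:
#   type=AVC msg=audit(1289582683.123:456): avc:  denied  { read } for  pid=1234 comm="httpd" ...
# comm may also appear unquoted (hex-encoded), so the quotes are optional.
AVC_RE = re.compile(
    r'type=AVC msg=audit\((\d+\.\d+):\d+\).*?\bpid=(\d+).*?\bcomm="?([^"\s]+)"?'
)


class StormDetector:
    """Sliding-window counter of AVC records per (pid, comm)."""

    def __init__(self, window=WINDOW_SECONDS, threshold=THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)  # (pid, comm) -> recent timestamps

    def feed(self, line):
        """Return (pid, comm) if this record puts the source over the
        threshold within the window, otherwise None."""
        m = AVC_RE.search(line)
        if m is None:
            return None  # not an AVC record, or a format we do not recognize
        ts, pid, comm = float(m.group(1)), int(m.group(2)), m.group(3)
        q = self.events[(pid, comm)]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > self.window:
            q.popleft()
        return (pid, comm) if len(q) > self.threshold else None


def run(stream, detector=None):
    """Read records from stream and report sources that exceed the threshold."""
    detector = detector or StormDetector()
    for line in stream:
        hit = detector.feed(line)
        if hit is not None:
            # Placeholder action: the real response is a site policy decision.
            print("AVC storm from pid=%d comm=%s" % hit, file=sys.stderr)
```

A plugin built this way would just call run(sys.stdin). Because it only counts, AVCs under the threshold are still logged as usual, which fits your point 2 below.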
I don't think that a reactive rule is an option -
1) We have our rules locked into the kernel on startup and I'm against
changing that, and
2) maybe "normal" avc counts, under a threshold, we'd still want to see,
from that same process. Besides,
3) unless the rules have been changed, we cannot exclude AVCs from a
particular type/process anyway.
Got any thoughts/ideas/advice?
What is the general source of the problem right now? Was it just that the app was
doing something that policy didn't know it could do? Or was there an attack under way
in which someone was trying something bad? Or was it just an admin mistake where
something didn't have the right label? Each of these has a different solution.
I think this is a complex problem and controls might be needed at several spots. I'd
be open to hearing ideas on this too. I've also been wondering if the audit daemon
might want to use control groups as a means of keeping itself scheduled for very busy
systems. But I'd like to hear other people's thoughts.
-Steve