I need some suggestions for debugging an issue I'm having.
I have a Dell Vostro laptop I've been using successfully for a while
(details below). It has some user apps running but doesn't seem
overburdened. I am running mls policy in permissive mode.
However, recently the following happens:
PART 1 (prelude relay disabled):
* audit is enabled, there are 2 audisp plugins (prelude and af_unix).
* The audispd.conf q_depth = 128
* I go to our project source directory and start an "svn up"
* In another window as root I "tail -f /var/syslog/messages":
...
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:40 comms audispd: queue is full - dropping event
May 9 13:51:42 comms auditd[3629]: Audit daemon rotating log files with
keep option
May 9 13:52:10 comms prelude-manager: WARNING: Failover enabled:
connection error with 192.168.31.120:4690: Connection timed out
* Very soon after this the machine locks up. The above is the last entry
in the messages log. Only the "caps lock" and some other "lock" icon
on
the keyboard (but not scroll lock) flash, and I have no inbound network
connection & the screen is blank. I cannot get to a terminal with
<ALT><F4> . The only option is power cycle.
* After reboot, if I "service auditd stop" then repeat the svn stuff
there is no freeze, no messages. I suspect it is something with file
traversals and the audit dispatcher/prelude. It also happened once when
doing a "rm -rf " on a directory with many files under my home
directory.
* I purposely have a lot of audit logs left in the directory:
[root@hugo ~]# ls -1 /var/log/audit | wc -l
90
* I purposely have the prelude parent manager (relay-to machine)
disabled.
* The machine was not exceptionally busy in userland according to the
"top" I had running in another window. Here is the header from that (the
"top" process was running, all others sleeping):
top - 13:52:40 up 19 min, 3 users, load average: 0.14, 0.16, 0.12
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.2%us, 0.3%sy, 0.0%ni, 99.5%id, 0.0%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 2060944k total, 912324k used, 1148620k free, 210276k buffers
Swap: 6835000k total, 0k used, 6835000k free, 240760k cached
* The freeze-up happens faster (I believe) if I leave the audispd.conf
q_depth = 80 (default).
Details:
[root@hugo ~]# uname -a
Linux hugo 2.6.25-14.fc9.x86_64 #1 SMP Thu May 1 06:06:21 EDT 2008
x86_64 x86_64 x86_64 GNU/Linux
[root@hugo ~]# rpm -qa | grep audit-
audit-libs-1.7.2-6.fc9.i386
audit-1.7.2-6.fc9.x86_64
audit-libs-1.7.2-6.fc9.x86_64
...
* I have lots of audit rules, plan to add more:
[root@hugo ~]# auditctl -l | wc -l
84
* Disk is not full:
[root@hugo ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
108G 7.2G 101G 7% /
/dev/sda1 190M 20M 161M 11% /boot
tmpfs 1007M 48K 1007M 1% /dev/shm
PART 2:
So - then I enabled the relaying prelude-manager. The svn update got
farther, and I thought maybe that was the cause of the original problem.
However, I saw this first in the messages log:
...
May 9 15:31:36 comms audispd: queue is full - dropping event
May 9 15:31:36 comms audispd: queue is full - dropping event
May 9 15:31:36 comms audispd: queue is full - dropping event
May 9 15:31:36 comms audispd: queue is full - dropping event
May 9 15:31:38 comms audispd: queue is full - dropping event
May 9 15:31:38 comms audispd: queue is full - dropping event
May 9 15:31:38 comms audispd: queue is full - dropping event
May 9 15:31:38 comms audispd: queue is full - dropping event
May 9 15:31:38 comms auditd[3682]: Audit daemon rotating log files with
keep option
May 9 15:31:43 comms auditd[3682]: Audit daemon rotating log files with
keep option
May 9 15:31:48 comms auditd[3682]: Audit daemon rotating log files with
keep option
May 9 15:31:53 comms auditd[3682]: Audit daemon rotating log files with
keep option
Then the same freeze-up happens as described above.
Any suggestions or other data I can provide to help debug?
In the meantime I will increase the audispd.conf q_depth and retest.
Thx,
LCB.
--
LC (Lenny) Bruzenak
lenny(a)magitekltd.com
Show replies by date
On Friday 09 May 2008 15:58:45 LC Bruzenak wrote:
I need some suggestions for debugging an issue I'm having.
I have a Dell Vostro laptop I've been using successfully for a while
(details below). It has some user apps running but doesn't seem
overburdened. I am running mls policy in permissive mode.
I don't know what is causing the lockups, but the overflowing queue is a
priority problem. I've found that I need at least -8 priority boost for the
audispd daemon to stay ahead of auditd.
I just released a new audit package that allows setting separate auditd and
audispd priorities. In the meantime, you can "renice -8 -p <pid>" for
audispd
and see if that doesn't fix the main problem.
-Steve
On Fri, 2008-05-09 at 14:58 -0500, LC Bruzenak wrote:
* Very soon after this the machine locks up. The above is the last
entry
in the messages log. Only the "caps lock" and some other "lock" icon
on
the keyboard (but not scroll lock) flash, and I have no inbound network
connection & the screen is blank. I cannot get to a terminal with
<ALT><F4> . The only option is power cycle.
This is indicative of a kernel panic. Can you attach a serial console
to the machine in question and boot such that kernel messages go to the
serial console? (console=ttyS0,9600 console=tty0) And then recreate?
-Eric
On Friday 09 May 2008 16:26:16 Eric Paris wrote:
> * Very soon after this the machine locks up. The above is the
last entry
> in the messages log. Only the "caps lock" and some other "lock"
icon on
> the keyboard (but not scroll lock) flash, and I have no inbound network
> connection & the screen is blank. I cannot get to a terminal with
> <ALT><F4> . The only option is power cycle.
This is indicative of a kernel panic. Can you attach a serial console
to the machine in question and boot such that kernel messages go to the
serial console? (console=ttyS0,9600 console=tty0) And then recreate?
/me wonders if this is due to fail mode being panic ?
-Steve
On Fri, 2008-05-09 at 16:45 -0400, Steve Grubb wrote:
On Friday 09 May 2008 16:26:16 Eric Paris wrote:
> > * Very soon after this the machine locks up. The above is the last entry
> > in the messages log. Only the "caps lock" and some other
"lock" icon on
> > the keyboard (but not scroll lock) flash, and I have no inbound network
> > connection & the screen is blank. I cannot get to a terminal with
> > <ALT><F4> . The only option is power cycle.
>
> This is indicative of a kernel panic. Can you attach a serial console
> to the machine in question and boot such that kernel messages go to the
> serial console? (console=ttyS0,9600 console=tty0) And then recreate?
/me wonders if this is due to fail mode being panic ?
-Steve
I changed "-f 2" to "-f 1."
May 9 16:58:36 comms auditd[3552]: Audit daemon rotating log files with
keep option
May 9 16:58:45 comms auditd[3552]: Audit daemon rotating log files with
keep option
May 9 16:59:45 comms kernel: printk: 152 messages suppressed.
May 9 16:59:45 comms kernel: audit: audit_backlog=8193 >
audit_backlog_limit=8192
May 9 16:59:45 comms kernel: audit: audit_lost=175 audit_rate_limit=0
audit_backlog_limit=8192
May 9 16:59:45 comms kernel: audit: backlog limit exceeded
May 9 16:59:45 comms kernel: audit: audit_backlog=8193 >
audit_backlog_limit=8192
May 9 16:59:45 comms kernel: audit: audit_lost=176 audit_rate_limit=0
audit_backlog_limit=8192
May 9 16:59:45 comms kernel: audit: backlog limit exceeded
No more kernel panic.
OK - should I roll to the next version and assign different priorities?
I'm so over version 1.7.2 already anyway...
:)
LCB.
--
LC (Lenny) Bruzenak
lenny(a)magitekltd.com
On Friday 09 May 2008 17:05:35 LC Bruzenak wrote:
No more kernel panic.
OK - should I roll to the next version and assign different priorities?
You need to raise the priorities for sure. You can set priority boost to 8 in
auditd for now. If you do change to audit 1.7.3, be aware that selinux policy
has to be updated to allow this new privileged operation. So, you might want
to wait until a policy update can be coordinated.
-Steve