fyi: this patch [1] seems to fix the issue for me. The explanation in
the subject would reliably oops my machine.
[1]
Are you still able to reliably reproduce this oops? I'm trying to
track this down because this bug (or a very similar bug) is causing
some significant headaches here at work, but I haven't had a lot of
luck. I'm using usermode linux, though, so that might be interfering
with things.
On Mon, Mar 5, 2012 at 12:35 AM, Valentin Avram <aval13(a)gmail.com> wrote:
> I finally found some time and a spare server to retest the oops and list_add
> corruptions I was getting with the 3.x kernels and auditd 2.1.3.
>
> I have now tested with Gentoo's latest stable 3.2.1-gentoo-r2 and kernel.org's
> 3.2.9.
>
> Both hit the oops/BUG in the same way, and after that they keep pouring out
> list_add corruptions with audit_prune_tre (truncated?) and auditctl as comms.
>
> Since this is not specific to Gentoo's kernel, I'll post the oops from
> 3.2.9 here and also attach some of the list_add corruptions.
>
> 3.2.9 BUG:
>
> kernel: [ 301.240011] BUG: unable to handle kernel NULL pointer dereference
> at (null)
> kernel: [ 301.240305] IP: [<c1238dd0>] __list_del_entry+0x20/0xe0
> kernel: [ 301.240481] *pdpt = 0000000000000000 *pde = f000ddc8f000ddc8
> kernel: [ 301.240698] Oops: 0000 [#1] SMP
> kernel: [ 301.240910]
> kernel: [ 301.241030] Pid: 642, comm: fsnotify_mark Not tainted
> 3.2.9-drbd-version3 #1 Dell Inc. PowerEdge 2950/0CX396
> kernel: [ 301.241370] EIP: 0060:[<c1238dd0>] EFLAGS: 00010287 CPU: 6
> kernel: [ 301.241498] EIP is at __list_del_entry+0x20/0xe0
> kernel: [ 301.241623] EAX: f4fae544 EBX: f47cffa4 ECX: ffffffff EDX:
> 00000000
> kernel: [ 301.241751] ESI: f4fae544 EDI: f4fae508 EBP: f47cff7c ESP:
> f47cff64
> kernel: [ 301.241879] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> kernel: [ 301.242005] Process fsnotify_mark (pid: 642, ti=f47ce000
> task=f4f47c00 task.ti=f47ce000)
> kernel: [ 301.242207] Stack:
> kernel: [ 301.242327] c10813c0 f47cffa4 f4f47c00 f4e70888 f47cff7c
> f47cffa4 f47cffb8 c10f6976
> kernel: [ 301.242882] ffffffc3 f4f47c00 f4f47c00 00000000 f4f47c00
> c10530c0 f47cff9c f47cff9c
> kernel: [ 301.243438] f4fae544 f4fae544 f4c47f58 00000000 c10f68f0
> f47cffe4 c1052834 00000000
> kernel: [ 301.243995] Call Trace:
> kernel: [ 301.244119] [<c10813c0>] ? rcu_check_callbacks+0x110/0x110
> kernel: [ 301.244248] [<c10f6976>] fsnotify_mark_destroy+0x86/0x120
> kernel: [ 301.244377] [<c10530c0>] ? abort_exclusive_wait+0x80/0x80
> kernel: [ 301.244504] [<c10f68f0>] ? fsnotify_put_mark+0x30/0x30
> kernel: [ 301.244631] [<c1052834>] kthread+0x74/0x80
> kernel: [ 301.244756] [<c10527c0>] ? kthread_flush_work_fn+0x10/0x10
> kernel: [ 301.244885] [<c1582ab6>] kernel_thread_helper+0x6/0xd
> kernel: [ 301.245011] Code: 55 f4 8b 45 f8 e9 75 ff ff ff 90 55 89 e5 53 83
> ec 14 8b 08 8b 50 04 81 f9 00 01 10 00 74 24 81 fa 00 02 20 00 0f 84 8e 00
> 00 00 <8b> 1a 39 d8 75 62 8b 59 04 39 d8 75 35 89 51 04 89 0a 83 c4 14
> kernel: [ 301.248195] EIP: [<c1238dd0>] __list_del_entry+0x20/0xe0 SS:ESP
> 0068:f47cff64
> kernel: [ 301.248414] CR2: 0000000000000000
> kernel: [ 301.248538] ---[ end trace 15082dbfb353f84c ]---
>
> The kernel was compiled with the following DEBUG support (the bolded ones
> were requested by Gentoo's Dev):
> CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
> CONFIG_SLUB_DEBUG=y
> CONFIG_HAVE_DMA_API_DEBUG=y
> CONFIG_X86_DEBUGCTLMSR=y
> CONFIG_PNP_DEBUG_MESSAGES=y
> CONFIG_AIC94XX_DEBUG=y
> CONFIG_USB_DEBUG=y
> CONFIG_DEBUG_KERNEL=y
> CONFIG_SCHED_DEBUG=y
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_PI_LIST=y
> CONFIG_DEBUG_BUGVERBOSE=y
> CONFIG_DEBUG_INFO=y
> CONFIG_DEBUG_MEMORY_INIT=y
> CONFIG_DEBUG_LIST=y
> CONFIG_DEBUG_STACKOVERFLOW=y
> CONFIG_DEBUG_RODATA=y
> CONFIG_DEBUG_RODATA_TEST=y
>
> I attached the kernel config I used for 3.2.9 to generate this oops and
> the warnings.
>
> From the list_add warnings that came after, out of the 805 warnings I processed,
> after masking with XXXXX the PID and next= values that changed in
> every one, I got 26 distinct MD5 sums. I also attached the relevant files as an
> archive to this email.
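
For anyone who wants to repeat that grouping on their own logs, here is a
minimal sketch of the mask-then-hash step described above. It is not the
script that was actually used. The function names and the two regular
expressions are my own assumptions about what "masking the PID and next=
values" looks like, and it expects each warning to already be split out as
one string.

```python
import hashlib
import re
from collections import Counter

# Assumed patterns (not from the original mail): the "Pid: <n>" field of the
# warning header and the "next=<hex>" field of the list_add corruption message.
PID_RE = re.compile(r"\bPid: \d+")
NEXT_RE = re.compile(r"\bnext=[0-9a-fA-F]+")


def mask(warning: str) -> str:
    """Replace the per-warning PID and next= values with XXXXX."""
    warning = PID_RE.sub("Pid: XXXXX", warning)
    return NEXT_RE.sub("next=XXXXX", warning)


def group_by_md5(warnings):
    """Count warnings by the MD5 of their masked text."""
    return Counter(
        hashlib.md5(mask(w).encode("utf-8")).hexdigest() for w in warnings
    )
```

Calling len(group_by_md5(warnings)) then gives the number of distinct
warning shapes. Anything else that varies between otherwise identical
warnings (timestamps, for example) would need its own mask first.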
>
> The Gentoo bug I opened is sleeping. It seems nobody has the time to at
> least test and confirm (or not) the problems I'm seeing, or everybody
> thinks that nobody would restart auditd so often, so the bug is not that
> serious.
>
>
> Thank you for your time.
>
> On Wed, Feb 8, 2012 at 6:11 PM, Valentin Avram <aval13(a)gmail.com> wrote:
>
>
> --
> Linux-audit mailing list
> Linux-audit(a)redhat.com
> https://www.redhat.com/mailman/listinfo/linux-audit
--
Peter Moody Google 1.650.253.7306
Security Engineer pgp:0xC3410038