On 10/6/12 10:22 PM, Peter Moody wrote:
 On Fri, Oct 5, 2012 at 1:23 PM, Eric Sandeen
<sandeen(a)redhat.com> wrote:
> On 10/5/12 10:57 AM, Peter Moody wrote:
>> On Fri, Oct 5, 2012 at 8:18 AM, Peter Moody <pmoody(a)google.com> wrote:
>>> On Fri, Oct 5, 2012 at 8:16 AM, Peter Moody <pmoody(a)google.com> wrote:
>>>> On Fri, Oct 5, 2012 at 7:26 AM, Jeff Layton <jlayton(a)redhat.com>
wrote:
>>>>> On Fri, 5 Oct 2012 06:57:59 -0700
>>>>> Peter Moody <pmoody(a)google.com> wrote:
>>>>>
>>>>>> On Fri, Oct 5, 2012 at 5:55 AM, Jeff Layton
<jlayton(a)redhat.com> wrote:
>>>>>>> On Thu, 4 Oct 2012 11:48:23 -0700
>>>>>>> Peter Moody <pmoody(a)google.com> wrote:
>>>>>>>
>>>>>>>> On Wed, Sep 26, 2012 at 6:50 AM, Alexander Viro
<aviro(a)redhat.com> wrote:
>>>>>>>>> On Tue, Sep 25, 2012 at 10:03:23AM -0700, Peter Moody
wrote:
>>>>>>>>>> Hey folks,
>>>>>>>>>>
>>>>>>>>>> following up on old patches, are there any
comments on this? Did you
>>>>>>>>>> get around to finding a better way to fix this
bug, Al?
>>>>>>>>>
>>>>>>>>> Alas, I've found none ;-/  Looks like we'll
have to go with this one,
>>>>>>>>> at least until somebody comes up with better
solution.
>>>>>>>>
>>>>>>>> Not surprisingly, this patch doesn't actually fix the
issue (or at
>>>>>>>> least doesn't do it correctly).
>>>>>>>>
>>>>>>>> I hadn't noticed that get_fs_pwd() actually calls
path_get() on
>>>>>>>> &context->pwd so the additional path_get() is
useless and the
>>>>>>>> reference doesn't ever actually get freed if
audit_putname is called
>>>>>>>> while we're in a syscall.
>>>>>>>>
>>>>>>>> Al, Eric, Jeff; do any of you guys have an understanding
of what the
>>>>>>>> initial bug actually is since this clearly doesn't
fix it?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> peter
>>>>>>>>
>>>>>>>
>>>>>>> BTW, I ran this test on one of my KVM guests and it ran just
fine. That
>>>>>>> one is an x86_64 guest running a 3.6.0+ kernel. The root fs
on there is
>>>>>>> ext4 though, not ext3. So perhaps that's a factor?
>>>>>>>
>>>>>>> The oops message you posted at least looks like something
down in the
>>>>>>> bowels of ext3 or fs/buffer.c.
>>>>>>
>>>>>> Yeah, the only place this actually happens for me on these giant
xen
>>>>>> instances we have (6 cores, 32G ram) and it happens on both ext3
and
>>>>>> ext4 filesystems and it happens with 100% reliability.
>>>>>>
>>>>>> The actual oops is from:
>>>>>>
>>>>>> static inline void check_irqs_on(void)
>>>>>> {
>>>>>> #ifdef irqs_disabled
>>>>>>         BUG_ON(irqs_disabled());
>>>>>> #endif
>>>>>> }
>>>>>>
>>>>>> with the code path looking like:
>>>>>>
>>>>>> __find_get_block() -> lookup_bh_lru() -> check_irqs_on()
-> BUG()
>>>>>>
>>>>>
>>>>> Do you have a backtrace from a more recent kernel? I wonder if
>>>>> something in the syscall exit codepath is disabling IRQs here?
>>>>
>>>> is 3.6.0-rc1 recent enough or do you want something newer?
>>>
>>> nevermind, that doesn't boot. One sec.
>>
>>
>> here's 3.5.0
>>
>>  ------------[ cut here ]------------
>> kernel BUG at fs/buffer.c:1220!
>> invalid opcode: 0000 [#1] SMP
>> CPU 0
>> Pid: 3683, comm: a.out Not tainted 3.5.0 #3
>> RIP: e030:[<ffffffff816a99f4>]  [<ffffffff816a99f4>]
>> check_irqs_on.part.8+0x4/0x6
>> RSP: e02b:ffff8807b156dc28  EFLAGS: 00010046
>> RAX: ffff8807d0dd0000 RBX: ffff8807a7d6df28 RCX: 0000000005883396
>> RDX: 0000000000001000 RSI: 0000000005883396 RDI: ffff8807cfc0c000
>> RBP: ffff8807b156dc28 R08: 0000000000000001 R09: ffff8807a7d6de50
>> R10: f83a2b0a359bf007 R11: 0000000000000000 R12: ffff8807a7d6de54
>> R13: ffff8807a7d6de80 R14: ffff8807cfc1f120 R15: 0000000005883396
>> FS:  00007f97164ec700(0000) GS:ffff8807ffc00000(0063) knlGS:0000000000000000
>> CS:  e033 DS: 002b ES: 002b CR0: 000000008005003b
>> CR2: 00000000f76ca3b0 CR3: 00000007bbb53000 CR4: 0000000000002660
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process a.out (pid: 3683, threadinfo ffff8807b156c000, task ffff8807bbae8000)
>
> One thing that crossed my mind is that back in the 4k stacks days I
> think sometimes a blown stack could corrupt threadinfo and we'd get
> spurious warnings like this.
>
> Is there any chance something like that has happened? (any stack
> depth messages, etc?)
 
 I do'nt see any stack depth messages in there, no.
 
> Maybe a crashdump on the BUG() for further poking around would
> be in order.
 
 when I add a painc() there, this is what I get. I'm not seeing
 anything about stack depth, Is there something obvious I'm missing? 
Not missing anything really; it's a long shot I suppose anyway.
It's possible that there was some deep stack excursion that corrupted
threadinfo, and then we went down the path below and the first bad news
we received was when we tested whether we were in an interrupt.
Hm, your other email said that you only started getting the in_interrupt()
warnings AFTER the BUG_ON()?  That makes me think I might be right after
all:
 I booted a few kernels with the WARN_ON successively further
 down the stack in audit land and it only started firing after BUG_ON()
 in fs/buffer.c 
On the other hand, there is a "canary" at the top of the stack which
would usually get overwritten, and a BUG_ON() or panic() should print
that fact, something like "Thread overran stack, or stack corrupted."
What if you build with config STACK_TRACER on, and:
# mount -t debugfs none /sys/kernel/debug
# echo 1 > /proc/sys/kernel/stack_tracer_enabled
# <run your test w/o the panic, get the BUG_ON>
# cat /sys/kernel/debug/tracing/stack_trace
If CONFIG_DEBUG_STACK_USAGE isn't on, that might help too.
I could be way off here but might be worth a test.
-Eric
 Kernel panic - not syncing: irqs disabled
 Pid: 23011, comm: a.out Tainted: G        W    3.2.5-at18-ganetixenu #2
 Call Trace:
  [<ffffffff81692b16>] panic+0x8c/0x1a3
  [<ffffffff8100926d>] ? xen_force_evtchn_callback+0xd/0x10
  [<ffffffff816960d6>] check_irqs_on.part.13+0x25/0x25
  [<ffffffff81161dc9>] __find_get_block+0x1f9/0x200
  [<ffffffff813c7a2f>] ? number.isra.2+0x31f/0x350
  [<ffffffff811c9d85>] ext3_clear_blocks+0x75/0x140
  [<ffffffff811c9f5c>] ext3_free_data+0x10c/0x150
  [<ffffffff811da961>] ? ext3_journal_start_sb+0x31/0x60
  [<ffffffff811ca635>] ext3_truncate+0x4a5/0x600
  [<ffffffff81232f38>] ? journal_start+0xb8/0x100
  [<ffffffff811ccf48>] ext3_evict_inode+0x228/0x2a0
  [<ffffffff8114bfb1>] evict+0xa1/0x1a0
  [<ffffffff8114cc61>] iput+0x101/0x210
  [<ffffffff81148040>] d_kill+0xf0/0x130
  [<ffffffff81148bd2>] dput+0xd2/0x1b0
  [<ffffffff8113eb85>] path_put+0x15/0x30
  [<ffffffff8169348f>] audit_free_names+0xbc/0xdb
  [<ffffffff810ac629>] audit_syscall_exit+0x139/0x1e0
  [<ffffffff8169fdaa>] sysexit_audit+0x21/0x5f
 
> -Eric
>
>> Stack:
>>  ffff8807b156dc98 ffffffff8116a099 ffff8807b59f3000 ffff8807b156dd30
>>  ffff8807b156dd60 ffff8807b156de78 ffff8807b156dc78 ffffffff816af231
>>  ffff8807b156dcd8 ffff8807a7d6e538 ffff8807a7d6df28 ffff8807a7d6de54
>> Call Trace:
>>  [<ffffffff8116a099>] __find_get_block+0x1f9/0x200
>>  [<ffffffff816af231>] ? down_read+0x11/0x30
>>  [<ffffffff811d1405>] ext3_clear_blocks+0x75/0x140
>>  [<ffffffff811d15dc>] ext3_free_data+0x10c/0x150
>>  [<ffffffff811e2061>] ? ext3_journal_start_sb+0x31/0x60
>>  [<ffffffff811d1cb5>] ext3_truncate+0x4a5/0x600
>>  [<ffffffff8123d5b8>] ? journal_start+0xb8/0x100
>>  [<ffffffff8106f406>] ? bit_waitqueue+0x16/0xc0
>>  [<ffffffff811d4598>] ext3_evict_inode+0x248/0x2c0
>>  [<ffffffff81153b9a>] evict+0xaa/0x1b0
>>  [<ffffffff81154843>] iput+0x103/0x210
>>  [<ffffffff8114fc88>] dentry_iput+0x88/0xd0
>>  [<ffffffff811505ec>] dput+0x12c/0x250
>>  [<ffffffff81146275>] path_put+0x15/0x30
>>  [<ffffffff810b2f35>] __audit_syscall_exit+0x2e5/0x460
>>  [<ffffffff816b30be>] sysexit_audit+0x29/0x5b
>> Code: 04 00 00 4c 8d 88 c0 02 00 00 31 c0 e8 5f da ff ff 48 85 db 74
>> 0c 80 43 5c 01 48 89 df e8 d5
>> 6a aa ff 5b 41 5c 5d c3 55 48 89 e5 <0f> 0b 55 48 89 e5 0f 0b 55 48 89
>> e5 0f 0b 55 48 89 e5 41 54 53
>> RIP  [<ffffffff816a99f4>] check_irqs_on.part.8+0x4/0x6
>>  RSP <ffff8807b156dc28>
>> ---[ end trace 8d09f8cfbb601c14 ]---
>>
>>