David Woodhouse wrote:
On Fri, 2005-07-08 at 14:18 -0600, Timothy R. Chavez wrote:
>These don't look familiar to me, but I think it'd be good to send them
>out to everyone to take a look...
>
>
HTML-ised, word-wrapped, 'deepSkyBlue' oopses? Can I have some of what
you lot are smoking? :)
The first panic is similar to something else I've seen but not managed
to make any progress with. In that case I was told the precise kernel
and was able to determine that the oops in proc_get_inode was due to an
invalid ->owner field in try_module_get().
That one was sent to me as an OpenOffice document, but I'll do the world
a favour and reproduce it here as text...
Oops: 0000 [1] SMP inode=2 dev=fd:00 mode=040755 ouid=0 ogid=0 rdev=00:00
CPU 7
Modules linked in: michael_mic parport_pc lp parport netconsole netdump autofs4 i2c_dev
i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac md5 ipv6 ohci_hcd ehci_hcd
tg3 dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aacraid sd_mod scsi_mod
Pid: 10704, comm: mount_test Not tainted 2.6.9-11.EL.audit.74smp
RIP: 0010:[<ffffffff801a6764>] <ffffffff801a6764>{proc_get_inode+199}
RSP: 0018:0000010068eb9ce8 EFLAGS: 00010282
RAX: 0000000000000007 RBX: 000001007cce6d70 RCX: 0000000000000000
RDX: ffffffffa01a5180 RSI: 000001000314b478 RDI: ffffffff80476380
RBP: 0000010068a1e060 R08: 0000010068eb9ca8 R09: 0000000000000000
R10: 000001006d81c950 R11: 0000000000000058 R12: 000001007ff05538
R13: 0000000000000000 R14: 00000000ffffffea R15: 0000010068eb9d98
FS: 0000002a95583b00(0000) GS:ffffffff804c6f80(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa01a5180 CR3: 000000007fe9a000 CR4: 00000000000006e0
Process mount_test (pid: 10704, threadinfo 0000010068eb8000, task 000001006c5be030)
Stack: 000001007cce6d70 00000100679ed260 000001006963e600 ffffffff801a91da
fffffffffffffff4 00000100679ed260 000001006963e600 00000100679ed368
0000010068eb9e58 ffffffff80182060
Call Trace: <ffffffff801a91da>{proc_lookup+246}
<ffffffff80182060>{do_lookup+230}
<ffffffff80182c76>{link_path_walk+2508}
<ffffffff801832a7>{path_lookup+451}
<ffffffff80183553>{__user_walk+47}
<ffffffff8017e037>{vfs_lstat+21}
<ffffffff80154234>{audit_syscall_entry+306}
<ffffffff8017e369>{sys_newlstat+17}
<ffffffff801141e4>{syscall_trace_enter+161}
<ffffffff80110142>{tracesys+113}
<ffffffff801101a2>{tracesys+209}
Code: 83 3a 02 74 32 89 c0 48 c1 e0 07 48 8d 04 02 ff 80 00 01 00
RIP <ffffffff801a6764>{proc_get_inode+199} RSP <0000010068eb9ce8>
CR2: ffffffffa01a5180
In this case, the owner field of the proc directory in question is set
to 0xffffffffa01a5180, when it _should_ have been a pointer to a valid
'struct module'. We don't seem to have been given the faulting address
in the panic you show, but I'm fairly sure it'll be the same thing. Can
you tell me which /proc file was being accessed when this happened?
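
As a cross-check on the dump above, here is a small Python sketch that reports which general-purpose register holds the faulting address given in CR2. It is not part of the original report: the function names are made up for illustration, and the register lines are copied from the oops as posted.

```python
import re

# Register lines copied from the oops above (only the ones needed here).
OOPS = """\
RAX: 0000000000000007 RBX: 000001007cce6d70 RCX: 0000000000000000
RDX: ffffffffa01a5180 RSI: 000001000314b478 RDI: ffffffff80476380
RBP: 0000010068a1e060 R08: 0000010068eb9ca8 R09: 0000000000000000
R10: 000001006d81c950 R11: 0000000000000058 R12: 000001007ff05538
R13: 0000000000000000 R14: 00000000ffffffea R15: 0000010068eb9d98
CR2: ffffffffa01a5180 CR3: 000000007fe9a000 CR4: 00000000000006e0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def registers_holding_fault_address(text):
    """List the non-control registers whose value equals CR2."""
    regs = parse_registers(text)
    fault = regs["CR2"]
    return sorted(
        name
        for name, value in regs.items()
        if value == fault and not name.startswith("CR")
    )


print(registers_holding_fault_address(OOPS))
```

On this dump the only match is RDX, which holds the same 0xffffffffa01a5180 value quoted above as the bogus owner pointer.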
This is happening _before_ the audit hooks in path_lookup() are reached;
there shouldn't be anything happening in this particular code path which
is audit-related. There's probably been some problem _beforehand_.
Jeff, wasn't there a netdump in the x86_64 case above?
There is a netdump in the x86_64 case. It is still available to be
looked at.
I stepped away from the issue because I could not reproduce it on any
"official U2 kernel variant".
I was able to reproduce it with audit enabled and with audit disabled on
your version of audit.74. So I assumed that it was an issue in your build
tree.
I can revisit that if you would like.
The second and third oopses are basically the same as each other. I'm
inclined to suspect that it's the call to fops_get() in dentry_open(),
which is actually another call to try_module_get(). But in that case
it's a _different_ pointer to a struct module, not one in a
proc_dir_entry but one in a struct file_operations. Again, what file is
being accessed? One in /proc, I'd imagine?
I can write a test that will try to read all files in /proc. It may or
may not provide any data.
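
A minimal sketch of such a test, assuming Python is available on the test machine. The name read_tree and its parameters are hypothetical, not an existing tool. Each path is printed and flushed before it is opened, so if the kernel oopses, the last line of output names the file being accessed. Files are opened non-blocking and only a small chunk is read, to reduce the chance of hanging on entries such as /proc/kmsg.

```python
import os


def read_tree(root="/proc", max_bytes=4096):
    """Open and read up to max_bytes from every file found under root.

    Returns (visited, failures): the paths tried, and a list of
    (path, errno) pairs for files that could not be opened or read.
    """
    visited = []
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Log the path first so the last line shown identifies the
            # file in flight if the machine goes down during the read.
            print(path, flush=True)
            visited.append(path)
            try:
                fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
                try:
                    os.read(fd, max_bytes)
                finally:
                    os.close(fd)
            except OSError as exc:
                # Permission errors and the like are expected in /proc;
                # record them and keep going.
                failures.append((path, exc.errno))
    return visited, failures
```

Calling read_tree() with no arguments walks /proc. Sending the output to a serial console or a remote log is advisable, since local files may not survive a panic.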
How easy is it to reproduce these oopsen? Can it be done with audit
disabled? Can it be done on the base U1 kernel?