I get this after a handful of hours. It's not terribly deterministic when it's going to melt down, but it typically doesn't last more than a few hours before panicking. This is 3.0.3, 64-bit, running Debian Squeeze on a usually stable Dell PE 1950. I'm happy to run any sort of traces or send whatever would be useful in debugging (.config, etc). Output is over IPMI, so it's a tad scrambled, but I didn't want to mess with it for fear of obscuring something important.

Environment is heavy NFS-backed web hosting. The backing device for the fscache cache is an SSD, but I've seen the same thing on a regular drive. The filesystem for the fscache cache in the example below is EXT4, but I've seen the same thing on XFS.

I should mention too that there's nothing special about the 3.0.3 crash. I get similar crashes with 2.6.39.4 and every previous kernel I've tested. 3.0.3 is just the most recent one I've tested.

[25625.932971] ------------[ cut here ]------------
[25625.942202] kernel BUG at fs/cachefiles/namei.c:166!
[25625.942874] invalid opcode: 0000 [#1] SMP
[25625.942874] CPU 6
[25625.942874] Modules linked in: xfs ioatdma dca loop joydev fan evdev i5000_edac edac_core psmouse i5k_amb dcdbas serio_raw shpchp pcspkr pci_hotplug ]
[25625.942874]
[25625.942874] Pid: 23795, comm: kworker/u:5 Not tainted 3.0.3 #1 Dell Inc. PowerEdge 1950/0DT097
[25625.942874] RIP: 0010:[<ffffffff81299cf3>] [<ffffffff81299cf3>] cachefiles_walk_to_object+0xcb3/0xdd0
[25625.942874] RSP: 0018:ffff8801ab84dc60 EFLAGS: 00010282
[25625.942874] RAX: ffff88003935e601 RBX: ffff8801d8cff330 RCX: 000000000047bea6
[25625.942874] RDX: 000000000047bea5 RSI: 0000000000010200 RDI: ffff88022ec02780
[25625.942874] RBP: ffff8801ab84dd50 R08: 000000000047bea5 R09: ffffea0000c83c20
[25625.942874] R10: ffffffff812982aa R11: 0000000000000003 R12: ffff8801d8cff200
[25625.942874] R13: ffff8801a4a06300 R14: ffff880224ffa780 R15: ffff8801c0dddf00
[25625.942874] FS: 0000000000000000(0000) GS:ffff88022fd80000(0000) knlGS:0000000000000000
[25625.942874] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[25625.942874] CR2: ffffffffff600400 CR3: 00000000016a2000 CR4: 00000000000006f0
[25625.942874] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[25625.942874] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[25625.942874] Process kworker/u:5 (pid: 23795, threadinfo ffff880082bc6300, task ffff880082bc5e00)
[25625.942874] Stack:
[25625.942874] 0000000000000003 0000000000000000 ffff8801ab84dc90 ffff880082bc5e00
[25625.942874] ffff880082bc6228 ffff880082bc6228 ffff880082bc6228 ffff8801ab84dd08
[25625.942874] ffff880082bc5e00 ffff88022eee5310 ffff880104639400 ffff8801f0e5f664
[25625.942874] Call Trace:
[25625.942874] [<ffffffff81074010>] ? wake_up_bit+0x40/0x40
[25625.942874] [<ffffffff812973ab>] cachefiles_lookup_object+0x5b/0x170
[25625.942874] [<ffffffff811ad864>] fscache_lookup_object+0xd4/0x2b0
[25625.942874] [<ffffffff811ae789>] fscache_object_work_func+0x4f9/0xd60
[25625.942874] [<ffffffff8106c594>] process_one_work+0x164/0x450
[25625.942874] [<ffffffff811ae290>] ? fscache_enqueue_dependents+0x120/0x120
[25625.942874] [<ffffffff8106cc2b>] worker_thread+0x19b/0x430
[25625.942874] [<ffffffff8106ca90>] ? manage_workers+0x210/0x210
[25625.942874] [<ffffffff81073abe>] kthread+0x9e/0xb0
[25625.942874] [<ffffffff81671194>] kernel_thread_helper+0x4/0x10
[25625.942874] [<ffffffff8166866d>] ? retint_restore_args+0x13/0x13
[25625.942874] [<ffffffff81073a20>] ? kthread_worker_fn+0x1a0/0x1a0
[25625.942874] [<ffffffff81671190>] ? gs_change+0xb/0xb
[25625.942874] Code: 00 48 c7 c7 78 6d 90 81 31 c0 e8 92 b0 3c 00 0f 0b eb fe 48 c7 c7 78 7b 90 81 31 c0 e8 80 b0 3c 00 31 f6 4c 89 f7 e8 3d e5 ff ff <0
[25625.942874] RIP [<ffffffff81299cf3>] cachefiles_walk_to_object+0xcb3/0xdd0
[25625.942874] RSP <ffff8801ab84dc60>
2011 Aug 25 07:01:04 boscust2102[25626.490246] ---[ end trace abce6c7388af252a ]---
[25625.932971] ------------[ cu[25626.505216] Kernel panic - not syncing: Fatal exception
t here ]--------[25626.520310] Pid: 23795, comm: kworker/u:5 Tainted: G D 3.0.3 #1
---- 2011 Aug 25[25626.534651] Call Trace:
07:01:04 boscus[25626.542237] [<ffffffff81664c4e>] panic+0xbf/0x1da
t2102 [25625.942[25626.554578] [<ffffffff8104ef9f>] ? kmsg_dump+0x4f/0x100
874] invalid opc[25626.567722] [<ffffffff81669655>] oops_end+0xa5/0xf0
ode: 0000 [#1] S[25626.580262] [<ffffffff810058db>] die+0x5b/0x90
MP [25626.592190] [<ffffffff81669170>] do_trap+0x190/0x1a0
[25626.602854] [<ffffffff8166bf2a>] ? atomic_notifier_call_chain+0x1a/0x20
[25626.616517] [<ffffffff810034f5>] do_invalid_op+0x95/0xb0
[25626.627565] [<ffffffff81299cf3>] ? cachefiles_walk_to_object+0xcb3/0xdd0
[25626.641457] [<ffffffff812febfa>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[25626.654860] [<ffffffff812982aa>] ? cachefiles_printk_object+0x7a/0x90
[25626.668259] [<ffffffff8166869d>] ? restore_args+0x30/0x30
[25626.679472] [<ffffffff8167101a>] invalid_op+0x1a/0x20
[25626.689963] [<ffffffff812982aa>] ? cachefiles_printk_object+0x7a/0x90
[25626.703239] [<ffffffff81299cf3>] ? cachefiles_walk_to_object+0xcb3/0xdd0
[25626.716973] [<ffffffff81074010>] ? wake_up_bit+0x40/0x40
[25626.727868] [<ffffffff812973ab>] cachefiles_lookup_object+0x5b/0x170
[25626.740810] [<ffffffff811ad864>] fscache_lookup_object+0xd4/0x2b0
[25626.753283] [<ffffffff811ae789>] fscache_object_work_func+0x4f9/0xd60
[25626.766459] [<ffffffff8106c594>] process_one_work+0x164/0x450
[25626.778255] [<ffffffff811ae290>] ? fscache_enqueue_dependents+0x120/0x120
[25626.792232] [<ffffffff8106cc2b>] worker_thread+0x19b/0x430
[25626.803638] [<ffffffff8106ca90>] ? manage_workers+0x210/0x210
[25626.815400] [<ffffffff81073abe>] kthread+0x9e/0xb0
[25626.825307] [<ffffffff81671194>] kernel_thread_helper+0x4/0x10
[25626.837233] [<ffffffff8166866d>] ? retint_restore_args+0x13/0x13
[25626.849515] [<ffffffff81073a20>] ? kthread_worker_fn+0x1a0/0x1a0
[25626.861838] [<ffffffff81671190>] ? gs_change+0xb/0xb
[25626.881978] Rebooting in 120 seconds..

--
Linux-cachefs mailing list
Linux-cachefs@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cachefs