I thought I'd be smart and give 3.11.1 a whirl instead, but got the same error. Stack trace is almost identical (gdb output at the end). kernel: [11242.664430] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 kernel: [11242.715321] IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 kernel: [11242.722639] PGD 0 kernel: [11242.724908] Thread overran stack, or stack corrupted kernel: [11242.730464] Oops: 0000 [#1] SMP kernel: [11242.734111] Modules linked in: dell_rbu wmi sb_edac edac_core hid_generic usbhid hid kernel: [11242.742907] CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1 kernel: [11242.749336] Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012 kernel: [11242.757710] task: ffff8804203460c0 ti: ffff880420346640 task.ti: ffff880420346640 kernel: [11242.766082] RIP: 0010:[<ffffffff812a1903>] [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 kernel: [11242.776126] RSP: 0018:ffff8801053af878 EFLAGS: 00210286 kernel: [11242.782092] RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8 kernel: [11242.790087] RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780 kernel: [11242.798090] RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003 kernel: [11242.806092] R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440 kernel: [11242.814093] R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000 kernel: [11242.822097] FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700 kernel: [11242.831188] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 kernel: [11242.837640] CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0 kernel: [11242.845643] Stack: kernel: [11242.847914] ffffea00055ff440 ffffea00055ff440 ffff8800be2f8780 0000000000001000 kernel: [11242.856298] ffff8801053af8b8 ffffffff81365a72 ffffea00055ff440 0000000000000000 kernel: [11242.864680] ffff8801053af8e8 ffffffff813553d5 000000000000000e ffffea00055ff440 kernel: [11242.873061] Call Trace: kernel: [11242.875827] [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70 kernel: [11242.883448] [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90 kernel: [11242.890098] [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90 kernel: [11242.896748] [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620 kernel: [11242.905072] [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0 kernel: [11242.912302] [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70 kernel: [11242.919625] [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40 kernel: [11242.926364] [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70 kernel: [11242.932915] [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120 kernel: [11242.940148] [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0 kernel: [11242.946605] [<ffffffff81357f78>] nfs_setattr+0xc8/0x150 kernel: [11242.952574] [<ffffffff8122d95b>] notify_change+0x1cb/0x390 kernel: [11242.958836] [<ffffffff8120a55b>] do_truncate+0x7b/0xc0 kernel: [11242.964708] [<ffffffff8121f96c>] do_last+0xa4c/0xfd0 kernel: [11242.970385] [<ffffffff8121ffbc>] path_openat+0xcc/0x670 kernel: [11242.976356] [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0 kernel: [11242.982325] [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0 kernel: [11242.988392] [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50 kernel: [11242.994658] [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24 kernel: [11243.001013] Code: e8 03 67 f1 ff eb 84 90 66 66 66 66 90 55 48 89 e5 48 83 ec 20 48 89 5d e8 48 89 fb 4c 89 65 f0 4c 89 6d f8 49 89 f4 48 8b 43 48 <80> 78 10 00 0f 84 c0 00 00 00 4d 85 e4 0f 84 ed 00 00 00 f0 ff kernel: [11243.023181] RIP [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 kernel: [11243.030623] RSP <ffff8801053af878> kernel: [11243.034548] CR2: 0000000000000010 kernel: [11243.038929] ---[ end trace 1eecf1a047ce4413 ]--- This next part seems to be related. It happened on the previous stacktrace too but I chalked it up to coincidence. I had run that same 3.10.10 (but having rebooted without fscache turned on) over the weekend without any error like this. Could still be coincidental but so far I've only seen it after (though not immediately after) the fscache traceback. The load on the box skyrockets and an 8 core box is up near 60% in sys (usually quite a bit less, like in the 20%-40% range at most). kernel: [11607.613260] ------------[ cut here ]------------ kernel: [11607.618538] WARNING: CPU: 0 PID: 6824 at net/sched/sch_generic.c:260 dev_watchdog+0x267/0x270() kernel: [11607.628403] NETDEV WATCHDOG: eth0 (tg3): transmit queue 0 timed out kernel: [11607.635526] Modules linked in: dell_rbu wmi sb_edac edac_core hid_generic usbhid hid kernel: [11607.645083] CPU: 0 PID: 6824 Comm: /usr/sbin/apach Tainted: G D 3.11.1 #1 kernel: [11607.653777] Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012 kernel: [11607.662272] ffffffff8112c2ac 0000000000000000 0000000000000104 ffff880227c03d38 kernel: [11607.671013] ffffffff81d59846 000000000000257e ffff880227c03d88 ffff880227c03d78 kernel: [11607.679758] ffffffff810bafbc ffff880227c03d88 ffffffff82607450 0000000000000104 kernel: [11607.688501] Call Trace: kernel: [11607.691332] <IRQ> [<ffffffff8112c2ac>] ? print_modules+0x6c/0xd0 kernel: [11607.698526] [<ffffffff81d59846>] dump_stack+0x46/0x58 kernel: [11607.704377] [<ffffffff810bafbc>] warn_slowpath_common+0x8c/0xc0 kernel: [11607.711196] [<ffffffff810bb111>] warn_slowpath_fmt+0x61/0x70 kernel: [11607.717728] [<ffffffff81bf7376>] ? destroy_conntrack+0xa6/0xe0 kernel: [11607.724454] [<ffffffff81bb27b7>] dev_watchdog+0x267/0x270 kernel: [11607.730687] [<ffffffff81bb2550>] ? pfifo_fast_dequeue+0xf0/0xf0 kernel: [11607.737508] [<ffffffff81bb2550>] ? pfifo_fast_dequeue+0xf0/0xf0 kernel: [11607.744331] [<ffffffff810ca788>] call_timer_fn+0x38/0x140 kernel: [11607.750563] [<ffffffff81bb2550>] ? pfifo_fast_dequeue+0xf0/0xf0 kernel: [11607.757382] [<ffffffff810cbaec>] run_timer_softirq+0x25c/0x2b0 kernel: [11607.764099] [<ffffffff810c07ff>] __do_softirq+0xcf/0x270 kernel: [11607.770237] [<ffffffff810c0b30>] ? irq_exit+0xb0/0xd0 kernel: [11607.776087] [<ffffffff81d71f8c>] call_softirq+0x1c/0x30 kernel: [11607.782120] <EOI> [<ffffffff81055e9d>] do_softirq+0x4d/0x80 kernel: [11607.788821] [<ffffffff810c0724>] local_bh_enable+0x84/0x90 kernel: [11607.795157] [<ffffffff81cbb25f>] ipt_do_table+0x20f/0x5b0 kernel: [11607.801400] [<ffffffff81bff1f6>] ? __nf_ct_ext_add_length+0x156/0x2a0 kernel: [11607.808806] [<ffffffff81cbde45>] iptable_filter_hook+0x55/0xa0 kernel: [11607.815533] [<ffffffff81beff5f>] nf_iterate+0x8f/0xd0 kernel: [11607.821379] [<ffffffff81c62150>] ? ip_forward_options+0x200/0x200 kernel: [11607.828395] [<ffffffff81bf001d>] nf_hook_slow+0x7d/0x150 kernel: [11607.834537] [<ffffffff81c62150>] ? ip_forward_options+0x200/0x200 kernel: [11607.841548] [<ffffffff81c64312>] __ip_local_out+0x72/0x80 kernel: [11607.847786] [<ffffffff81c64339>] ip_local_out+0x19/0x40 kernel: [11607.853820] [<ffffffff81c644aa>] ip_queue_xmit+0x14a/0x3d0 kernel: [11607.860151] [<ffffffff81c7dcff>] tcp_transmit_skb+0x44f/0x950 kernel: [11607.866779] [<ffffffff81b75679>] ? __alloc_skb+0x89/0x2a0 kernel: [11607.873016] [<ffffffff81c7e3c8>] tcp_write_xmit+0x118/0xaa0 kernel: [11607.879445] [<ffffffff812140a3>] ? __check_object_size+0x53/0x140 kernel: [11607.886460] [<ffffffff81c7edfc>] __tcp_push_pending_frames+0x4c/0xd0 kernel: [11607.893768] [<ffffffff81c6f29d>] tcp_sendmsg+0x15d/0x1a30 kernel: [11607.900005] [<ffffffff81c9c6eb>] inet_sendmsg+0x7b/0xc0 kernel: [11607.906051] [<ffffffff81572b48>] ? security_socket_sendmsg+0x38/0x50 kernel: [11607.913360] [<ffffffff81b6ab05>] sock_aio_write+0x165/0x190 kernel: [11607.919787] [<ffffffff81b6a9a0>] ? __sock_recv_ts_and_drops+0x160/0x160 kernel: [11607.927387] [<ffffffff8120beba>] do_sync_readv_writev+0x9a/0xe0 kernel: [11607.934210] [<ffffffff8120ddef>] do_readv_writev+0xdf/0x310 kernel: [11607.940638] [<ffffffff8120e07c>] vfs_writev+0x5c/0x80 kernel: [11607.946485] [<ffffffff8120e247>] SyS_writev+0x57/0xd0 kernel: [11607.952335] [<ffffffff81d706f0>] tracesys+0xe0/0xe5 kernel: [11607.957987] ---[ end trace 1eecf1a047ce4414 ]--- kernel: [11607.963255] tg3 0000:02:00.0 eth0: transmit timed out, resetting kernel: [11609.228275] tg3 0000:02:00.0 eth0: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010 kernel: [11609.237934] tg3 0000:02:00.0 eth0: 0x00000010: 0xd90a000c, 0x00000000, 0xd90b000c, 0x00000000 kernel: [11609.247595] tg3 0000:02:00.0 eth0: 0x00000020: 0xd90c000c, 0x00000000, 0x00000000, 0x04f81028 kernel: [11609.257256] tg3 0000:02:00.0 eth0: 0x00000030: 0xdd000000, 0x00000048, 0x00000000, 0x0000010f kernel: [11609.266918] tg3 0000:02:00.0 eth0: 0x00000040: 0x00000000, 0x1c000000, 0xc8035001, 0x64002008 kernel: [11609.276580] tg3 0000:02:00.0 eth0: 0x00000050: 0x818c5803, 0x78000000, 0x0086a005, 0x00000000 .... lots more .... kernel: [11612.376448] tg3 0000:02:00.0 eth0: 0: Host status block [00000001:0000009c:(0000:01d6:0000):(0000:0113)] kernel: [11612.387179] tg3 0000:02:00.0 eth0: 0: NAPI info [0000001c:0000001c:(0113:0026:01ff):0000:(04de:0000:0000:0000)] kernel: [11612.398590] tg3 0000:02:00.0 eth0: 1: Host status block [00000001:00000019:(0000:0000:0000):(00fd:0000)] kernel: [11612.409322] tg3 0000:02:00.0 eth0: 1: NAPI info [00000019:00000019:(0000:0000:01ff):00fd:(00fd:00fd:0000:0000)] kernel: [11612.420732] tg3 0000:02:00.0 eth0: 2: Host status block [00000001:00000006:(0214:0000:0000):(0000:0000)] kernel: [11612.431462] tg3 0000:02:00.0 eth0: 2: NAPI info [00000006:00000006:(0000:0000:01ff):0214:(0214:0214:0000:0000)] kernel: [11612.442874] tg3 0000:02:00.0 eth0: 3: Host status block [00000001:000000c1:(0000:0000:0000):(0000:0000)] kernel: [11612.453606] tg3 0000:02:00.0 eth0: 3: NAPI info [000000a4:000000a4:(0000:0000:01ff):0246:(0246:0246:0000:0000)] kernel: [11612.465017] tg3 0000:02:00.0 eth0: 4: Host status block [00000001:000000ee:(0000:0000:0ebf):(0000:0000)] kernel: [11612.475748] tg3 0000:02:00.0 eth0: 4: NAPI info [000000ee:000000ee:(0000:0000:01ff):0ebf:(06bf:06bf:0000:0000)] kernel: [11612.530306] tg3 0000:02:00.0 eth0: Link is down kernel: [11616.446912] tg3 0000:02:00.0 eth0: Link is up at 1000 Mbps, full duplex kernel: [11616.454423] tg3 0000:02:00.0 eth0: Flow control is on for TX and on for RX kernel: [11616.462213] tg3 0000:02:00.0 eth0: EEE is disabled This is from the 3.11.1 vmlinux: (gdb) disas __fscache_uncache_page Dump of assembler code for function __fscache_uncache_page: 0xffffffff812a18e0 <+0>: callq 0xffffffff81d6ff00 0xffffffff812a18e5 <+5>: push %rbp 0xffffffff812a18e6 <+6>: mov %rsp,%rbp 0xffffffff812a18e9 <+9>: sub $0x20,%rsp 0xffffffff812a18ed <+13>: mov %rbx,-0x18(%rbp) 0xffffffff812a18f1 <+17>: mov %rdi,%rbx 0xffffffff812a18f4 <+20>: mov %r12,-0x10(%rbp) 0xffffffff812a18f8 <+24>: mov %r13,-0x8(%rbp) 0xffffffff812a18fc <+28>: mov %rsi,%r12 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax) 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237> 0xffffffff812a190d <+45>: test %r12,%r12 0xffffffff812a1910 <+48>: je 0xffffffff812a1a03 <__fscache_uncache_page+291> 0xffffffff812a1916 <+54>: lock incl 0x19e8a07(%rip) # 0xffffffff82c8a324 0xffffffff812a191d <+61>: mov (%r12),%rax 0xffffffff812a1921 <+65>: test $0x10,%ah 0xffffffff812a1924 <+68>: jne 0xffffffff812a1938 <__fscache_uncache_page+88> 0xffffffff812a1926 <+70>: mov -0x18(%rbp),%rbx 0xffffffff812a192a <+74>: mov -0x10(%rbp),%r12 0xffffffff812a192e <+78>: mov -0x8(%rbp),%r13 0xffffffff812a1932 <+82>: leaveq 0xffffffff812a1933 <+83>: retq 0xffffffff812a1934 <+84>: nopl 0x0(%rax) 0xffffffff812a1938 <+88>: lea 0x10(%rbx),%r13 0xffffffff812a193c <+92>: mov %r13,%rdi 0xffffffff812a193f <+95>: callq 0xffffffff81d66ef0 <_raw_spin_lock> 0xffffffff812a1944 <+100>: mov 0x40(%rbx),%rdi 0xffffffff812a1948 <+104>: test %rdi,%rdi 0xffffffff812a194b <+107>: je 0xffffffff812a19b0 <__fscache_uncache_page+208> 0xffffffff812a194d <+109>: lock andb $0xfd,0x70(%rbx) 0xffffffff812a1952 <+114>: lock btrl $0xc,(%r12) 0xffffffff812a1959 <+121>: sbb %eax,%eax 0xffffffff812a195b <+123>: test %eax,%eax 0xffffffff812a195d <+125>: je 0xffffffff812a19b7 <__fscache_uncache_page+215> 0xffffffff812a195f <+127>: mov 0x10(%rdi),%rax 0xffffffff812a1963 <+131>: mov (%rax),%rdx 0xffffffff812a1966 <+134>: cmpq $0x0,0x98(%rdx) 0xffffffff812a196e <+142>: je 0xffffffff812a19b7 <__fscache_uncache_page+215> 0xffffffff812a1970 <+144>: lock incl 0x19e88f1(%rip) # 0xffffffff82c8a268 0xffffffff812a1977 <+151>: jno 0xffffffff812a1982 <__fscache_uncache_page+162> 0xffffffff812a1979 <+153>: lock decl 0x19e88e8(%rip) # 0xffffffff82c8a268 0xffffffff812a1980 <+160>: int $0x4 0xffffffff812a1982 <+162>: mov (%rax),%rax 0xffffffff812a1985 <+165>: add $0xffffffffffffff80,%rdi 0xffffffff812a1989 <+169>: mov %r12,%rsi 0xffffffff812a198c <+172>: callq *0x98(%rax) 0xffffffff812a1992 <+178>: lock decl 0x19e88cf(%rip) # 0xffffffff82c8a268 0xffffffff812a1999 <+185>: jno 0xffffffff812a19a4 <__fscache_uncache_page+196> 0xffffffff812a199b <+187>: lock incl 0x19e88c6(%rip) # 0xffffffff82c8a268 0xffffffff812a19a2 <+194>: int $0x4 0xffffffff812a19a4 <+196>: jmp 0xffffffff812a1926 <__fscache_uncache_page+70> 0xffffffff812a19a6 <+198>: nopw %cs:0x0(%rax,%rax,1) ---Type <return> to continue, or q <return> to quit--- 0xffffffff812a19b0 <+208>: lock andb $0xef,0x1(%r12) 0xffffffff812a19b7 <+215>: mov %r13,%rdi 0xffffffff812a19ba <+218>: callq 0xffffffff81d66e40 <_raw_spin_unlock> 0xffffffff812a19bf <+223>: mov -0x18(%rbp),%rbx 0xffffffff812a19c3 <+227>: mov -0x10(%rbp),%r12 0xffffffff812a19c7 <+231>: mov -0x8(%rbp),%r13 0xffffffff812a19cb <+235>: leaveq 0xffffffff812a19cc <+236>: retq 0xffffffff812a19cd <+237>: mov $0xffffffff8250dc56,%rdi 0xffffffff812a19d4 <+244>: xor %eax,%eax 0xffffffff812a19d6 <+246>: callq 0xffffffff81d4fb02 <printk> 0xffffffff812a19db <+251>: mov $0xffffffff8250dc5a,%rdi 0xffffffff812a19e2 <+258>: xor %eax,%eax 0xffffffff812a19e4 <+260>: callq 0xffffffff81d4fb02 <printk> 0xffffffff812a19e9 <+265>: mov 0x48(%rbx),%rax 0xffffffff812a19ed <+269>: xor %edx,%edx 0xffffffff812a19ef <+271>: mov $0xffffffff8250de53,%rdi 0xffffffff812a19f6 <+278>: movzbl 0x10(%rax),%esi 0xffffffff812a19fa <+282>: xor %eax,%eax 0xffffffff812a19fc <+284>: callq 0xffffffff81d4fb02 <printk> 0xffffffff812a1a01 <+289>: ud2 0xffffffff812a1a03 <+291>: mov $0xffffffff8250dc56,%rdi 0xffffffff812a1a0a <+298>: xor %eax,%eax 0xffffffff812a1a0c <+300>: callq 0xffffffff81d4fb02 <printk> 0xffffffff812a1a11 <+305>: mov $0xffffffff8250dc5a,%rdi 0xffffffff812a1a18 <+312>: xor %eax,%eax 0xffffffff812a1a1a <+314>: callq 0xffffffff81d4fb02 <printk> 0xffffffff812a1a1f <+319>: xor %edx,%edx 0xffffffff812a1a21 <+321>: xor %esi,%esi 0xffffffff812a1a23 <+323>: mov $0xffffffff8250de53,%rdi 0xffffffff812a1a2a <+330>: xor %eax,%eax 0xffffffff812a1a2c <+332>: callq 0xffffffff81d4fb02 <printk> 0xffffffff812a1a31 <+337>: ud2 End of assembler dump. On Wed, Sep 11, 2013 at 3:56 PM, David Howells <dhowells@xxxxxxxxxx> wrote: > Mark Moseley <moseleymark@xxxxxxxxx> wrote: > > > What gdb commands should I use, to be of most use? > > Either: > > gdb fscache.ko > > or: > > gdb vmlinux > > as appropriate. Then: > > disas __fscache_uncache_page > > Note that the code may no longer match after you've recompiled, though it's > worth a look anyway:-/ > > David > -- Linux-cachefs mailing list Linux-cachefs@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cachefs