On Thu, Mar 05, 2015 at 12:02:17PM -0500, J. Bruce Fields wrote: > On Thu, Mar 05, 2015 at 10:01:38AM -0500, J. Bruce Fields wrote: > > On Thu, Mar 05, 2015 at 02:17:31PM +0100, Christoph Hellwig wrote: > > > On Wed, Mar 04, 2015 at 11:08:49PM -0500, J. Bruce Fields wrote: > > > > Ah-hah: > > > > > > > > static void > > > > nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls) > > > > { > > > > ... > > > > nfsd4_cb_layout_fail(ls); > > > > > > > > That'd do it! > > > > > > > > Haven't tried to figure out why exactly that's getting called, and why > > > > only rarely. Some intermittent problem with the callback path, I guess. > > > > > > > > Anyway, I think that solves most of the mystery.... > > > > > > Ooops, that was a nasty git merge error in the last rebase, see the fix > > > below. > > > > Thanks! > > And with that fix things look good. > > I'm still curious why the callbacks are failling. It's also logging > "nfsd: client 192.168.122.32 failed to respond to layout recall". I spoke too soon, I'm still not getting through my usual test run--the most recent run is hanging in generic/247 with the following in the server logs. But I probably still won't get a chance to look at this any closer till after vault. --b. nfsd: client 192.168.122.32 failed to respond to layout recall. Fencing.. nfsd: fence failed for client 192.168.122.32: -2! nfsd: client 192.168.122.32 failed to respond to layout recall. Fencing.. nfsd: fence failed for client 192.168.122.32: -2! receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt ffff88005639a000 xid c21abd62 kswapd0: page allocation failure: order:0, mode:0x120 CPU: 0 PID: 580 Comm: kswapd0 Not tainted 4.0.0-rc2-09922-g26cbcc7 #89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153950- 04/01/2014 0000000000000000 ffff88007f803bc8 ffffffff81a92597 0000000080000101 0000000000000120 ffff88007f803c58 ffffffff81155322 ffffffff822e9448 0000000000000000 0000000000000000 0000000000000001 00000000ffffffff Call Trace: <IRQ> [<ffffffff81a92597>] dump_stack+0x4f/0x7b [<ffffffff81155322>] warn_alloc_failed+0xe2/0x130 [<ffffffff81158503>] __alloc_pages_nodemask+0x5f3/0x860 [<ffffffff8195b2ab>] skb_page_frag_refill+0xab/0xd0 [<ffffffff817b5fba>] try_fill_recv+0x38a/0x7d0 [<ffffffff817b6af2>] virtnet_receive+0x6f2/0x860 [<ffffffff817b6c81>] virtnet_poll+0x21/0x90 [<ffffffff8197344a>] net_rx_action+0x1aa/0x320 [<ffffffff810796f2>] __do_softirq+0xf2/0x370 [<ffffffff81079ad5>] irq_exit+0x65/0x70 [<ffffffff81a9e124>] do_IRQ+0x64/0x100 [<ffffffff811d56f2>] ? free_buffer_head+0x22/0x80 [<ffffffff81a9c16f>] common_interrupt+0x6f/0x6f <EOI> [<ffffffff81092224>] ? kernel_text_address+0x64/0x70 [<ffffffff81197ff7>] ? kmem_cache_free+0xb7/0x230 [<ffffffff811d56f2>] free_buffer_head+0x22/0x80 [<ffffffff811d5b19>] try_to_free_buffers+0x79/0xc0 [<ffffffff81394b1b>] xfs_vm_releasepage+0x6b/0x160 [<ffffffff81184dc0>] ? __page_check_address+0x1f0/0x1f0 [<ffffffff8114fcc5>] try_to_release_page+0x35/0x60 [<ffffffff81162e02>] shrink_page_list+0x9c2/0xba0 [<ffffffff810bba1d>] ? trace_hardirqs_on_caller+0x12d/0x1d0 [<ffffffff811635df>] shrink_inactive_list+0x23f/0x520 [<ffffffff81164155>] shrink_lruvec+0x595/0x6d0 [<ffffffff81161a39>] ? shrink_slab.part.58.constprop.67+0x299/0x410 [<ffffffff811642dd>] shrink_zone+0x4d/0xa0 [<ffffffff81164c01>] kswapd+0x471/0xa30 [<ffffffff81164790>] ? try_to_free_pages+0x460/0x460 [<ffffffff81093a2f>] kthread+0xef/0x110 [<ffffffff81093940>] ? kthread_create_on_node+0x220/0x220 [<ffffffff81a9b5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81093940>] ? kthread_create_on_node+0x220/0x220 Mem-Info: DMA per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 CPU 1: hi: 0, btch: 1 usd: 0 DMA32 per-cpu: CPU 0: hi: 186, btch: 31 usd: 0 CPU 1: hi: 186, btch: 31 usd: 64 active_anon:7053 inactive_anon:2435 isolated_anon:0 active_file:88743 inactive_file:89505 isolated_file:32 unevictable:0 dirty:9786 writeback:0 unstable:0 free:3571 slab_reclaimable:227807 slab_unreclaimable:75772 mapped:21010 shmem:380 pagetables:1567 bounce:0 free_cma:0 DMA free:7788kB min:44kB low:52kB high:64kB active_anon:204kB inactive_anon:260kB active_file:696kB inactive_file:1016kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:32kB writeback:0kB mapped:924kB shmem:36kB slab_reclaimable:1832kB slab_unreclaimable:1632kB kernel_stack:48kB pagetables:80kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 1943 1943 1943 DMA32 free:6188kB min:5616kB low:7020kB high:8424kB active_anon:28008kB inactive_anon:9480kB active_file:354360kB inactive_file:357004kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:2080640kB managed:1990060kB mlocked:0kB dirty:39112kB writeback:0kB mapped:83116kB shmem:1484kB slab_reclaimable:909396kB slab_unreclaimable:301708kB kernel_stack:2608kB pagetables:6188kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no lowmem_reserve[]: 0 0 0 0 DMA: 21*4kB (UEM) 11*8kB (UM) 10*16kB (UEM) 3*32kB (UM) 1*64kB (E) 1*128kB (E) 2*256kB (UM) 1*512kB (M) 2*1024kB (UM) 2*2048kB (ER) 0*4096kB = 7788kB DMA32: 276*4kB (UM) 30*8kB (UM) 83*16kB (UM) 2*32kB (MR) 2*64kB (M) 2*128kB (R) 2*256kB (R) 1*512kB (R) 0*1024kB 1*2048kB (R) 0*4096kB = 6192kB Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB 178684 total pagecache pages 1 pages in swap cache Swap cache stats: add 14, delete 13, find 0/0 Free swap = 839620kB Total swap = 839676kB 524158 pages RAM 0 pages HighMem/MovableOnly 22666 pages reserved nfsd: client 192.168.122.32 failed to respond to layout recall. Fencing.. nfsd: fence failed for client 192.168.122.32: -2! receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt ffff880051dfc000 xid 8ff02aaf INFO: task nfsd:17653 blocked for more than 120 seconds. Not tainted 4.0.0-rc2-09922-g26cbcc7 #89 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. nfsd D ffff8800753a7848 11720 17653 2 0x00000000 ffff8800753a7848 0000000000000001 0000000000000001 ffffffff82210580 ffff88004e9bcb50 0000000000000006 ffffffff8119d9f0 ffff8800753a7848 ffff8800753a7fd8 ffff88002e5e3d70 0000000000000246 ffff88004e9bcb50 Call Trace: [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0 [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0 [<ffffffff81a95737>] schedule+0x37/0x90 [<ffffffff81a95ac8>] schedule_preempt_disabled+0x18/0x30 [<ffffffff81a97756>] mutex_lock_nested+0x156/0x400 [<ffffffff813a0d5a>] ? xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0 [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0 [<ffffffff813a0d5a>] xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0 [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0 [<ffffffff813a1016>] xfs_file_write_iter+0x86/0x130 [<ffffffff8119db05>] do_iter_readv_writev+0x65/0xa0 [<ffffffff8119f0b2>] do_readv_writev+0xe2/0x280 [<ffffffff813a0f90>] ? xfs_file_buffered_aio_write.isra.9+0x2a0/0x2a0 [<ffffffff813a0f90>] ? xfs_file_buffered_aio_write.isra.9+0x2a0/0x2a0 [<ffffffff810bc15d>] ? __lock_acquire+0x37d/0x2040 [<ffffffff810be759>] ? lock_release_non_nested+0xa9/0x340 [<ffffffff81a9aaad>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffff810bba1d>] ? trace_hardirqs_on_caller+0x12d/0x1d0 [<ffffffff810bbacd>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8119f2d9>] vfs_writev+0x39/0x50 [<ffffffffa00ab861>] nfsd_vfs_write.isra.11+0xa1/0x350 [nfsd] [<ffffffffa00aecee>] nfsd_write+0x8e/0x100 [nfsd] [<ffffffffa00b95b5>] ? gen_boot_verifier+0x5/0xc0 [nfsd] [<ffffffffa00b97f5>] nfsd4_write+0x185/0x1e0 [nfsd] [<ffffffffa00bbe37>] nfsd4_proc_compound+0x3c7/0x6f0 [nfsd] [<ffffffffa00a7463>] nfsd_dispatch+0xc3/0x220 [nfsd] [<ffffffffa001314f>] svc_process_common+0x43f/0x650 [sunrpc] [<ffffffffa00134a3>] svc_process+0x143/0x260 [sunrpc] [<ffffffffa00a6cc7>] nfsd+0x167/0x1e0 [nfsd] [<ffffffffa00a6b65>] ? nfsd+0x5/0x1e0 [nfsd] [<ffffffffa00a6b60>] ? nfsd_destroy+0xe0/0xe0 [nfsd] [<ffffffff81093a2f>] kthread+0xef/0x110 [<ffffffff81093940>] ? kthread_create_on_node+0x220/0x220 [<ffffffff81a9b5ac>] ret_from_fork+0x7c/0xb0 [<ffffffff81093940>] ? kthread_create_on_node+0x220/0x220 2 locks held by nfsd/17653: #0: (sb_writers#11){.+.+.+}, at: [<ffffffff8119f208>] do_readv_writev+0x238/0x280 #1: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff813a0d5a>] xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0 INFO: task nfsd:17655 blocked for more than 120 seconds. ... _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs