cc Christoph, who's more familiar with nfs and wrote the iomap bits. On Sun, May 07, 2017 at 09:09:49PM +0200, Paul Menzel wrote: > Dear Linux folks, > > > There seems to be a regression in Linux 4.9 compared to 4.4. Maybe you have > an idea. > > The system used 4.4.38 without issues, and was updated to 4.9.23 on April > 24th. Since Friday, the NFS exports where not accessible anymore. Rebooting > the system into 4.9.24, 4.9.24, and 4.9.25 didn’t change anything, and the > system went into the some lock right away. Booting 4.4.38 fixed the issue > though. > The buffered write path was rewritten with the iomap mechanism around 4.7 or so, so there's a pretty big functionality gap between 4.4 and 4.9. > Here is more information. > > NFS doesn’t respond to a null call. > What exactly is a NULL call? Can this be reproduced easily? Otherwise, it's not clear to me whether you've hit a deadlock or some kind of livelock. Have you checked syslog for any crash or hung task messages? Please also provide the hung task output (echo w > /proc/sysrq-trigger) once you've hit this state. It would be particularly interesting to see whether the iomap_zero_range() path is included in that output. It may also be interesting to enable the xfs_zero_eof() tracepoint (trace-cmd start -e 'xfs:xfs_zero_eof') and see what the last few entries are from /sys/kernel/debug/tracing/trace_pipe. Brian > ``` > # tshark -r x.x > 1 0.000000 141.14.16.23 -> 141.14.26.2 TCP 74 1005 → 2049 [SYN] Seq=0 > Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=327954228 TSecr=0 WS=512 > 2 0.000245 141.14.26.2 -> 141.14.16.23 TCP 74 2049 → 1005 [SYN, ACK] > Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1126765132 > TSecr=327954228 WS=512 > 3 0.000269 141.14.16.23 -> 141.14.26.2 TCP 66 1005 → 2049 [ACK] Seq=1 > Ack=1 Win=29696 Len=0 TSval=327954229 TSecr=1126765132 > 4 0.000300 141.14.16.23 -> 141.14.26.2 NFS 110 V4 NULL Call > 5 0.202652 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] > 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327954432 > TSecr=1126765132 > 6 0.410651 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] > 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327954640 > TSecr=1126765132 > 7 0.818649 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] > 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327955048 > TSecr=1126765132 > 8 1.012429 141.14.26.2 -> 141.14.16.23 TCP 74 [TCP Spurious > Retransmission] 2049 → 1005 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 > SACK_PERM=1 TSval=1126766144 TSecr=327955048 WS=512 > 9 1.012448 141.14.16.23 -> 141.14.26.2 TCP 66 [TCP Dup ACK 3#1] 1005 → > 2049 [ACK] Seq=45 Ack=1 Win=29696 Len=0 TSval=327955241 TSecr=1126765132 > 10 1.674650 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] > 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327955904 > TSecr=1126765132 > 11 3.060502 141.14.26.2 -> 141.14.16.23 TCP 74 [TCP Spurious > Retransmission] 2049 → 1005 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 > SACK_PERM=1 TSval=1126768193 TSecr=327955904 WS=512 > 12 3.060521 141.14.16.23 -> 141.14.26.2 TCP 66 [TCP Dup ACK 3#2] 1005 → > 2049 [ACK] Seq=45 Ack=1 Win=29696 Len=0 TSval=327957289 TSecr=1126765132 > 13 3.338653 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] > 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327957568 > TSecr=1126765132 > ``` > > `` > # tshark -r x.x > 1 0.000000 141.14.16.23 -> 141.14.26.2 TCP 74 690 → 2049 [SYN] Seq=0 > Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=328229577 TSecr=0 WS=512 > 2 0.000039 141.14.26.2 -> 141.14.16.23 TCP 74 2049 → 690 [SYN, ACK] > Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1127040480 > TSecr=328229577 WS=512 > 3 0.000155 141.14.16.23 -> 141.14.26.2 TCP 66 690 → 2049 [ACK] Seq=1 > Ack=1 Win=29696 Len=0 TSval=328229577 TSecr=1127040480 > 4 0.000180 141.14.16.23 -> 141.14.26.2 NFS 110 V4 NULL Call > 5 0.206937 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] 690 > → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=328229784 > TSecr=1127040480 > 6 0.414925 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] 690 > → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=328229992 > TSecr=1127040480 > 7 0.822928 141.14.16.23 -> 141.14.26.2 TCP 110 [TCP Retransmission] 690 > → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=328230400 > TSecr=1127040480 > 8 1.056498 141.14.26.2 -> 141.14.16.23 TCP 74 [TCP Spurious > Retransmission] 2049 → 690 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 > SACK_PERM=1 TSval=1127041536 TSecr=328230400 WS=512 > ``` > > Looking at the stack traces of the nfsd process with > > ``` > for p in $(pgrep 'nfsd$'); do echo $p; cat /proc/$p/stack; done > ``` > > all but one have the trace below. > > > ``` > [<ffffffff813eded7>] call_rwsem_down_write_failed+0x17/0x30 > [<ffffffff8133aa1c>] xfs_file_buffered_aio_write+0x5c/0x2d0 > [<ffffffff8133ad2e>] xfs_file_write_iter+0x9e/0x150 > [<ffffffff81196b50>] do_iter_readv_writev+0xa0/0xf0 > [<ffffffff81197d4a>] do_readv_writev+0x18a/0x230 > [<ffffffff8119803c>] vfs_writev+0x3c/0x50 > [<ffffffffa02826b5>] nfsd_vfs_write+0xc5/0x280 [nfsd] > [<ffffffffa028e48a>] nfsd4_write+0x17a/0x1d0 [nfsd] > [<ffffffffa02902f9>] nfsd4_proc_compound+0x369/0x5e0 [nfsd] > [<ffffffffa027e218>] nfsd_dispatch+0xa8/0x180 [nfsd] > [<ffffffffa01a66af>] svc_process_common+0x3ff/0x580 [sunrpc] > [<ffffffffa01a696b>] svc_process+0x13b/0x1b0 [sunrpc] > [<ffffffffa027dc7d>] nfsd+0xed/0x160 [nfsd] > [<ffffffff8107b0ba>] kthread+0xca/0xe0 > [<ffffffff81a97092>] ret_from_fork+0x22/0x30 > [<ffffffffffffffff>] 0xffffffffffffffff > ``` > > The one problematic one has the stack trace below. > > ``` > [<ffffffff811f6f6c>] iomap_write_begin+0x8c/0x120 > [<ffffffff811f758b>] iomap_zero_range_actor+0xeb/0x210 # oder > iomap_zero_range_actor+0x90/0x210 > [<ffffffff811f77e2>] iomap_apply+0xa2/0x110 > [<ffffffff811f79b8>] iomap_zero_range+0x58/0x80 > [<ffffffff8133a38e>] xfs_zero_eof+0x4e/0xb0 > [<ffffffff8133a58d>] xfs_file_aio_write_checks+0x19d/0x1c0 > [<ffffffff8133aa39>] xfs_file_buffered_aio_write+0x79/0x2d0 > [<ffffffff8133ad2e>] xfs_file_write_iter+0x9e/0x150 > [<ffffffff81196b50>] do_iter_readv_writev+0xa0/0xf0 > [<ffffffff81197d4a>] do_readv_writev+0x18a/0x230 > [<ffffffff8119803c>] vfs_writev+0x3c/0x50 > [<ffffffffa02826b5>] nfsd_vfs_write+0xc5/0x280 [nfsd] > [<ffffffffa028e48a>] nfsd4_write+0x17a/0x1d0 [nfsd] > [<ffffffffa02902f9>] nfsd4_proc_compound+0x369/0x5e0 [nfsd] > [<ffffffffa027e218>] nfsd_dispatch+0xa8/0x180 [nfsd] > [<ffffffffa01a66af>] svc_process_common+0x3ff/0x580 [sunrpc] > [<ffffffffa01a696b>] svc_process+0x13b/0x1b0 [sunrpc] > [<ffffffffa027dc7d>] nfsd+0xed/0x160 [nfsd] > [<ffffffff8107b0ba>] kthread+0xca/0xe0 > [<ffffffff81a97092>] ret_from_fork+0x22/0x30 > [<ffffffffffffffff>] 0xffffffffffffffff > ``` > > Here is the corresponding lines in the source code. > > ``` > info line *0xffffffff811f6f6c Line 124 of "fs/iomap.c" starts at address > 0xffffffff811f6f6c <iomap_write_begin+140> and ends at 0xffffffff811f6f70 > <iomap_write_begin+144>. > info line *0xffffffff811f758b Line 345 of "fs/iomap.c" starts at address > 0xffffffff811f758b <iomap_zero_range_actor+235> and ends at > 0xffffffff811f758f <iomap_zero_range_actor+239>. > info line *0xffffffff811f7530 Line 385 of "fs/iomap.c" starts at address > 0xffffffff811f7530 <iomap_zero_range_actor+144> and ends at > 0xffffffff811f7538 <iomap_zero_range_actor+152>. > ``` > > Here is the code. > > ``` > […] > 109 static int > 110 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len, > unsigned flags, > 111 struct page **pagep, struct iomap *iomap) > 112 { > 113 pgoff_t index = pos >> PAGE_SHIFT; > 114 struct page *page; > 115 int status = 0; > 116 > 117 BUG_ON(pos + len > iomap->offset + iomap->length); > 118 > 119 if (fatal_signal_pending(current)) > 120 return -EINTR; > 121 > 122 page = grab_cache_page_write_begin(inode->i_mapping, index, > flags); > 123 if (!page) > 124 return -ENOMEM; > 125 > 126 status = __block_write_begin_int(page, pos, len, NULL, iomap); > 127 if (unlikely(status)) { > 128 unlock_page(page); > 129 put_page(page); > 130 page = NULL; > 131 > 132 iomap_write_failed(inode, pos, len); > 133 } > 134 > 135 *pagep = page; > 136 return status; > 137 } > […] > ``` > > So it looks like insufficent memory? But the machine has 1 TB of RAM, and I > didn’t see any suspicious running `htop`. Also it should have returned from > with that error, shouldn’t it have? > > > Kind regards, > > Paul > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html