Re: Locking problems with Linux 4.9 with NFSD and `fs/iomap.c`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



cc Christoph, who's more familiar with nfs and wrote the iomap bits.

On Sun, May 07, 2017 at 09:09:49PM +0200, Paul Menzel wrote:
> Dear Linux folks,
> 
> 
> There seems to be a regression in Linux 4.9 compared to 4.4. Maybe you have
> an idea.
> 
> The system used 4.4.38 without issues, and was updated to 4.9.23 on April
> 24th. Since Friday, the NFS exports where not accessible anymore. Rebooting
> the system into 4.9.24, 4.9.24, and 4.9.25 didn’t change anything, and the
> system went into the some lock right away. Booting 4.4.38 fixed the issue
> though.
> 

The buffered write path was rewritten with the iomap mechanism around
4.7 or so, so there's a pretty big functionality gap between 4.4 and
4.9.

> Here is more information.
> 
> NFS doesn’t respond to a null call.
> 

What exactly is a NULL call? Can this be reproduced easily?

Otherwise, it's not clear to me whether you've hit a deadlock or some
kind of livelock. Have you checked syslog for any crash or hung task
messages? Please also provide the hung task output (echo w >
/proc/sysrq-trigger) once you've hit this state. It would be
particularly interesting to see whether the iomap_zero_range() path is
included in that output.

It may also be interesting to enable the xfs_zero_eof() tracepoint
(trace-cmd start -e 'xfs:xfs_zero_eof') and see what the last few
entries are from /sys/kernel/debug/tracing/trace_pipe.

Brian

> ```
> # tshark -r x.x
>   1   0.000000 141.14.16.23 -> 141.14.26.2  TCP 74 1005 → 2049 [SYN] Seq=0
> Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=327954228 TSecr=0 WS=512
>   2   0.000245  141.14.26.2 -> 141.14.16.23 TCP 74 2049 → 1005 [SYN, ACK]
> Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1126765132
> TSecr=327954228 WS=512
>   3   0.000269 141.14.16.23 -> 141.14.26.2  TCP 66 1005 → 2049 [ACK] Seq=1
> Ack=1 Win=29696 Len=0 TSval=327954229 TSecr=1126765132
>   4   0.000300 141.14.16.23 -> 141.14.26.2  NFS 110 V4 NULL Call
>   5   0.202652 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission]
> 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327954432
> TSecr=1126765132
>   6   0.410651 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission]
> 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327954640
> TSecr=1126765132
>   7   0.818649 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission]
> 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327955048
> TSecr=1126765132
>   8   1.012429  141.14.26.2 -> 141.14.16.23 TCP 74 [TCP Spurious
> Retransmission] 2049 → 1005 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460
> SACK_PERM=1 TSval=1126766144 TSecr=327955048 WS=512
>   9   1.012448 141.14.16.23 -> 141.14.26.2  TCP 66 [TCP Dup ACK 3#1] 1005 →
> 2049 [ACK] Seq=45 Ack=1 Win=29696 Len=0 TSval=327955241 TSecr=1126765132
>  10   1.674650 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission]
> 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327955904
> TSecr=1126765132
>  11   3.060502  141.14.26.2 -> 141.14.16.23 TCP 74 [TCP Spurious
> Retransmission] 2049 → 1005 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460
> SACK_PERM=1 TSval=1126768193 TSecr=327955904 WS=512
>  12   3.060521 141.14.16.23 -> 141.14.26.2  TCP 66 [TCP Dup ACK 3#2] 1005 →
> 2049 [ACK] Seq=45 Ack=1 Win=29696 Len=0 TSval=327957289 TSecr=1126765132
>  13   3.338653 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission]
> 1005 → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=327957568
> TSecr=1126765132
> ```
> 
> ``
> # tshark -r x.x
>   1   0.000000 141.14.16.23 -> 141.14.26.2  TCP 74 690 → 2049 [SYN] Seq=0
> Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=328229577 TSecr=0 WS=512
>   2   0.000039  141.14.26.2 -> 141.14.16.23 TCP 74 2049 → 690 [SYN, ACK]
> Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=1127040480
> TSecr=328229577 WS=512
>   3   0.000155 141.14.16.23 -> 141.14.26.2  TCP 66 690 → 2049 [ACK] Seq=1
> Ack=1 Win=29696 Len=0 TSval=328229577 TSecr=1127040480
>   4   0.000180 141.14.16.23 -> 141.14.26.2  NFS 110 V4 NULL Call
>   5   0.206937 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission] 690
> → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=328229784
> TSecr=1127040480
>   6   0.414925 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission] 690
> → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=328229992
> TSecr=1127040480
>   7   0.822928 141.14.16.23 -> 141.14.26.2  TCP 110 [TCP Retransmission] 690
> → 2049 [PSH, ACK] Seq=1 Ack=1 Win=29696 Len=44 TSval=328230400
> TSecr=1127040480
>   8   1.056498  141.14.26.2 -> 141.14.16.23 TCP 74 [TCP Spurious
> Retransmission] 2049 → 690 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460
> SACK_PERM=1 TSval=1127041536 TSecr=328230400 WS=512
> ```
> 
> Looking at the stack traces of the nfsd process with
> 
> ```
> for p in $(pgrep 'nfsd$'); do echo $p; cat /proc/$p/stack; done
> ```
> 
> all but one have the trace below.
> 
> 
> ```
> [<ffffffff813eded7>] call_rwsem_down_write_failed+0x17/0x30
> [<ffffffff8133aa1c>] xfs_file_buffered_aio_write+0x5c/0x2d0
> [<ffffffff8133ad2e>] xfs_file_write_iter+0x9e/0x150
> [<ffffffff81196b50>] do_iter_readv_writev+0xa0/0xf0
> [<ffffffff81197d4a>] do_readv_writev+0x18a/0x230
> [<ffffffff8119803c>] vfs_writev+0x3c/0x50
> [<ffffffffa02826b5>] nfsd_vfs_write+0xc5/0x280 [nfsd]
> [<ffffffffa028e48a>] nfsd4_write+0x17a/0x1d0 [nfsd]
> [<ffffffffa02902f9>] nfsd4_proc_compound+0x369/0x5e0 [nfsd]
> [<ffffffffa027e218>] nfsd_dispatch+0xa8/0x180 [nfsd]
> [<ffffffffa01a66af>] svc_process_common+0x3ff/0x580 [sunrpc]
> [<ffffffffa01a696b>] svc_process+0x13b/0x1b0 [sunrpc]
> [<ffffffffa027dc7d>] nfsd+0xed/0x160 [nfsd]
> [<ffffffff8107b0ba>] kthread+0xca/0xe0
> [<ffffffff81a97092>] ret_from_fork+0x22/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
> ```
> 
> The one problematic one has the stack trace below.
> 
> ```
> [<ffffffff811f6f6c>] iomap_write_begin+0x8c/0x120
> [<ffffffff811f758b>] iomap_zero_range_actor+0xeb/0x210     # oder
> iomap_zero_range_actor+0x90/0x210
> [<ffffffff811f77e2>] iomap_apply+0xa2/0x110
> [<ffffffff811f79b8>] iomap_zero_range+0x58/0x80
> [<ffffffff8133a38e>] xfs_zero_eof+0x4e/0xb0
> [<ffffffff8133a58d>] xfs_file_aio_write_checks+0x19d/0x1c0
> [<ffffffff8133aa39>] xfs_file_buffered_aio_write+0x79/0x2d0
> [<ffffffff8133ad2e>] xfs_file_write_iter+0x9e/0x150
> [<ffffffff81196b50>] do_iter_readv_writev+0xa0/0xf0
> [<ffffffff81197d4a>] do_readv_writev+0x18a/0x230
> [<ffffffff8119803c>] vfs_writev+0x3c/0x50
> [<ffffffffa02826b5>] nfsd_vfs_write+0xc5/0x280 [nfsd]
> [<ffffffffa028e48a>] nfsd4_write+0x17a/0x1d0 [nfsd]
> [<ffffffffa02902f9>] nfsd4_proc_compound+0x369/0x5e0 [nfsd]
> [<ffffffffa027e218>] nfsd_dispatch+0xa8/0x180 [nfsd]
> [<ffffffffa01a66af>] svc_process_common+0x3ff/0x580 [sunrpc]
> [<ffffffffa01a696b>] svc_process+0x13b/0x1b0 [sunrpc]
> [<ffffffffa027dc7d>] nfsd+0xed/0x160 [nfsd]
> [<ffffffff8107b0ba>] kthread+0xca/0xe0
> [<ffffffff81a97092>] ret_from_fork+0x22/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
> ```
> 
> Here is the corresponding lines in the source code.
> 
> ```
> info line *0xffffffff811f6f6c Line 124 of "fs/iomap.c" starts at address
> 0xffffffff811f6f6c <iomap_write_begin+140> and ends at 0xffffffff811f6f70
> <iomap_write_begin+144>.
> info line *0xffffffff811f758b Line 345 of "fs/iomap.c" starts at address
> 0xffffffff811f758b <iomap_zero_range_actor+235> and ends at
> 0xffffffff811f758f <iomap_zero_range_actor+239>.
> info line *0xffffffff811f7530 Line 385 of "fs/iomap.c" starts at address
> 0xffffffff811f7530 <iomap_zero_range_actor+144> and ends at
> 0xffffffff811f7538 <iomap_zero_range_actor+152>.
> ```
> 
> Here is the code.
> 
> ```
> […]
> 109 static int
> 110 iomap_write_begin(struct inode *inode, loff_t pos, unsigned len,
> unsigned flags,
> 111                 struct page **pagep, struct iomap *iomap)
> 112 {
> 113         pgoff_t index = pos >> PAGE_SHIFT;
> 114         struct page *page;
> 115         int status = 0;
> 116
> 117         BUG_ON(pos + len > iomap->offset + iomap->length);
> 118
> 119         if (fatal_signal_pending(current))
> 120                 return -EINTR;
> 121
> 122         page = grab_cache_page_write_begin(inode->i_mapping, index,
> flags);
> 123         if (!page)
> 124                 return -ENOMEM;
> 125
> 126         status = __block_write_begin_int(page, pos, len, NULL, iomap);
> 127         if (unlikely(status)) {
> 128                 unlock_page(page);
> 129                 put_page(page);
> 130                 page = NULL;
> 131
> 132                 iomap_write_failed(inode, pos, len);
> 133         }
> 134
> 135         *pagep = page;
> 136         return status;
> 137 }
> […]
> ```
> 
> So it looks like insufficent memory? But the machine has 1 TB of RAM, and I
> didn’t see any suspicious running `htop`. Also it should have returned from
> with that error, shouldn’t it have?
> 
> 
> Kind regards,
> 
> Paul
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux