Re: Locking problems with Linux 4.9 with NFSD and `fs/iomap.c`

Dear Brian,


On 05/08/17 15:18, Brian Foster wrote:
cc Christoph, who's more familiar with NFS and wrote the iomap bits.

Thank you.

On Sun, May 07, 2017 at 09:09:49PM +0200, Paul Menzel wrote:

There seems to be a regression in Linux 4.9 compared to 4.4. Maybe you have
an idea.

The system ran 4.4.38 without issues and was updated to 4.9.23 on April
24th. Since Friday, the NFS exports have not been accessible. Rebooting
the system into 4.9.24, 4.9.24, and 4.9.25 didn’t change anything, and the
system went into the same lock right away. Booting 4.4.38 fixed the issue,
though.

The buffered write path was rewritten with the iomap mechanism around
4.7 or so, so there's a pretty big functionality gap between 4.4 and
4.9.

Here is more information.

NFS doesn’t respond to a NULL call.

What exactly is a NULL call?

Sorry for not making that clear for non-NFS people. From *NFS Version 3 Protocol Specification* [1]:

Procedure NULL does not do any work. It is made available to
allow server response testing and timing.
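For readers unfamiliar with ONC RPC: the NULL procedure is procedure 0 of a program, called with no arguments and returning no results. The sketch below builds such a call for NFS (program 100003, version 3) following the RPC message layout of RFC 5531, with AUTH_NONE credentials and TCP record marking, and checks whether a reply was accepted. The helper names and the transaction ID are illustrative assumptions; in practice `rpcinfo -t <host> nfs 3` performs a similar procedure-0 check.

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number for NFS
NFS_V3 = 3
PROC_NULL = 0         # procedure 0: no arguments, no results
CALL, REPLY = 0, 1    # msg_type values (RFC 5531)
RPC_VERSION = 2
AUTH_NONE = 0         # credential/verifier flavor with an empty body
MSG_ACCEPTED = 0      # reply_stat
SUCCESS = 0           # accept_stat


def build_null_call(xid: int, prog: int = NFS_PROGRAM, vers: int = NFS_V3) -> bytes:
    """Return a record-marked RPC CALL for procedure 0, ready to send over TCP."""
    body = struct.pack(
        ">10I",
        xid, CALL, RPC_VERSION, prog, vers, PROC_NULL,
        AUTH_NONE, 0,  # credential: flavor, zero-length body
        AUTH_NONE, 0,  # verifier: flavor, zero-length body
    )
    # TCP record marking: high bit = last fragment, low 31 bits = fragment length.
    header = struct.pack(">I", 0x80000000 | len(body))
    return header + body


def null_reply_ok(reply: bytes, xid: int) -> bool:
    """Check that a record-marked reply is an accepted, successful answer to xid."""
    body = reply[4:]  # assumes a single fragment; skip the record mark
    if len(body) < 20:
        return False
    rxid, mtype, reply_stat = struct.unpack_from(">3I", body, 0)
    if rxid != xid or mtype != REPLY or reply_stat != MSG_ACCEPTED:
        return False
    # Accepted reply: verifier (flavor, length, opaque body padded to 4 bytes),
    # followed by accept_stat.
    _flavor, vlen = struct.unpack_from(">2I", body, 12)
    offset = 20 + ((vlen + 3) & ~3)
    if len(body) < offset + 4:
        return False
    (accept_stat,) = struct.unpack_from(">I", body, offset)
    return accept_stat == SUCCESS
```

On a live system one would send `build_null_call(xid)` over a TCP connection to the server's NFS port (normally 2049) and pass the bytes read back to `null_reply_ok()`; the `vers` argument can be changed (for example to 4) to probe another NFS version. A single `recv()` is not guaranteed to return the whole record, so a real client would keep reading until the length given in the record mark has arrived.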

Can this be reproduced easily?

Unfortunately, we don’t know how to reproduce it. It seems to happen after periods of heavy I/O, though.

```
$ sudo nfsstat -s
Server rpc stats:
calls      badcalls   badclnt    badauth    xdrcall
15644232   0          0          0          0

Server nfs v4:
null         compound
1071      0% 15643006 99%

Server nfs v4 operations:
op0-unused   op1-unused   op2-future   access       close        commit
0         0% 0         0% 0         0% 87846     0% 42798     0% 50658     0%
create       delegpurge   delegreturn  getattr      getfh        link
2805      0% 0         0% 16866     0% 8271204  21% 82924     0% 0         0%
lock         lockt        locku        lookup       lookup_root  nverify
0         0% 0         0% 0         0% 64424     0% 0         0% 0         0%
open         openattr     open_conf    open_dgrd    putfh        putpubfh
53848     0% 0         0% 1081      0% 20        0% 15569041 39% 0         0%
putrootfh    read         readdir      readlink     remove       rename
1072      0% 7187366  18% 2045      0% 73        0% 9116      0% 5836      0%
renew        restorefh    savefh       secinfo      setattr      setcltid
72534     0% 0         0% 5836      0% 0         0% 21817     0% 1854      0%
setcltidconf verify       write        rellockowner bc_ctl       bind_conn
1854      0% 0         0% 7794634  19% 0         0% 0         0% 0         0%
exchange_id  create_ses   destroy_ses  free_stateid getdirdeleg  getdevinfo
0         0% 0         0% 0         0% 0         0% 0         0% 0         0%
getdevlist   layoutcommit layoutget    layoutreturn secinfononam sequence
0         0% 0         0% 0         0% 0         0% 0         0% 0         0%
set_ssv      test_stateid want_deleg   destroy_clid reclaim_comp
0         0% 0         0% 0         0% 0         0% 0         0%
```
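One way to tell a stuck server from an idle one is to take two `nfsstat -s` snapshots a little while apart and compare the counters. The sketch below parses the alternating rows of counter names and values that `nfsstat -s` prints into a dictionary keyed by section and counter name, and reports which counters changed. It assumes the layout of the output above (a section title ending in `:`, then a row of names followed by a row of `count percent%` pairs or plain counts); the function names are hypothetical and this is not a general-purpose nfsstat parser.

```python
import re

# A "count percent%" pair in a value row, e.g. "1071      0%" or "15643006 99%".
PAIR = re.compile(r"(\d+)\s+\d+%")


def parse_nfsstat(text: str) -> dict:
    """Map (section, counter name) -> count from `nfsstat -s` style output."""
    counters = {}
    section = ""
    names = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.endswith(":"):
            # Section title such as "Server nfs v4 operations:".
            section, names = stripped[:-1], None
            continue
        if names is None:
            names = stripped.split()  # header row: counter names
            continue
        values = PAIR.findall(stripped)
        if not values:
            # Rows without percentages, e.g. the "Server rpc stats" block.
            values = stripped.split()
        for name, value in zip(names, values):
            counters[(section, name)] = int(value)
        names = None  # the next non-empty row is a header again
    return counters


def changed(before: dict, after: dict) -> dict:
    """Return the counters whose value differs between two snapshots, as deltas."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }
```

Comparing two snapshots taken while the exports hang would show whether the `null` and `compound` counters are still advancing.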

Otherwise, it's not clear to me whether you've hit a deadlock or some
kind of livelock. Have you checked syslog for any crash or hung task
messages? Please also provide the hung task output
(`echo w > /proc/sysrq-trigger`) once you've hit this state. It would be
particularly interesting to see whether the `iomap_zero_range()` path is
included in that output.

Please see the kernel messages in my reply to Christoph’s message.

It may also be interesting to enable the `xfs_zero_eof()` tracepoint
(`trace-cmd start -e 'xfs:xfs_zero_eof'`) and see what the last few
entries are from `/sys/kernel/debug/tracing/trace_pipe`.
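Without `trace-cmd`, roughly the same setup can be done through the tracefs control files directly. The sketch below assumes tracefs is reachable at `/sys/kernel/debug/tracing` (newer kernels also expose it at `/sys/kernel/tracing`); it enables the `xfs:xfs_zero_eof` event by writing `1` to its `enable` file, sets `tracing_on`, and reads the last few lines of the non-consuming `trace` file instead of the blocking `trace_pipe`. `trace-cmd start` may configure more than this, and the function names are hypothetical.

```python
import os
from collections import deque

TRACING = "/sys/kernel/debug/tracing"  # assumed tracefs mount point


def enable_event(system: str, event: str, tracing: str = TRACING) -> None:
    """Enable one trace event and switch tracing on (needs root on a real system)."""
    with open(os.path.join(tracing, "events", system, event, "enable"), "w") as f:
        f.write("1")
    with open(os.path.join(tracing, "tracing_on"), "w") as f:
        f.write("1")


def last_entries(n: int = 10, tracing: str = TRACING) -> list:
    """Return the last n non-comment lines of the trace buffer without consuming them."""
    tail = deque(maxlen=n)
    with open(os.path.join(tracing, "trace")) as f:
        for line in f:
            if not line.startswith("#"):  # skip the header comment lines
                tail.append(line.rstrip("\n"))
    return list(tail)
```

Typical use would be `enable_event("xfs", "xfs_zero_eof")` before the workload, then `last_entries(20)` once the server hangs.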

I built `trace-cmd` and did what you asked, but no entries show up.

```
$ sudo strace ~/src/trace-cmd/trace-cmd start -e 'xfs:xfs_zero_eof'
$ sudo cat /sys/kernel/tracing/events/xfs/xfs_zero_eof/enable
1
$ sudo cat /sys/kernel/debug/tracing/tracing_on
1
$ sudo cat /sys/kernel/debug/tracing/trace_pipe

```


Kind regards,

Paul


[1] https://www.ietf.org/rfc/rfc1813.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


