> On Dec 30, 2017, at 1:05 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>
> On Wed, Dec 27, 2017 at 03:40:58PM -0500, Chuck Lever wrote:
>> Last week I updated my test server from v4.14 to v4.15-rc4, and began to
>> observe intermittent failures in the git regression suite on NFSv4.1.
>
> I haven't run that before.  Should I just
>
>     mount -overs=4.1 server:/fs /mnt/
>     cd /mnt/
>     git clone git://git.kernel.org/pub/scm/git/git.git
>     cd git
>     make test
>
> ?

You'll need to install SVN and CVS on your client as well.
The failures seem to occur only in the SVN/CVS related
tests.

>> I was able to reproduce these failures with NFSv4.1 on both TCP and RDMA,
>> yet there has not been a reproduction with NFSv3 or NFSv4.0.
>>
>> The server hardware is a single-socket 4-core system with 32GB of RAM.
>> The export is a tmpfs. Networking is 56Gb InfiniBand (or IPoIB).
>>
>> The git regression suite reports individual test failures in the SVN
>> and CVS tests. On occasion, the client mount point freezes, requiring
>> that the client be rebooted in order to unstick the mount.
>>
>> Just before Christmas, I bisected the problem to:
>
> Thanks for the report!  I'll make some time for this next week.  What's
> your client?  I guess one start might be to see if the reproducer can be
> simplified e.g. by running just one of the tests from the suite.

The failures are intermittent, and occur in a different
test each time. You have to wait for the 9000-series
scripts, which test SVN/CVS repo operations.

To speed up time-to-failure, use "make -jN test" where N
is more than a few. My client and server both have multiple
real cores.

I'm thinking it's the server that matters here (possibly
a race condition is introduced by the below commit?).

> --b.
>
>>
>> commit 659aefb68eca28ba9aa482a9fc64de107332e256
>> Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> Date:   Fri Nov 3 08:00:13 2017 -0400
>>
>>     nfsd: Ensure we don't recognise lock stateids after freeing them
>>
>>     In order to deal with lookup races, nfsd4_free_lock_stateid() needs
>>     to be able to signal to other stateful functions that the lock stateid
>>     is no longer valid. Right now, nfsd_lock() will check whether or not an
>>     existing stateid is still hashed, but only in the "new lock" path.
>>
>>     To ensure the stateid invalidation is also recognised by the "existing lock"
>>     path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
>>     change the type to NFS4_CLOSED_STID under the stp->st_mutex.
>>
>>     Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>     Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>>
>>
>> Since we're already at v4.15-rc5 I thought it would be best to break the
>> holiday moratorium instead of waiting another week to report this.
>>
>>
>> --
>> Chuck Lever
>>
>>

--
Chuck Lever

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html