Re: NFSv4.1 regression with v4.15-rc

> On Dec 30, 2017, at 1:05 PM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> 
> On Wed, Dec 27, 2017 at 03:40:58PM -0500, Chuck Lever wrote:
>> Last week I updated my test server from v4.14 to v4.15-rc4, and began to
>> observe intermittent failures in the git regression suite on NFSv4.1.
> 
> I haven't run that before.  Should I just
> 
> 	mount -overs=4.1 server:/fs /mnt/
> 	cd /mnt/
> 	git clone git://git.kernel.org/pub/scm/git/git.git
> 	cd git
> 	make test
> 
> ?

You'll need to install SVN and CVS on your client as well.
The failures seem to occur only in the SVN/CVS related
tests.
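
A quick pre-flight check for those two tools could look like the sketch below. The helper name missing_tools is made up for illustration, and package names for SVN and CVS vary by distribution, so none are suggested here. As far as I recall, the git suite skips the SVN/CVS scripts rather than failing them when the tools are absent, which would hide the problem.

```shell
# Hypothetical helper (not part of git): print each named tool
# that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The failing tests exercise SVN and CVS, so check for both clients.
missing=$(missing_tools svn cvs)
if [ -n "$missing" ]; then
    echo "install before running the suite:" $missing
fi
```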


>> I
>> was able to reproduce these failures with NFSv4.1 on both TCP and RDMA,
>> yet there has not been a reproduction with NFSv3 or NFSv4.0.
>> 
>> The server hardware is a single-socket 4-core system with 32GB of RAM.
>> The export is a tmpfs. Networking is 56Gb InfiniBand (or IPoIB).
>> 
>> The git regression suite reports individual test failures in the SVN
>> and CVS tests. On occasion, the client mount point freezes, requiring
>> that the client be rebooted in order to unstick the mount.
>> 
>> Just before Christmas, I bisected the problem to:
> 
> Thanks for the report!  I'll make some time for this next week.  What's
> your client?  I guess one start might be to see if the reproducer can be
> simplified e.g. by running just one of the tests from the suite.

The failures are intermittent, and occur in a different test
each time. You have to wait for the 9000-series scripts, which
test SVN/CVS repo operations. To speed up time-to-failure, use
"make -jN test" where N is more than a few.

My client and server both have multiple real cores. I suspect
it's the server that matters here (possibly the commit below
introduces a race condition?).
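
Since the failures are intermittent, a rough way to apply the "make -jN test" advice is to repeat the run until one fails. This is only a sketch: pick_jobs and run_until_failure are names invented here, and the "twice the online CPUs, minimum 4" rule is an assumption, not a measured threshold. Run it from the top of the git tree on the NFSv4.1 mount.

```shell
# Hypothetical helper: choose a -j value of twice the online CPU
# count, but never fewer than 4 (an assumed floor).
pick_jobs() {
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    n=$((cpus * 2))
    [ "$n" -lt 4 ] && n=4
    echo "$n"
}

# Hypothetical helper: rerun the suite up to $1 times (default 10)
# and stop at the first run in which make reports a failure.
run_until_failure() {
    max_runs=${1:-10}
    i=1
    while [ "$i" -le "$max_runs" ]; do
        echo "run $i"
        make -j"$(pick_jobs)" test || return 1
        i=$((i + 1))
    done
}
```

For example, "run_until_failure 20" would give up after twenty clean runs.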


> --b.
> 
>> 
>> commit 659aefb68eca28ba9aa482a9fc64de107332e256
>> Author: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>> Date:   Fri Nov 3 08:00:13 2017 -0400
>> 
>>    nfsd: Ensure we don't recognise lock stateids after freeing them
>> 
>>    In order to deal with lookup races, nfsd4_free_lock_stateid() needs
>>    to be able to signal to other stateful functions that the lock stateid
>>    is no longer valid. Right now, nfsd_lock() will check whether or not an
>>    existing stateid is still hashed, but only in the "new lock" path.
>> 
>>    To ensure the stateid invalidation is also recognised by the "existing lock"
>>    path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
>>    change the type to NFS4_CLOSED_STID under the stp->st_mutex.
>> 
>>    Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
>>    Signed-off-by: J. Bruce Fields <bfields@xxxxxxxxxx>
>> 
>> 
>> Since we're already at v4.15-rc5 I thought it would be best to break the
>> holiday moratorium instead of waiting another week to report this.
>> 
>> 
>> --
>> Chuck Lever
>> 
>> 

--
Chuck Lever






