Re: Regression on NFS in mainline kernel

On Wed, 18 Apr 2012 14:41:45 +0000
"Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote:

> On Wed, 2012-04-18 at 10:15 -0400, Jeff Layton wrote:
> > On Wed, 18 Apr 2012 15:13:13 +0100
> > Luis Henriques <luis.henriques@xxxxxxxxxxxxx> wrote:
> > 
> > > On Wed, Apr 18, 2012 at 02:04:26PM +0000, Myklebust, Trond wrote:
> > > > On Wed, 2012-04-18 at 14:57 +0100, Luis Henriques wrote:
> > > > > Hi Jeff,
> > > > > 
> > > > > On Wed, Apr 18, 2012 at 09:28:22AM -0400, Jeff Layton wrote:
> > > > > > On Wed, 18 Apr 2012 12:26:10 +0100
> > > > > > Luis Henriques <luis.henriques@xxxxxxxxxxxxx> wrote:
> > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > We have a bug reporting a regression in mainline kernel.  Basically, the
> > > > > > > bug reporters are seeing lots of messages:
> > > > > > > 
> > > > > > > [ 48.701213] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > > > > > > [ 48.701990] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > > > > > > [ 53.696076] nfs4_reclaim_open_state: 6440 callbacks suppressed
> > > > > > > 
> > > > > > > This happens when mounting a user's home directory over NFS.
> > > > > > > 
> > > > > > > Is this a known issue being addressed at the moment?  Is there any
> > > > > > > information needed to help debugging the issue?
> > > > > > > 
> > > > > > > The original bug report can be found here:
> > > > > > > 
> > > > > > > http://bugs.launchpad.net/bugs/974664
> > > > > > > 
> > > > > > > And there's also a similar report for Fedora:
> > > > > > > 
> > > > > > > https://bugzilla.redhat.com/show_bug.cgi?id=811138
> > > > > > > 
> > > > > > > Cheers,
> > > > > > 
> > > > > > This code in nfs4_reclaim_open_state() looks wrong to me, but I'm not
> > > > > > that familiar with this code so I could be wrong:
> > > > > > 
> > > > > > -------------------[snip]--------------------
> > > > > >                 status = ops->recover_open(sp, state);
> > > > > >                 if (status >= 0) {
> > > > > >                         status = nfs4_reclaim_locks(state, ops);
> > > > > >                         if (status >= 0) {
> > > > > >                                 spin_lock(&state->state_lock);
> > > > > >                                 list_for_each_entry(lock, &state->lock_states, ls_locks) {
> > > > > >                                         if (!(lock->ls_flags & NFS_LOCK_INITIALIZED))
> > > > > >                                                 pr_warn_ratelimited("NFS: "
> > > > > >                                                         "%s: Lock reclaim "
> > > > > >                                                         "failed!\n", __func__);
> > > > > >                                 }
> > > > > >                                 spin_unlock(&state->state_lock);
> > > > > >                                 nfs4_put_open_state(state);
> > > > > >                                 goto restart;
> > > > > >                         }
> > > > > >                 }
> > > > > > -------------------[snip]--------------------
> > > > > > 
> > > > > > Shouldn't the status check after nfs4_reclaim_locks be reversed?
> > > > > 
> > > > > > Thanks a lot for your help.  Could you please take a look at the patch
> > > > > > below, just to make sure I understood your suggestion correctly?  I will
> > > > > > prepare a test kernel so that we can check whether it actually solves the
> > > > > > problem or not.
> > > > > 
> > > > > Cheers,
> > > > > --
> > > > > Luis
> > > > > 
> > > > > 
> > > > > > From a1348f473c157439ac62f502eb45ca48f95e627f Mon Sep 17 00:00:00 2001
> > > > > From: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
> > > > > Date: Wed, 18 Apr 2012 14:50:10 +0100
> > > > > Subject: [PATCH 1/1] NFS: Fix status check on nfs4_reclaim_open_state()
> > > > > 
> > > > > There have been several bug reports, with the following messages on the
> > > > > logs:
> > > > > 
> > > > >  [ 48.701213] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > > > >  [ 48.701990] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > > > >  [ 53.696076] nfs4_reclaim_open_state: 6440 callbacks suppressed
> > > > > 
> > > > > This happens, for example, when mounting a user's home directory over NFS.
> > > > > 
> > > > > > Thanks to Jeff Layton, who identified the cause, this patch fixes an
> > > > > > incorrect status check in nfs4_reclaim_open_state().
> > > > > 
> > > > > Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
> > > > > ---
> > > > >  fs/nfs/nfs4state.c |    2 +-
> > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > > > index 07354b7..8b6acec 100644
> > > > > --- a/fs/nfs/nfs4state.c
> > > > > +++ b/fs/nfs/nfs4state.c
> > > > > @@ -1180,7 +1180,7 @@ restart:
> > > > >  		atomic_inc(&state->count);
> > > > >  		spin_unlock(&sp->so_lock);
> > > > >  		status = ops->recover_open(sp, state);
> > > > > -		if (status >= 0) {
> > > > > +		if (status < 0) {
> > > > >  			status = nfs4_reclaim_locks(state, ops);
> > > > >  			if (status >= 0) {
> > > > >  				spin_lock(&state->state_lock);
> > > > 
> > > > Hell no! 
> > > 
> > > Ouch!  Wrong one...
> > > 
> > > From 005918b5eef853fd4d495743fef5a52ae62f825e Mon Sep 17 00:00:00 2001
> > > From: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
> > > Date: Wed, 18 Apr 2012 15:08:39 +0100
> > > Subject: [PATCH 1/1] NFS: Fix status check on nfs4_reclaim_open_state()
> > > 
> > > There have been several bug reports, with the following messages on the
> > > logs:
> > > 
> > >  [ 48.701213] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > >  [ 48.701990] NFS: nfs4_reclaim_open_state: Lock reclaim failed!
> > >  [ 53.696076] nfs4_reclaim_open_state: 6440 callbacks suppressed
> > > 
> > > This happens, for example, when mounting a user's home directory over NFS.
> > > 
> > > > Thanks to Jeff Layton, who identified the cause, this patch fixes an
> > > > incorrect status check in nfs4_reclaim_open_state().
> > > 
> > > Signed-off-by: Luis Henriques <luis.henriques@xxxxxxxxxxxxx>
> > > ---
> > >  fs/nfs/nfs4state.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > > index 07354b7..023e09f 100644
> > > --- a/fs/nfs/nfs4state.c
> > > +++ b/fs/nfs/nfs4state.c
> > > @@ -1182,7 +1182,7 @@ restart:
> > >  		status = ops->recover_open(sp, state);
> > >  		if (status >= 0) {
> > >  			status = nfs4_reclaim_locks(state, ops);
> > > -			if (status >= 0) {
> > > +			if (status < 0) {
> > >  				spin_lock(&state->state_lock);
> > >  				list_for_each_entry(lock, &state->lock_states, ls_locks) {
> > >  					if (!(lock->ls_flags & NFS_LOCK_INITIALIZED))
> > 
> > Yeah, that looks more reasonable, but again I'm not sure about this
> > either way. This code has been this way for a long time and it's not
> > clear to me why it's only now become a problem if it is wrong.
> 
> Right. Random (and wrong!) changes such as the above won't fix the
> problem. That code is perfectly correct (look at the nfs4_reclaim_locks
> error cases to see why).
> 

Ugh, ok. I see now, and that code is correct, even if it's a bit hard
to follow...

We clear the state_flag_bit on the first attempt against that state, so
if the reclaim returns 0 (meaning it succeeded), we'll skip over that
state on the next pass through the loop.
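
As a rough illustration of that control flow, here is a simplified
userspace model in C. The names model_state and model_reclaim are
hypothetical and are not the kernel's. The real loop also takes locks,
reclaims byte-range locks, and handles individual error codes, all of
which this sketch leaves out (it simply returns the first error).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an open state; NOT the kernel's struct nfs4_state. */
struct model_state {
	bool needs_reclaim;   /* models the bit selected by ops->state_flag_bit */
	int  recover_status;  /* status the modelled recovery attempt returns */
	int  attempts;        /* number of recovery attempts made on this state */
};

/*
 * Scan the array the way the reclaim loop scans the state list:
 *  - skip entries whose flag is already clear,
 *  - clear the flag before attempting recovery,
 *  - restart the scan from the top after a successful recovery.
 * Returns 0 once no flagged state remains, or the first negative status.
 */
int model_reclaim(struct model_state *states, size_t n)
{
	size_t i;

	assert(states != NULL || n == 0);
restart:
	for (i = 0; i < n; i++) {
		struct model_state *s = &states[i];

		if (!s->needs_reclaim)
			continue;               /* already handled on an earlier pass */
		s->needs_reclaim = false;       /* models test_and_clear_bit() */
		s->attempts++;
		if (s->recover_status >= 0)
			goto restart;           /* success: rescan from the start */
		return s->recover_status;       /* simplified error handling */
	}
	return 0;
}
```

Because the flag is cleared before the attempt, every restart finds one
fewer flagged state, so a successful reclaim followed by "goto restart"
cannot revisit the same state and the loop terminates.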

> Have you instead looked into what these applications are doing? Are they
> perhaps opening the file read only, then trying to apply an exclusive
> BSD lock (something which NFSv4 cannot support)?
> 
> IOW: does the problem go away if you mount with 'local_lock=flock'?
> 


I suspect that that is the trigger here. Sadly common among userspace
apps...
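
For anyone who wants to check for that pattern, a minimal sketch of it
in C follows. The helper name try_exclusive_flock is made up for this
example; it only uses the standard open(2) and flock(2) calls. The
assumption, taken from Trond's question above, is that an exclusive
flock() on a descriptor opened O_RDONLY is what NFSv4 cannot support.
Which errno (if any) comes back on a given client is not something this
sketch claims to know.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical test helper: open 'path' with 'open_flags' and try to take
 * an exclusive BSD lock without blocking.
 * Returns 0 if the lock was granted, or a negative errno value otherwise.
 */
int try_exclusive_flock(const char *path, int open_flags)
{
	int ret = 0;
	int fd = open(path, open_flags);

	if (fd < 0)
		return -errno;

	if (flock(fd, LOCK_EX | LOCK_NB) < 0)
		ret = -errno;           /* lock refused; remember why */
	else
		flock(fd, LOCK_UN);     /* granted; release it again */

	close(fd);
	return ret;
}
```

Calling it with O_RDONLY and then with O_RDWR on the same file, once on
a local filesystem and once on the NFSv4 mount, should show whether the
read-only case is the one that misbehaves. Repeating the NFS run with
the share mounted using the local_lock=flock option would then tell us
whether handling flock() locally makes the messages go away.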


-- 
Jeff Layton <jlayton@xxxxxxxxxx>

