Re: [PATCH/RFC] make nfs_flush_incompatible more forgiving.

NeilBrown <neilb@xxxxxxx> · Thu, 5 Sep 2013 09:29:47 +1000

On Wed, 4 Sep 2013 14:33:35 +0000 "Myklebust, Trond"
<Trond.Myklebust@xxxxxxxxxx> wrote:

> On Wed, 2013-09-04 at 11:36 +1000, NeilBrown wrote:
> > Hi,
> >  we have a customer who has noticed a significant regression in NFS
> >  performance between SLES10 (2.6.16+) and SLES11 (3.0+).
> > 
> >  The use case involves two different processes on the same client mmapping a
> >  file and making changes.
> > 
> >  In 2.6.16, all these changes would be cached on the client until a flush or
> >  close or other normal process flushed the changes out - much the same as if
> >  just one process was accessing the file.
> > 
> >  In 3.0 the file is flushed out every time a different process performs an
> >  update. So if they interleave accesses, you get a flush-to-the-server on
> >  every access.
> > 
> >  This is caused by nfs_flush_incompatible.  It deems access through different
> >  file descriptors to be incompatible.  However I don't think this is
> >  justified.  Accessed using different credentials can reasonably be seen as
> >  incompatible(*), but if two processes with identical credentials open the
> >  same file there there should be not need to differentiate between them.
> > 
> >  The patch below changes the test in nfs_flush_incompatible to just test the
> >  credentials (and the dentry, though that might be superfluous - it seems
> >  safer).
> > 
> > (*) In this specific customer's case the process do currently have slightly
> >  different credentials.  In one, the primary gid is included in the list of
> >  auxiliary gids.  In the other the primary gid is excluded.  This is enough
> >  to make the credentials appear different so this patch doesn't help in that
> >  case.  They are looking at being more consistent in their setgroups usage
> >  but it would help them if we only actually compared the credentials that
> >  were important.  In this case there is no important difference.
> >  Would it make sense to treat a more broad credential as compatible with a
> >  more narrow credential?  So a write from a process with a given
> >  uid,gid,grouplist could ultimately be sent to the server with same uid/gid,
> >  but a subset of the group list?
> >  
> > 
> >  Thoughts?
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 7816801..b0d4e47 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -860,7 +860,12 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
> >  		if (req == NULL)
> >  			return 0;
> >  		l_ctx = req->wb_lock_context;
> > -		do_flush = req->wb_page != page || req->wb_context != ctx;
> > +		do_flush = req->wb_page != page;
> > +		if (req->wb_context != ctx) {
> > +			do_flush |=
> > +				req->wb_context->dentry != ctx->dentry ||
> > +				req->wb_context->cred != ctx->cred;
> > +		}
> >  		if (l_ctx) {
> >  			do_flush |= l_ctx->lockowner.l_owner != current->files
> >  				|| l_ctx->lockowner.l_pid != current->tgid;
> 
> Hi Neil,
> 
> Won't this cause any write errors to be propagated to the wrong process?

I'm not sure what this means.

For a local filesystem, the only error that isn't synchronous is EIO and an
application can only be sure of getting that if it uses 'fsync'.  An
application can get EIO from fsync even if it didn't write to the page which
is causing the problem.  All apps should be able to see the EIO if they
bother to fsync.

For NFS, EDQUOT and ENOSPC are also asynchronous, and they can also come in
through close().  I think every app should still see every error whether  it
dirtied the page which triggered the error or not.

??

Thanks,
NeilBrown
Attachment:
signature.asc

Description: PGP signature