Re: [PATCH/RFC] make nfs_flush_incompatible more forgiving.

On Thu, 5 Sep 2013 03:17:54 +0000 "Myklebust, Trond"
<Trond.Myklebust@xxxxxxxxxx> wrote:

> On Thu, 2013-09-05 at 09:29 +1000, NeilBrown wrote:
> > On Wed, 4 Sep 2013 14:33:35 +0000 "Myklebust, Trond"
> > <Trond.Myklebust@xxxxxxxxxx> wrote:
> > 
> > > On Wed, 2013-09-04 at 11:36 +1000, NeilBrown wrote:
> > > > Hi,
> > > >  we have a customer who has noticed a significant regression in NFS
> > > >  performance between SLES10 (2.6.16+) and SLES11 (3.0+).
> > > > 
> > > >  The use case involves two different processes on the same client mmapping a
> > > >  file and making changes.
> > > > 
> > > >  In 2.6.16, all these changes would be cached on the client until a flush or
> > > >  close or other normal process flushed the changes out - much the same as if
> > > >  just one process was accessing the file.
> > > > 
> > > >  In 3.0 the file is flushed out every time a different process performs an
> > > >  update. So if they interleave accesses, you get a flush-to-the-server on
> > > >  every access.
> > > > 
> > > >  This is caused by nfs_flush_incompatible.  It deems access through different
> > > >  file descriptors to be incompatible.  However I don't think this is
> > > >  justified.  Accesses using different credentials can reasonably be seen as
> > > >  incompatible(*), but if two processes with identical credentials open the
> > > >  same file there should be no need to differentiate between them.
> > > > 
> > > >  The patch below changes the test in nfs_flush_incompatible to just test the
> > > >  credentials (and the dentry, though that might be superfluous - it seems
> > > >  safer).
> > > > 
> > > > (*) In this specific customer's case the processes currently do have slightly
> > > >  different credentials.  In one, the primary gid is included in the list of
> > > >  auxiliary gids.  In the other the primary gid is excluded.  This is enough
> > > >  to make the credentials appear different so this patch doesn't help in that
> > > >  case.  They are looking at being more consistent in their setgroups usage
> > > >  but it would help them if we only actually compared the credentials that
> > > >  were important.  In this case there is no important difference.
> > > >  Would it make sense to treat a broader credential as compatible with a
> > > >  narrower credential?  So a write from a process with a given
> > > >  uid,gid,grouplist could ultimately be sent to the server with the same
> > > >  uid/gid, but a subset of the group list?
> > > >  
> > > > 
> > > >  Thoughts?
> > > > 
> > > > Thanks,
> > > > NeilBrown
> > > > 
> > > > 
> > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > index 7816801..b0d4e47 100644
> > > > --- a/fs/nfs/write.c
> > > > +++ b/fs/nfs/write.c
> > > > @@ -860,7 +860,12 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
> > > >  		if (req == NULL)
> > > >  			return 0;
> > > >  		l_ctx = req->wb_lock_context;
> > > > -		do_flush = req->wb_page != page || req->wb_context != ctx;
> > > > +		do_flush = req->wb_page != page;
> > > > +		if (req->wb_context != ctx) {
> > > > +			do_flush |=
> > > > +				req->wb_context->dentry != ctx->dentry ||
> > > > +				req->wb_context->cred != ctx->cred;
> > > > +		}
> > > >  		if (l_ctx) {
> > > >  			do_flush |= l_ctx->lockowner.l_owner != current->files
> > > >  				|| l_ctx->lockowner.l_pid != current->tgid;
> > > 
> > > Hi Neil,
> > > 
> > > Won't this cause any write errors to be propagated to the wrong process?
> > 
> > I'm not sure what this means.
> > 
> > For a local filesystem, the only error that isn't synchronous is EIO and an
> > application can only be sure of getting that if it uses 'fsync'.  An
> > application can get EIO from fsync even if it didn't write to the page which
> > is causing the problem.  All apps should be able to see the EIO if they
> > bother to fsync.
> 
> You are describing a consequence of the complete and utter brokenness of
> the block layer. The block layer doesn't track who wrote what into the
> page cache, and so it just passes any write errors to whichever process
> that first calls filemap_check_errors() for that file.

I would have said "pragmatic design" rather than "complete ... brokenness".

If a write has failed then the file is corrupt and every writer deserves to
know.
If multiple processes try to fsync the same file, pages that fail for one
process will likely fail for the next process so each should get told of the
failure.

(actually... I wonder if I'm missing something.  If you have a Dirty page
which results in an error every time it is written, is there any way for
it to be removed from the page cache, or will it remain there causing every
'fsync' on the file to try to write the same page and get an error?)

> 
> In NFS, we use the nfs_open_context to track exactly who wrote what, and
> use that to notify the specific process that wrote the data that failed.

While this could be seen as an improvement, the extra guarantees are only
available on NFS, so apps are unlikely to depend on them, and so the extra
value doesn't actually help anyone - does it?

And it has a clear cost: two processes on the same machine can suffer
significantly degraded performance.
Is the theoretical improvement in semantics worth that cost?

> 
> > For NFS, EDQUOT and ENOSPC are also asynchronous, and they can also come in
> > through close().  I think every app should still see every error whether it
> > dirtied the page which triggered the error or not.
> 
> With this patch, we can end up writing the data using the wrong
> nfs_open_context, in which case the errors will get reported to the
> process that actually owns that nfs_open_context.
> 

I see, yes.  I would want the error to be returned to whoever instigated the
'flush', not whoever issued the 'write'.

Thanks,
NeilBrown

Attachment: signature.asc
Description: PGP signature
