Re: [PATCH v3 14/20] nfsd: close cached files prior to a REMOVE or RENAME that would replace target

Jeff Layton <jlayton@xxxxxxxxxxxxxxx> · Mon, 31 Aug 2015 12:50:51 -0400

On Fri, 28 Aug 2015 13:58:34 -0400
"J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:

> On Fri, Aug 28, 2015 at 08:19:46AM -0400, Jeff Layton wrote:
> > On Thu, 27 Aug 2015 09:38:06 -0400
> > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > 
> > > On Wed, Aug 26, 2015 at 06:53:31PM -0400, Jeff Layton wrote:
> > > > On Wed, 26 Aug 2015 16:00:32 -0400
> > > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > > > 
> > > > > On Thu, Aug 20, 2015 at 07:17:14AM -0400, Jeff Layton wrote:
> > > > > > It's not uncommon for some workloads to do a bunch of I/O to a file and
> > > > > > delete it just afterward. If knfsd has a cached open file however, then
> > > > > > the file may still be open when the dentry is unlinked. If the
> > > > > > underlying filesystem is nfs, then that could trigger it to do a
> > > > > > sillyrename.
> > > > > 
> > > > > Possibly worth noting that situation doesn't currently occur upstream.
> > > > > 
> > > > > (And, another justification worth noting: space used by a file should be
> > > > > deallocated on last unlink or close.  People do sometimes notice if it's
> > > > > not, especially if the file is large.)
> > > > > 
> > > > 
> > > > Good points.
> > > > 
> > > > > > On a REMOVE or RENAME scan the nfsd_file cache for open files that
> > > > > > correspond to the inode, and proactively unhash and put their
> > > > > > references. This should prevent any delete-on-last-close activity from
> > > > > > occurring, solely due to knfsd's open file cache.
> > > > > 
> > > > > Is there anything here to prevent a new cache entry being added after
> > > > > nfsd_file_close_inode and before the file is actually removed?
> > > > > 
> > > > 
> > > > No, nothing -- it's strictly best effort.
> > > 
> > > Unfortunately I think this is something we really want to guarantee.
> > > 
> > 
> > That should be doable.
> > 
> > One question though -- if we hit this race, what's the right way to
> > handle it?
> > 
> > We don't want to return nfserr_stale or anything since we won't know if
> > the inode will really be stale after the remove completes. We could
> > just be removing one link from a multiply-linked inode.
> > 
> > We also don't want to make the caller wait out nfsd_file_acquire, as the
> > file could be open via NFSv4 and those references might not get put for
> > quite some time.
> > 
> > What semantics should we be aiming for?
> 
> Hm.
> 
> The RENAME or REMOVE has to succeed regardless.
> 
> If the other operation is an open or something lookup-like, then it has
> to succeed or see ENOENT.  If it's a filehandle-based operation like
> READ or WRITE then it's got to succeed or get STALE, and if the latter
> then it better be really and truly gone.  Continuing to process a READ
> or WRITE after the REMOVE continues is fine, I think, it just means some
> application on some other v3 client has an open that we're honoring a
> moment longer than we're required to.
> 
> Now I'd have to look back at your code....  If the caching were just an
> optimization, we could just it off temporarily, but I don't think we
> can.
> 
> Marking the file dead somehow and failing further attempts to use it
> might work.  I guess that's equivalent to your suggestion to unhash it
> in this case.
> 

Yeah, though that is a rather substantive change. Right now, we just
tear these things down when we put the last reference. If we have to
prevent new references from being acquired during the duration of the
unlink, then that changes the entire semantics of the cache.

We'll need to allow some sort of object to persist in the cache (to act
as a placeholder), but still allow the struct file to be closed. I'll
have to think about how we can make that work. It'll probably make this
quite a bit more complex...
-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html