Re: [PATCH v3 14/20] nfsd: close cached files prior to a REMOVE or RENAME that would replace target

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Fri, 28 Aug 2015 13:58:34 -0400

On Fri, Aug 28, 2015 at 08:19:46AM -0400, Jeff Layton wrote:
> On Thu, 27 Aug 2015 09:38:06 -0400
> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> 
> > On Wed, Aug 26, 2015 at 06:53:31PM -0400, Jeff Layton wrote:
> > > On Wed, 26 Aug 2015 16:00:32 -0400
> > > "J. Bruce Fields" <bfields@xxxxxxxxxxxx> wrote:
> > > 
> > > > On Thu, Aug 20, 2015 at 07:17:14AM -0400, Jeff Layton wrote:
> > > > > It's not uncommon for some workloads to do a bunch of I/O to a file and
> > > > > delete it just afterward. If knfsd has a cached open file however, then
> > > > > the file may still be open when the dentry is unlinked. If the
> > > > > underlying filesystem is nfs, then that could trigger it to do a
> > > > > sillyrename.
> > > > 
> > > > Possibly worth noting that situation doesn't currently occur upstream.
> > > > 
> > > > (And, another justification worth noting: space used by a file should be
> > > > deallocated on last unlink or close.  People do sometimes notice if it's
> > > > not, especially if the file is large.)
> > > > 
> > > 
> > > Good points.
> > > 
> > > > > On a REMOVE or RENAME scan the nfsd_file cache for open files that
> > > > > correspond to the inode, and proactively unhash and put their
> > > > > references. This should prevent any delete-on-last-close activity from
> > > > > occurring, solely due to knfsd's open file cache.
> > > > 
> > > > Is there anything here to prevent a new cache entry being added after
> > > > nfsd_file_close_inode and before the file is actually removed?
> > > > 
> > > 
> > > No, nothing -- it's strictly best effort.
> > 
> > Unfortunately I think this is something we really want to guarantee.
> > 
> 
> That should be doable.
> 
> One question though -- if we hit this race, what's the right way to
> handle it?
> 
> We don't want to return nfserr_stale or anything since we won't know if
> the inode will really be stale after the remove completes. We could
> just be removing one link from a multiply-linked inode.
> 
> We also don't want to make the caller wait out nfsd_file_acquire, as the
> file could be open via NFSv4 and those references might not get put for
> quite some time.
> 
> What semantics should we be aiming for?

Hm.

The RENAME or REMOVE has to succeed regardless.

If the other operation is an open or something lookup-like, then it has
to succeed or see ENOENT.  If it's a filehandle-based operation like
READ or WRITE then it's got to succeed or get STALE, and if the latter
then it had better be really and truly gone.  Continuing to process a
READ or WRITE after the REMOVE completes is fine, I think; it just means
some application on some other v3 client has an open that we're honoring
a moment longer than we're required to.
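
As a rough illustration of the rules above, here is a small userspace
sketch that encodes which results each class of operation may return
while racing with a REMOVE or RENAME.  Every name in it (toy_op,
toy_result, toy_outcome_allowed, the inode_gone flag) is hypothetical and
made up for this example; none of it is nfsd code.

```c
#include <stdbool.h>

/* Hypothetical operation classes; not real nfsd identifiers. */
enum toy_op {
	TOY_OP_REMOVE, TOY_OP_RENAME,	/* must succeed regardless */
	TOY_OP_LOOKUP, TOY_OP_OPEN,	/* name-based */
	TOY_OP_READ, TOY_OP_WRITE	/* filehandle-based */
};

enum toy_result { TOY_OK, TOY_ENOENT, TOY_STALE };

/*
 * inode_gone: assumed to mean the last link was removed, so the inode
 * really is unreachable once the REMOVE finishes.
 */
static bool toy_outcome_allowed(enum toy_op op, enum toy_result res,
				bool inode_gone)
{
	switch (op) {
	case TOY_OP_REMOVE:
	case TOY_OP_RENAME:
		return res == TOY_OK;
	case TOY_OP_LOOKUP:
	case TOY_OP_OPEN:
		return res == TOY_OK || res == TOY_ENOENT;
	case TOY_OP_READ:
	case TOY_OP_WRITE:
		/* STALE is only acceptable if the file is truly gone. */
		return res == TOY_OK || (res == TOY_STALE && inode_gone);
	}
	return false;
}
```

Note that a successful READ or WRITE is allowed even when inode_gone is
true, which matches honoring an existing open a moment longer.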

Now I'd have to look back at your code....  If the caching were just an
optimization, we could just turn it off temporarily, but I don't think
we can.

Marking the file dead somehow and failing further attempts to use it
might work.  I guess that's equivalent to your suggestion to unhash it
in this case.
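
To make the "mark it dead / unhash it" idea concrete, here is a toy
single-threaded model.  It is only a sketch under loud assumptions:
struct toy_file, toy_file_acquire, toy_file_put and toy_file_close_inode
are hypothetical stand-ins and not the functions in the patch, there is
no locking, and a later acquire simply creates a fresh entry, so it does
not by itself close the race window discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Hypothetical stand-in for a cached open file (not struct nfsd_file). */
struct toy_file {
	unsigned long ino;	/* inode the entry refers to */
	int refcount;		/* cache's reference plus one per user */
	bool hashed;		/* false once dead to new lookups */
	bool in_use;		/* slot is allocated */
};

static struct toy_file toy_cache[TOY_CACHE_SLOTS];

/* Return a live entry for ino, creating one if needed; NULL if full. */
static struct toy_file *toy_file_acquire(unsigned long ino)
{
	struct toy_file *free_slot = NULL;

	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct toy_file *f = &toy_cache[i];

		if (f->in_use && f->hashed && f->ino == ino) {
			f->refcount++;
			return f;
		}
		if (!f->in_use && !free_slot)
			free_slot = f;
	}
	if (!free_slot)
		return NULL;
	free_slot->ino = ino;
	free_slot->refcount = 2;	/* one for the cache, one for the caller */
	free_slot->hashed = true;
	free_slot->in_use = true;
	return free_slot;
}

/* Drop one reference; the last put stands in for the real close. */
static void toy_file_put(struct toy_file *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		f->in_use = false;
}

/*
 * Before a REMOVE or RENAME: unhash every entry for ino and drop the
 * cache's own reference.  Existing users keep theirs until they put.
 * Returns the number of entries unhashed.
 */
static int toy_file_close_inode(unsigned long ino)
{
	int n = 0;

	for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
		struct toy_file *f = &toy_cache[i];

		if (f->in_use && f->hashed && f->ino == ino) {
			f->hashed = false;
			toy_file_put(f);
			n++;
		}
	}
	return n;
}
```

In this model an in-flight READ or WRITE that already holds a reference
carries on until its put, while new lookups can no longer find the dead
entry.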

--b.