Re: spurious sillyrename after O_DIRECT writes get ENOSPC

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Thu, 14 Dec 2017 13:55:14 -0500

On Thu, Dec 14, 2017 at 11:36:54AM -0500, J. Bruce Fields wrote:
> On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote:
> > 
> > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote:
> > 
> > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote:
> > >> Last year Christoph noticed a bug that could result in a file being
> > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
> > >>
> > >> 	http://lkml.kernel.org/r/20160616150146.GA14015@xxxxxxxxxxxxx
> > >>
> > >> It's reproduceable on upstream, over v3 or v4.
> > >>
> > >> I looked into it some more, and it seems to reproduce whenever a write
> > >> system call results in multiple WRITE calls, only some of which receive
> > >> ENOSPC.  I think that's resulting in a leak of the wb_kref on some
> > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
> > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts.  So
> > >> a "rm" on the client (even after the file is closed) results in a
> > >> sillyrename.
> > >>
> > >> I'll keep looking at this, but the relevant code is pretty opaque to me
> > >> so far.  Any ideas welcomed.
> > >
> > > Actually it looks like a leak of dreq->io_count?  That prevents commits
> > > from being sent (which I'm also seeing in network traces--the succesfull
> > > WRITEs are unstable but never get committed), which means
> > > nfs_direct_commit_complete() is never called, and the reference taken on
> > > wb_kref in the request_commit case of nfs_direct_write_completion is
> > > never put.
> > 
> > This sounds to me like the problem Scott's working - he sent a patch
> > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the
> > mds".
> > 
> > I think the the rule should be that once we call
> > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion.
> > However, there are some paths where that is not the case.
> 
> Yes, I wondered about that....
> 
> Unfortunately after some more tests now I was think I was wrong, the
> dreq_get()s and put()s are balanced and the bug is somewhere else--in
> the case of my particular reproducer.  Argh.  But yes I can easily
> believe there could be a leak there.

So actually what happens is if you do a direct io write where some
WRITEs succeed and the one fails, then this:

	if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) {
		dreq->flags = 0;
		dreq->error = hdr->error;
	}

clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete
never scheduels the commit calls.  It looks like that leaves a bunch of
nfs_pages on some to-be-committed list, so we end up leaking a bunch of
stuff, with the most visible symptom being an unnecessarily sillyrename
on close.

I can just remove that clear of dreq_flags and that fixes the problem,
but I doubt that's correct.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html