Re: [PATCH 9/9] xfs: cache unlinked pointers in an rhashtable

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Fri, 8 Feb 2019 08:54:14 -0800

On Fri, Feb 08, 2019 at 07:06:24AM -0500, Brian Foster wrote:
> On Fri, Feb 08, 2019 at 12:00:16AM -0800, Christoph Hellwig wrote:
> > > > +	if (XFS_TEST_ERROR(false, pag->pag_mount, XFS_ERRTAG_IUNLINK_FALLBACK))
> > > > +		return 0;
> > > > +
> > > > +	iu = kmem_zalloc(sizeof(*iu), KM_SLEEP | KM_NOFS);
> > > > +	iu->iu_agino = prev_agino;
> > > > +	iu->iu_next_unlinked = this_agino;
> > > > +
> > > > +	error = rhashtable_insert_fast(&pag->pagi_unlinked_hash,
> > > > +			&iu->iu_rhash_head, xfs_iunlink_hash_params);
> > > > +	if (error == 0 || error == -EEXIST)
> > > > +		return error;
> > > > +	return 0;
> > > 
> > > The return val looks funny at a glance: return -EEXIST or otherwise
> > > return 0 for success and all other errors (also the error == 0 looks
> > > superfluous). I see the comment above describes what this does, but it
> > > doesn't explain why. I'd move the comment to above the if check and
> > > explain why it's there (i.e., "Absorb all runtime errors besides -EEXIST
> > > because fallback blah blah. We care about -EEXIST because ...").
> > > 
> > > Also I assume we need to free the iu object on insert failure,
> > > regardless of whether we ultimately return the error.
> > 
> > But is this really the right thing to do?  Everything but ENOMEM
> > would be a bug in the rhashtable implementation, or the XFS use,
> > wouldn't it?
> > 
> 
> I think anything beyond -ENOMEM would imply a bug somewhere, yes.

AFAICT the only other error we can receive from rhashtable is -E2BIG.
That only happens if we somehow end up with more than 2^31 inodes (or
leave a logic bomb for ourselves by setting max_size, I guess...)

> > So why not drop the return value entirely and do a:
> > 
> > 	error = rhashtable_insert_fast(&pag->pagi_unlinked_hash,
> > 			&iu->iu_rhash_head, xfs_iunlink_hash_params);
> > 	ASSERT(error == 0 || error == -ENOMEM);

But if someone hits -EEXIST on a non-debug filesystem we'll never know
that our code was actually buggy, whereas returning it shuts down the
filesystem and we'll start getting letters.

> It's not clear to me whether you're suggesting we return 0, error or
> nothing at all here. The assert otherwise seems fine to me as I don't
> think we'd ever expect anything outside of success or -ENOMEM.

As I mentioned above, we could theoretically receive -E2BIG due to a
programming bug, or at some point the rhashtable code could start
returning new errors it hasn't before, or we could be misreading the
code too. :)

> That said, I don't see any reason to ever leak an iu if we know it
> didn't make it into the table.

Agreed.

> I could probably go either way on whether we wanted to allow the
> filesystem to continue or not on unexpected insert errors. The
> original comment was just that we probably shouldn't explode on
> "expected" errors like -ENOSPC.

Since we /do/ have a fallback to handle cache misses, I think it's fine
to ignore hashtable insertion failures, though I'll put in a ASSERT if
we see any error other than ENOMEM.

> 
> > > > +	error = rhashtable_remove_fast(&pag->pagi_unlinked_hash,
> > > > +			&iu->iu_rhash_head, xfs_iunlink_hash_params);
> > > > +	if (error)
> > > > +		return error;
> > > 
> > > What's interesting about remove failure is that if we otherwise found an
> > > iu for this inode, removing the inode from the unlinked list leaves a
> > > stale/incorrect iu around in the in-core table. So it's one thing for
> > > the in-core table to be a valid subset of the on-disk structures, but
> > > another thing entirely to have incoherence between the two. It might be
> > > worth pointing out that it's critical to return an error here so we
> > > don't actually remove the inode (whereas the re-add below is less
> > > strict).
> > 
> > Again, remove failure here sounds like a programming bug - so ASSERT
> > and or forced shutdown here.
> 
> Remove failure already unconditionally returns the error (which in the
> inactive path translates to a shutdown). My feedback above was just a
> suggestion to explain why in a comment because the error handling is
> intentionally different between table insert/remove/lookup.

I assume you're ok with the comment that got added in the followup patch
I sent to this thread?

--D

> Brian