Re: [PATCH 26/43] xfs: implement zoned garbage collection

Christoph Hellwig <hch@xxxxxx> · Tue, 17 Dec 2024 05:06:55 +0100

On Mon, Dec 16, 2024 at 05:27:53PM -0800, Darrick J. Wong wrote:
> > lot more work to move them and generates more metadata vs moving unshared
> > blocks.  That being said it at least handles reflinks, which this currently
> > doesn't.  I'll take a look at it for ideas on implementing shared block
> > support for the GC code.
> 
> Hrmm.  For defragmenting free space, I thought it was best to move the
> most highly shared extents first to increase the likelihood that the new
> space allocation would be contiguous and not contribute to bmbt
> expansion.

How does moving a highly shared extent vs a less shared extent help
with keeping free space contiguous?  What matters for that in a non-zoned
interface is that the extent sits between two free or soon-to-be-free
extents; the amount of sharing shouldn't really matter.
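
To illustrate the point, here is a toy sketch in Python.  It is not XFS
code: the (start, length) extent representation and the name
free_run_after_move are made up for this example.  It computes how large
a contiguous free run becomes once an extent is moved away, and the
extent's share count is not an input at all:

```python
from typing import List, Tuple

# Hypothetical representation: (start block, length in blocks).
Extent = Tuple[int, int]


def free_run_after_move(free: List[Extent], victim: Extent) -> int:
    """Return the length of the contiguous free run that covers the
    victim's blocks once the victim has been moved elsewhere."""
    start, length = victim
    end = start + length
    run = length
    for fstart, flen in free:
        if fstart + flen == start:
            # Free extent immediately to the left of the victim.
            run += flen
        elif fstart == end:
            # Free extent immediately to the right of the victim.
            run += flen
    return run


free = [(0, 8), (16, 8), (40, 4)]
between = (8, 8)     # sits between two free extents
isolated = (30, 4)   # surrounded by allocated blocks

print(free_run_after_move(free, between))
print(free_run_after_move(free, isolated))
```

Moving the extent at block 8 merges its neighbours into one 24-block
run, while moving the isolated extent only frees its own 4 blocks, no
matter how many owners either extent has.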

> For zone gc we have to clear out the whole rtgroup and we don't have a
> /lot/ of control so maybe that matters less.  OTOH we know how much
> space we can get out of the zone, so

But yes, independent of the above question, free space for the zone
allocator is always very contiguous.

> <nod> I'd definitely give the in-kernel gc a means to stop the userspace
> gc if the zone runs out of space and it clearly isn't making progress.
> The tricky part is how do we give the userspace gc one of the "gc
> zones"?

Yes.  And how do we kill it when it doesn't act in time?  How do we
even ensure it acts in time?  How do we deal with userspace GC not
running or getting killed?

I have to say all my experiments with userspace upcalls for activity
triggered by kernel fast paths and memory reclaim have been
overwhelmingly negative.  I won't NAK any of it if someone wants to
experiment, but I don't plan to spend my time on it.

> Ah, right!  Would you mind putting that in a comment somewhere?

Will do.

> > 1 device XFS configurations we'll hit a metadata write error sooner
> > or later and shut the file system down, but with an external RT device
> > we don't and basically never shut down which is rather problematic.
> > So I'm tempted to add code to (at least optionally) shut down after
> > data write errors.
> 
> It would be kinda nice if we could report write(back) errors via
> fanotify, but that's buried so deep in the filesystems that seems
> tricky.

Reporting that would be more useful than just the shutdown.
How we get there, on the other hand, might be a bit hard.