On Wed, May 11, 2022 at 12:52:35PM +1000, Dave Chinner wrote:
> On Wed, May 11, 2022 at 12:16:57PM +1000, Chris Dunlop wrote:
>> Out of interest, would this work also reduce the time spent mounting in
>> my case? I.e. would a lot of the work from my recovery mount be punted
>> off to a background thread?
>
> No. Log recovery will punt the remaining inodegc work to background
> threads so it might get slightly parallelised, but we have a hard
> barrier between completing log recovery work and completing the
> mount process at the moment. Hence we wait for inodegc to complete
> before log recovery is marked as complete.

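Just to check I'm following: the current ordering is effectively the
below. This is only a toy userspace model of my reading of the above,
not the actual kernel code, and every name in it is made up:

/*
 * Toy model of the current behaviour: log recovery queues the
 * inodegc work, then hard-waits for the queue to drain before
 * recovery (and hence the mount) is allowed to complete.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  drained = PTHREAD_COND_INITIALIZER;
static int pending;                     /* queued inodegc items */

static void *inodegc_worker(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        while (pending > 0) {
                pending--;              /* inactivate one inode */
                if (pending == 0)
                        pthread_cond_signal(&drained);
        }
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t gc;

        pending = 1000;         /* work found on the unlinked lists */
        pthread_create(&gc, NULL, inodegc_worker, NULL);

        /* the hard barrier: recovery is not marked complete until
         * every queued item has been processed */
        pthread_mutex_lock(&lock);
        while (pending > 0)
                pthread_cond_wait(&drained, &lock);
        pthread_mutex_unlock(&lock);

        printf("log recovery complete, mount can finish\n");
        pthread_join(gc, NULL);
        return 0;
}

i.e. the mount can't return until every queued inactivation has been
processed, which would explain the time spent mounting in my case.
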
> In theory we could allow background inodegc to bleed into active
> duty once log recovery has processed all the unlinked lists, but
> that's a change of behaviour that would require a careful audit of
> the last part of the mount path to ensure that it is safe to be
> running concurrent background operations whilst completing mount
> state updates.

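And if I understand the relaxation you're describing, in the same toy
terms it's just dropping that drain wait once the unlinked lists have
been walked, something like this (again, entirely hypothetical):

/*
 * Variant with the proposed relaxation: once the unlinked lists
 * have been walked and the work queued, recovery is marked
 * complete and the tail of mount runs while gc keeps draining.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending;

static void *inodegc_worker(void *arg)
{
        int n;

        (void)arg;
        while ((n = atomic_load(&pending)) > 0)
                atomic_compare_exchange_weak(&pending, &n, n - 1);
        return NULL;
}

int main(void)
{
        pthread_t gc;

        atomic_store(&pending, 1000);   /* unlinked lists walked */
        pthread_create(&gc, NULL, inodegc_worker, NULL);

        /* no drain wait here: recovery is complete as soon as the
         * lists have been processed and the work queued ... */
        printf("log recovery complete\n");

        /* ... so these late-mount state updates now run against a
         * live gc queue, hence the audit you mention */
        printf("finishing mount state updates\n");

        pthread_join(gc, NULL);         /* e.g. at unmount */
        return 0;
}
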
> This hasn't been on my radar at all up until now, but I'll have a
> think about it next time I look at those bits of recovery. I suspect
> that probably won't be far away - I have a set of unlinked inode
> list optimisations that rework the log recovery infrastructure near
> the top of my current work queue, so I will be in that general
> vicinity over the next few weeks...

I'll keep an eye out.

> Regardless, the inodegc work is going to be slow on your system no
> matter what we do because of the underlying storage layout. What we
> need to do is try to remove all the places where stuff can get
> blocked on inodegc completion, but that is somewhat complex because
> we still need to be able to throttle queue depths in various
> situations.

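Understood. For the throttling part, I take it you mean the usual
bounded-queue shape, where the foreground queueing path blocks once
the queue gets too deep. A generic sketch of that pattern, not the
actual XFS mechanism, with made-up names and numbers:

#include <pthread.h>
#include <stdio.h>

#define GC_MAX_DEPTH 32

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static int depth;
static int done;

/* called from the foreground: this is where callers stall behind
 * slow storage once the queue backs up */
static void inodegc_queue(void)
{
        pthread_mutex_lock(&lock);
        while (depth >= GC_MAX_DEPTH)
                pthread_cond_wait(&not_full, &lock);
        depth++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
}

static void *inodegc_worker(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&lock);
        for (;;) {
                while (depth == 0 && !done)
                        pthread_cond_wait(&not_empty, &lock);
                if (depth == 0 && done)
                        break;
                depth--;                /* process one inode */
                pthread_cond_signal(&not_full);
        }
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t gc;
        int i;

        pthread_create(&gc, NULL, inodegc_worker, NULL);
        for (i = 0; i < 1000; i++)
                inodegc_queue();        /* throttled past depth 32 */

        pthread_mutex_lock(&lock);
        done = 1;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
        pthread_join(gc, NULL);
        printf("queued and processed 1000 items, max depth %d\n",
               GC_MAX_DEPTH);
        return 0;
}

With storage like mine the worker side drains slowly, so anything
calling the queueing path inherits that latency as soon as the cap is
hit, which I guess is why removing the places that block on completion
is the hard part.
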
That reminds me of something I've been wondering about for obvious
reasons: for workloads where metadata operations are dominant, do you
have any ponderings on allowing AGs to be put on fast storage whilst
the bulk data is on molasses storage?

Cheers,
Chris