Re: [PATCH 1/3] xfs: EFI recovery needs it's own transaction reservation

Dave Chinner <david@xxxxxxxxxxxxx> · Fri, 11 Sep 2020 07:29:27 +1000

On Thu, Sep 10, 2020 at 09:18:10AM -0400, Brian Foster wrote:
> On Thu, Sep 10, 2020 at 07:44:55AM +1000, Dave Chinner wrote:
> > On Wed, Sep 09, 2020 at 09:31:11AM -0400, Brian Foster wrote:
> > > It looks like extents are only freed when the last
> > > reference is dropped (otherwise we log a refcount intent), which makes
> > > me wonder whether we really need 7 log count units if recovery
> > > encounters an EFI.
> > 
> > I don't know if the numbers are correct, and it really is out of
> > scope for this patch to audit/fix that. I really think we need to
> > map this whole thing out in a diagram at this point because I now
> > suspect that the allocfree log count calculation is not correct,
> > either...
> 
> I agree up to the point where it relates to this specific EFI recovery
> issue. reflink is enabled by default, which means the default EFI
> recovery reservation is going to have 7 logcount units. Is that actually
> enough of a reduction to prevent this same recovery problem on newer
> fs'? I'm wondering if the tr_efi logcount should just be set to 1, for
> example, at least for the short term fix.

I spent yesterday mapping out the whole "free extent" chain to
understand exactly what was necessary, and in discussing this with
Darrick on #xfs I came to the conclusion that we cannot -ever- have
a logcount of more than 1 for an intent recovery of any sort.

That's because the transaction may have rolled enough times to
exhaust the initial grant of unit * logcount, so the only
reservation that the runtime kernel has when it crashs is a single
transaction unit reservation (which gets re-reserved on every roll).

Hence recovery cannot assume that intent that was being processed has more
than a single unit reservation of log space available to be used,
and hence all intent recovery reservations must start with a log
count of 1.

THere are other restrictions we need to deal with, too. multiple
intents may pin the tail of the log, so we can't just process a
single intent chain at a time as that will result in using all the
log space for a single intent chain and reservation deadlocking on
one of the intents we haven't yet started.

Hence the first thing we have to do in recovering intents is take an
active reservation for -all- intents in the log to match the
reservation state at runtime. Only then can we guarantee that
intents that pin the tail of the log will have the reservation space
needed to be able to unpin the log tail.

Further, because we now may have consumed all the available log
reservation space to guarantee forwards progress, the first commit
of the first intent may block trying to regrant space if another
intent also pins the tail of the log. This means we cannot replay
intents in a serial manner as we currently do. Each intent chain and it's
transaction context needs to run in it's own process context so it
can block waiting for other intents to make progress, just like
happens at runtime when the system crashed. IOWs, intent replay
needs to be done with a task context per intent chain (probably via
a workqueue).

AFAICT, this is the only way we can make intent recovery deadlock
free - we have to recreate the complete log reservation state in
memory before we start recovery of a single intent, we can only
assume an intent had a single unit reservation of log space assigned
to it, and intent chain execution needs to run concurrently so
commits can block waiting for log space to become available as other
intents commit and unpin the tail of the log...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx