raid5 Journal Recovery Bug

Logan Gunthorpe <logang@xxxxxxxxxxxx> · Fri, 19 Aug 2022 16:52:18 -0600

Hi Song,

I'm wondering if you can help shed some light on a bug I'm trying to
track down.

We're hitting the BUG_ON in handle_parity_checks5() that tests to ensure
R5_UPTODATE is set for a failed disk in a stripe[1].

We hit this in our test suite somewhat rarely when the journal is
enabled doing device removal and recovery tests. We've concocted a test
that can hit it in under ten minutes.

After some debugging I've found that the stripe that hits the BUG_ON is
hitting a conditional in handle_stripe_fill() for stripes that are in
the journal with a failed disk[2]. This check was added in 2017 by your
patch:

   07e83364845e ("md/r5cache: shift complex rmw from read path to write
path")

A stripe that hits the bug has one injournal dev, and one failed dev and
does not have STRIPE_R5C_CACHING set and therefore hits the conditional
and returns from handle_stripe_fill() without calling fetch_block() or
doing anything else to change the flow of execution. Normally,
fetch_block() would set STRIPE_COMPUTE_RUN to recompute the missing
disk, however that gets skipped for this case. After returning from
handle_stripe_fill(), handle_stripe() will then call
handle_parity_checks5() because STRIPE_COMPUTE_RUN was not set and this
will immediately hit the BUG_ON, because nothing has computed the disk
and set it UPTODATE yet.

I can't say I fully understand the patch that added this, so I don't
really understand why that conditional is there or what it's trying to
accomplish and thus I don't know what the correct solution might be.

Any thoughts?

Thanks,

Logan

[1]
https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/md/raid5.c#L4381
[2]
https://elixir.bootlin.com/linux/v6.0-rc1/source/drivers/md/raid5.c#L4050