Re: [PATCH 3/3 v2] md: unblock array if bad blocks have been acknowledged

Shaohua Li <shli@xxxxxxxxxx> · Thu, 20 Oct 2016 14:55:15 -0700

On Thu, Oct 20, 2016 at 02:09:15PM +0200, Tomasz Majchrzak wrote:
> On Wed, Oct 19, 2016 at 10:28:18PM -0700, Shaohua Li wrote:
> > On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote:
> > > Once external metadata handler acknowledges all bad blocks (by writing
> > > to rdev 'bad_blocks' sysfs file), it requests to unblock the array.
> > > Check if all bad blocks are actually acknowledged as there might be a
> > > race if new bad blocks are notified at the same time. If all bad blocks
> > > are acknowledged, just unblock the array and continue. If not, ignore
> > > the request to unblock (do not fail an array). External metadata handler
> > > is expected to either process remaining bad blocks and try to unblock
> > > again or remove bad block support for a disk (which will cause disk to
> > > fail as in no-support case).
> > > 
> > > Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@xxxxxxxxx>
> > > Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx>
> > > ---
> > >  drivers/md/md.c | 24 +++++++++++++++++-------
> > >  1 file changed, 17 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index cc05236..ce585b7 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
> > >  		set_bit(Blocked, &rdev->flags);
> > >  		err = 0;
> > >  	} else if (cmd_match(buf, "-blocked")) {
> > > -		if (!test_bit(Faulty, &rdev->flags) &&
> > > +		int unblock = 1;
> > > +		int acked = !rdev->badblocks.unacked_exist;
> > > +
> > > +		if ((test_bit(ExternalBbl, &rdev->flags) &&
> > > +		     rdev->badblocks.changed))
> > > +			acked = check_if_badblocks_acked(&rdev->badblocks);
> > > +
> > > +		if (test_bit(ExternalBbl, &rdev->flags) && !acked) {
> > > +			unblock = 0;
> > > +		} else if (!test_bit(Faulty, &rdev->flags) &&
> > 
> > I missed one thing in last review. writing to bad_blocks sysfs file already
> > clears the BlockedBadBlocks bit and wakeup the thread sleeping at blocked_wait,
> > so the array can continue. Why do we need to fix state_store here?
> 
> We cannot unblock the rdev until all bad blocks are acknowledged. The problem is
> mdadm cannot be sure it has stored all bad blocks in the first pass (read of
> unacknowledged_bad_blocks sysfs file). When bad block is encountered, rdev
> enters Blocked, Faulty state (unacked_exists is non-zero in state_show). Then
> mdadm reads the bad block, stores it in metadata and acknowledges it to the
> kernel. Initially I have tried to call ack_all_badblocks in bb_store or in
> state_store("-blocked") but there was a race. If other requests (the ones that
> had started before array got into blocked state) notified bad blocks after sysfs
> file was read by mdadm but before ack_all_badblocks call, ack_all_badblocks call
> was also acknowledging bad blocks not stored (and never to be as a result) in
> metadata. That's why I have introduced a new function
> check_if_all_badblocks_acked to close this race.
> 
> I'm not sure if native bad block support is not prone to the similar problem -
> bad block structure modified between metadata sync and ack_all_badblocks call.
> 

Yep, we always have the race here. Fortunately we don't need to wait all
badblocks acknowledged, the user of md_wait_for_blocked_rdev will retry. In the
retry, we will check if the badblock is acknowledged.

The native bad block support doesn't have the race. We copy badblocks to a new
page, clear badblocks->changed and then write the new page to disks.
ack_all_badblocks will check the ->changed, and do nothing if it's set. So if
something happens in between, ack_all_badblocks will do nothing.

While the external metadata array hasn't such mechanism to avoid race, I still
thought changing state_store isn't a good idea.

I just sent a patch to fix badblocks_set() and make it clear unacked_exists.
bb_store shouldn't call ack_all_badblocks in your case, but we don't need to.
As long as mdadm uses bb_store to acknowledge the ranges, the array can
continue. And if badblocks_set() can clear unacked_exists, the array will not
be reported as Blocked.

> As for BlockedBadBlocks flag cleared in bb_store, commit de393cdea66c ("md: make
> it easier to wait for bad blocks to be acknowledged") explains this flag is only
> an advisory. All awaiting requests are woken up and check if bad block that
> interests them is already acknowledged. If so, then can continue, and if not,
> they set the flag again to check in a while. It is just a useful optimization.

I think 'advisory' means the driver should retry

> Please note that rdev with unacknowledged bad block is reported as Blocked via
> sysfs state (non-zero unacked_exists), even though the corresponding rdev kernel
> flag is not set. It is the reason why mdadm calls state_store("-blocked").

If badblocks_set() can clear unacked_exists, this isn't required.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html