On Wed, Oct 19, 2016 at 10:28:18PM -0700, Shaohua Li wrote: > On Tue, Oct 18, 2016 at 04:10:24PM +0200, Tomasz Majchrzak wrote: > > Once external metadata handler acknowledges all bad blocks (by writing > > to rdev 'bad_blocks' sysfs file), it requests to unblock the array. > > Check if all bad blocks are actually acknowledged as there might be a > > race if new bad blocks are notified at the same time. If all bad blocks > > are acknowledged, just unblock the array and continue. If not, ignore > > the request to unblock (do not fail an array). External metadata handler > > is expected to either process remaining bad blocks and try to unblock > > again or remove bad block support for a disk (which will cause disk to > > fail as in no-support case). > > > > Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@xxxxxxxxx> > > Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@xxxxxxxxx> > > --- > > drivers/md/md.c | 24 +++++++++++++++++------- > > 1 file changed, 17 insertions(+), 7 deletions(-) > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index cc05236..ce585b7 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -2612,19 +2612,29 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len) > > set_bit(Blocked, &rdev->flags); > > err = 0; > > } else if (cmd_match(buf, "-blocked")) { > > - if (!test_bit(Faulty, &rdev->flags) && > > + int unblock = 1; > > + int acked = !rdev->badblocks.unacked_exist; > > + > > + if ((test_bit(ExternalBbl, &rdev->flags) && > > + rdev->badblocks.changed)) > > + acked = check_if_badblocks_acked(&rdev->badblocks); > > + > > + if (test_bit(ExternalBbl, &rdev->flags) && !acked) { > > + unblock = 0; > > + } else if (!test_bit(Faulty, &rdev->flags) && > > I missed one thing in last review. writing to bad_blocks sysfs file already > clears the BlockedBadBlocks bit and wakeup the thread sleeping at blocked_wait, > so the array can continue. Why do we need to fix state_store here? We cannot unblock the rdev until all bad blocks are acknowledged. The problem is mdadm cannot be sure it has stored all bad blocks in the first pass (read of unacknowledged_bad_blocks sysfs file). When bad block is encountered, rdev enters Blocked, Faulty state (unacked_exists is non-zero in state_show). Then mdadm reads the bad block, stores it in metadata and acknowledges it to the kernel. Initially I have tried to call ack_all_badblocks in bb_store or in state_store("-blocked") but there was a race. If other requests (the ones that had started before array got into blocked state) notified bad blocks after sysfs file was read by mdadm but before ack_all_badblocks call, ack_all_badblocks call was also acknowledging bad blocks not stored (and never to be as a result) in metadata. That's why I have introduced a new function check_if_all_badblocks_acked to close this race. I'm not sure if native bad block support is not prone to the similar problem - bad block structure modified between metadata sync and ack_all_badblocks call. As for BlockedBadBlocks flag cleared in bb_store, commit de393cdea66c ("md: make it easier to wait for bad blocks to be acknowledged") explains this flag is only an advisory. All awaiting requests are woken up and check if bad block that interests them is already acknowledged. If so, then can continue, and if not, they set the flag again to check in a while. It is just a useful optimization. Please note that rdev with unacknowledged bad block is reported as Blocked via sysfs state (non-zero unacked_exists), even though the corresponding rdev kernel flag is not set. It is the reason why mdadm calls state_store("-blocked"). I hope it clarifies all your doubts. Tomek -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html