Re: raid1 repair: sync_request() aborts if one of the drives has bad block recorded

Hi Neil,
thanks for your comments.

> I don't see that MD_RECOVERY_REQUESTED is really an issue, is it?
> The wrong thing would happen in any case where there is just one device with no
> bad block, and at least one device with a bad block.  In that case we want to
> skip forward by the number of bad blocks, not skip to the end of the array.

I think that MD_RECOVERY_REQUESTED does have some meaning here, because
there are only two places where we increase "write_targets".
One is here:
		} else if (!test_bit(In_sync, &rdev->flags)) {
			bio->bi_rw = WRITE;
			bio->bi_end_io = end_sync_write;
			write_targets ++;
and
	if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0)
		/* extra read targets are also write targets */
		write_targets += read_targets-1;

So if we see a device that is not In_sync, we will increase
write_targets and won't hit that issue. That's why I was thinking of
checking for MD_RECOVERY_REQUESTED.

>
> So this?
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 240ff31..c443f93 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2422,7 +2422,10 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
>                 /* There is nowhere to write, so all non-sync
>                  * drives must be failed - so we are finished
>                  */
> -               sector_t rv = max_sector - sector_nr;
> +               sector_t rv;
> +               if (min_bad > 0)
> +                       max_sector = sector_nr + min_bad;
> +               rv = max_sector - sector_nr;
>                 *skipped = 1;
>                 put_buf(r1_bio);
>                 return rv;
>

I tested it and it works. However, I have a question: shouldn't we
also try to clear bad blocks during "repair" in particular, or during
any kind of sync in general?

Because currently the following happens:
- In sync_request(), devices are selected as candidates for READs and
for WRITEs (using various criteria)
- Then we issue the READs and eventually end up in sync_request_write()
- There we schedule WRITEs (without consulting bad blocks or the
WriteErrorSeen bit)
- In end_sync_write(), we check whether we can clear some bad blocks,
and if so, we eventually end up in handle_sync_write_finished(), where
we clear them

So getting back to sync_request(), the issue is this: when we consider
a device for READs, we check for bad blocks. If we find one, we skip
the device and consider it for neither READs nor WRITEs. How about:
- We consider a device for READs
- If it has a bad block, we consider it for WRITEs instead (so we have
a 3rd place where we consider the device for WRITE)
- We might also consult the WriteErrorSeen bit, as we do in a normal
make_request(), but I am not sure, because right now the sync path
does not consult it

So this patch:
--- c:/work/code_backups/psp13/raid1.c  Sun Jul 15 16:39:07 2012
+++ ubuntu-precise/source/drivers/md/raid1.c    Mon Jul 16 11:35:40 2012
@@ -2385,20 +2385,25 @@
                                        if (wonly < 0)
                                                wonly = i;
                                } else {
                                        if (disk < 0)
                                                disk = i;
                                }
                                bio->bi_rw = READ;
                                bio->bi_end_io = end_sync_read;
                                read_targets++;
                        }
+                       else {
+                               bio->bi_rw = WRITE;
+                               bio->bi_end_io = end_sync_write;
+                               write_targets ++;
+                       }
                }
                if (bio->bi_end_io) {
                        atomic_inc(&rdev->nr_pending);
                        bio->bi_sector = sector_nr + rdev->data_offset;
                        bio->bi_bdev = rdev->bdev;
                        bio->bi_private = r1_bio;
                }
        }
        rcu_read_unlock();
        if (disk < 0)


I tested this patch (together with yours) and it cleared the bad
blocks, but I'm not sure what else I might be missing. My Linux kernel
programming skills are still ... questionable.

Thanks,
Alex.



>
> Thanks,
> NeilBrown
>