Re: md: raid5 resync corrects read errors on data block - is this correct?

Alexander Lyakas <alex.bolshoy@xxxxxxxxx> · Tue, 25 Sep 2012 09:50:59 +0200

Yes, Neil, please change it to "Suggested-by" then.

Thanks!
Alex.

On Tue, Sep 25, 2012 at 8:57 AM, NeilBrown <neilb@xxxxxxx> wrote:
> On Thu, 20 Sep 2012 11:26:50 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> wrote:
>
>> Hi Neil,
>> you are completely right. I confused mddev->recovery_cp with
>> sb->resync_offset; the latter may become 0 because of in-flight WRITEs
>> rather than because of a resync. Looking at the code again, I see that
>> recovery_cp only ever moves one way, from sb->resync_offset to MaxSector
>> (except when it is set explicitly via sysfs). recovery_cp is also not
>> relevant to "check" and "repair". So recovery_cp is pretty simple
>> after all.
>>
>> Below is the V2 patch. (I also have to credit it to somebody else, because
>> he was the one who said: just do rcw while you are resyncing.)
>>
>> Thanks,
>> Alex.
>>
>>
>> -----------------
>> From cc3e2bfcf2fd2c69180577949425d69de88706bb Mon Sep 17 00:00:00 2001
>> From: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
>> Date: Thu, 13 Sep 2012 18:55:00 +0300
>> Subject: [PATCH] When RAID5 is dirty, force reconstruct-write instead of
>>  read-modify-write.
>>
>> Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
>> Signed-off-by: Yair Hershko <yair@xxxxxxxxxxxxxxxxx>
>
> Signed-off-by has a very specific meaning - it isn't just a way of giving
> credit.
> If Yair wrote some of the code, this is fine.
> If not, then something like "Suggested-by:" might be more appropriate.
> Should I change it to that?
>
> applied, thanks.
>
> NeilBrown
>
>
>>
>> diff --git a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
>> index 5332202..9fdd5e3 100644
>> --- a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
>> +++ b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
>> @@ -2555,12 +2555,24 @@ static void handle_stripe_dirtying(struct r5conf *conf,
>>                                    int disks)
>>  {
>>         int rmw = 0, rcw = 0, i;
>> -       if (conf->max_degraded == 2) {
>> -               /* RAID6 requires 'rcw' in current implementation
>> -                * Calculate the real rcw later - for now fake it
>> +       sector_t recovery_cp = conf->mddev->recovery_cp;
>> +
>> +       /* RAID6 requires 'rcw' in current implementation.
>> +        * Otherwise, check whether resync is now happening or should start.
>> +        * If yes, then the array is dirty (after unclean shutdown or
>> +        * initial creation), so parity in some stripes might be inconsistent.
>> +        * In this case, we need to always do reconstruct-write, to ensure
>> +        * that in case of drive failure or read-error correction, we
>> +        * generate correct data from the parity.
>> +        */
>> +       if (conf->max_degraded == 2 ||
>> +           (recovery_cp < MaxSector && sh->sector >= recovery_cp)) {
>> +               /* Calculate the real rcw later - for now make it
>>                  * look like rcw is cheaper
>>                  */
>>                 rcw = 1; rmw = 2;
>> +               pr_debug("force RCW max_degraded=%u, recovery_cp=%llu sh->sector=%llu\n",
>> +                        conf->max_degraded, (unsigned long long)recovery_cp, (unsigned long long)sh->sector);
>>         } else for (i = disks; i--; ) {
>>                 /* would I have to read this buffer for read_modify_write */
>>                 struct r5dev *dev = &sh->dev[i];
>>
>> On Wed, Sep 19, 2012 at 8:59 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Mon, 17 Sep 2012 14:15:16 +0300 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> > wrote:
>> >
>> >> Hi Neil,
>> >> below is a bit less-ugly version of the patch.
>> >> Thanks,
>> >> Alex.
>> >>
>> >> From 05cf800d623bf558c99d542cf8bf083c85b7e5d5 Mon Sep 17 00:00:00 2001
>> >> From: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
>> >> Date: Thu, 13 Sep 2012 18:55:00 +0300
>> >> Subject: [PATCH] When RAID5 is dirty, force reconstruct-write instead of
>> >>  read-modify-write.
>> >>
>> >> Signed-off-by: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>
>> >> Signed-off-by: Yair Hershko <yair@xxxxxxxxxxxxxxxxx>
>> >>
>> >> diff --git a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
>> >> index 5332202..0702785 100644
>> >> --- a/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
>> >> +++ b/ubuntu_kmodules/Ubuntu-3.2.0-25.40/drivers/md/raid5.c
>> >> @@ -2555,12 +2555,36 @@ static void handle_stripe_dirtying(struct r5conf *conf,
>> >>                                    int disks)
>> >>  {
>> >>         int rmw = 0, rcw = 0, i;
>> >> -       if (conf->max_degraded == 2) {
>> >> -               /* RAID6 requires 'rcw' in current implementation
>> >> -                * Calculate the real rcw later - for now fake it
>> >> +       sector_t recovery_cp = conf->mddev->recovery_cp;
>> >> +       unsigned long recovery = conf->mddev->recovery;
>> >> +       int needed = test_bit(MD_RECOVERY_NEEDED, &recovery);
>> >> +       int resyncing = test_bit(MD_RECOVERY_SYNC, &recovery) &&
>> >> +                       !test_bit(MD_RECOVERY_REQUESTED, &recovery) &&
>> >> +                       !test_bit(MD_RECOVERY_CHECK, &recovery);
>> >> +       int transitional = test_bit(MD_RECOVERY_RUNNING, &recovery) &&
>> >> +                          !test_bit(MD_RECOVERY_SYNC, &recovery) &&
>> >> +                          !test_bit(MD_RECOVERY_RECOVER, &recovery) &&
>> >> +                          !test_bit(MD_RECOVERY_DONE, &recovery) &&
>> >> +                          !test_bit(MD_RECOVERY_RESHAPE, &recovery);
>> >
>> > Thanks Alex,
>> >  however I don't understand why you want to test all of these bits.
>> > Isn't it enough just to check ->recovery_cp ??
>> >
>> >> +
>> >> +       /* RAID6 requires 'rcw' in current implementation.
>> >> +        * Otherwise, attempt to check whether resync is now happening
>> >> +        * or should start.
>> >> +        * If yes, then the array is dirty (after unclean shutdown or
>> >> +        * initial creation), so parity in some stripes might be inconsistent.
>> >> +        * In this case, we need to always do reconstruct-write, to ensure
>> >> +        * that in case of drive failure or read-error correction, we
>> >> +        * generate correct data from the parity.
>> >> +        */
>> >> +       if (conf->max_degraded == 2 ||
>> >> +           (recovery_cp < MaxSector && sh->sector >= recovery_cp &&
>> >> +            (needed || resyncing || transitional))) {
>> >> +               /* Calculate the real rcw later - for now fake it
>> >>                  * look like rcw is cheaper
>> >
>> > Also, we should probably fix this comment.  s/fake/make/
>> >
>> > Thanks,
>> > NeilBrown
>> >
>> >
>> >
>> >>                  */
>> >>                 rcw = 1; rmw = 2;
>> >> +               pr_debug("force RCW max_degraded=%u, recovery_cp=%llu sh->sector=%llu recovery=0x%lx\n",
>> >> +                        conf->max_degraded, (unsigned long long)recovery_cp, (unsigned long long)sh->sector, recovery);
>> >>         } else for (i = disks; i--; ) {
>> >>                 /* would I have to read this buffer for read_modify_write */
>> >>                 struct r5dev *dev = &sh->dev[i];
>> >
>
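
For readers following the thread, the reasoning behind the V2 condition
can be sketched in user space. This is only an illustration and not
kernel code: MAX_SECTOR, must_force_rcw(), parity_rmw() and parity_rcw()
are hypothetical names made up for this sketch, and a stripe is reduced
to two one-byte data blocks plus one parity byte. It assumes plain XOR
parity, as RAID5 uses.

```c
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical stand-in for the kernel's MaxSector ("resync complete"). */
#define MAX_SECTOR (~(sector_t)0)

/*
 * Mirrors the shape of the V2 test: force reconstruct-write for RAID6
 * (max_degraded == 2), or when the stripe lies at or beyond the resync
 * checkpoint of an array that has not finished resyncing.
 */
static int must_force_rcw(int max_degraded, sector_t recovery_cp,
                          sector_t stripe_sector)
{
    return max_degraded == 2 ||
           (recovery_cp < MAX_SECTOR && stripe_sector >= recovery_cp);
}

/*
 * Read-modify-write: derive the new parity from the old parity, the old
 * data and the new data. Any error already present in old_parity is
 * carried over unchanged into the result.
 */
static uint8_t parity_rmw(uint8_t old_parity, uint8_t old_data,
                          uint8_t new_data)
{
    return old_parity ^ old_data ^ new_data;
}

/*
 * Reconstruct-write: recompute the parity from all data blocks of the
 * stripe, ignoring whatever parity was on disk before.
 */
static uint8_t parity_rcw(uint8_t d0, uint8_t d1)
{
    return d0 ^ d1;
}
```

Take d0 = 0x11 and d1 = 0x22 with a stale on-disk parity of 0x00 (the
consistent value would be 0x33), and overwrite d0 with 0x55. After
parity_rmw() the parity is still inconsistent, so rebuilding d1 as
new_d0 ^ parity yields 0x11 instead of 0x22. After parity_rcw() the same
rebuild yields 0x22. That is the case the patch comment describes: on a
dirty stripe, only reconstruct-write leaves parity that can be trusted
for drive-failure recovery or read-error correction.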