Re: [HELP] md/raid5: do we really avoid reading from known bad blocks ?

"jianchao.wang" <jianchao.w.wang@xxxxxxxxxx> · Thu, 11 Apr 2019 09:03:11 +0800

Hi Song

On 4/11/19 6:23 AM, Song Liu wrote:
> On Tue, Apr 9, 2019 at 6:39 PM jianchao.wang <jianchao.w.wang@xxxxxxxxxx> wrote:
>>
>> Hi Song
>>
>> Thanks so much for you kindly response.
>>
>> On 4/10/19 1:40 AM, Song Liu wrote:
>>> On Tue, Apr 9, 2019 at 2:55 AM jianchao.wang <jianchao.w.wang@xxxxxxxxxx> wrote:
>>>>
>>>> Hi all
>>>>
>>>> In analyse_stripe
>>>>
>>>>                 is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
>>>>                                              &first_bad, &bad_sectors);
>>>>
>>>>                 clear_bit(R5_Insync, &dev->flags);
>>>>                 if (!rdev)
>>>>                         /* Not in-sync */;
>>>>                 else if (is_bad) {
>>>>                         /* also not in-sync */
>>>>                         if (!test_bit(WriteErrorSeen, &rdev->flags) &&
>>>>                             test_bit(R5_UPTODATE, &dev->flags)) {
>>>>                                 /* treat as in-sync, but with a read error
>>>>                                  * which we can now try to correct
>>>>                                  */
>>>>                                 set_bit(R5_Insync, &dev->flags);
>>>>                                 set_bit(R5_ReadError, &dev->flags);
>>>>                         }
>>>>                 }
>>>> If there is any bad block in the range of a read, it will fall from cache-bypass
>>>> to stripe-cache. With the R5_UPTODATE checking here, we may try to read from this
>>>> badblock instead of attempting to calculate the data directly.
>>>>
>>>> Why do we need the R5_UPTODATE here ?
>>>>
>>>> From the comment of the commit,
>>>> "
>>>> We can only treat a known-bad-block like a read-error if
>>>> we have the data that belongs in that block
>>>> "
>>>
>>> When there is read error, we would like to write the data again, so the drive
>>> can remap the broken sector. This is only possible when we have the data
>>> up to date (calculated from other data block and parity). If the data is not up
>>> to date, we will need to initiate repair (read other blocks for the stripe, and
>>> recover the data). Once this block is up to date, we will go ahead fix the
>>> read error on the next iteration.
>>
>> If we don't set R5_ReadError here, how to trigger the repair to recalculate the data from
>> other data and parity in the stripe ?
>>
>> Try to read the badblock and set this R5_ReadError after read failure in raid5_end_read_request ?
>> It seems meaningless.
>>
>> Is this should be that trigger a repair directly if the block is known bad block ?
>>
>> In commit 31c176ecdf3563140e6395249eda51a18130d9f6, say we should avoid read from known badblocks.
>>
>> Thanks
>> Jianchao
> 
> In this path, R5_Insync is not set, so it will trigger the following logic:
> 
>                 if (!test_bit(R5_Insync, &dev->flags)) {
>                         if (s->failed < 2)
>                                 s->failed_num[s->failed] = i;
>                         s->failed++;
>                         if (rdev && !test_bit(Faulty, &rdev->flags))
>                                 do_recovery = 1;
>                         else if (!rdev) {
>                                 rdev = rcu_dereference(
>                                     conf->disks[i].replacement);
>                                 if (rdev && !test_bit(Faulty, &rdev->flags))
>                                         do_recovery = 1;
>                         }
>                 }
> 
> Then s->failed > 0 and/or do_recovery = 1 will trick next steps of the repair.
> 

Thanks so much for your detailed explanation.
I make sense of it.

Thanks
Jianchao