Re: [PATCH] lightnvm: pblk: extend line wp balance check

Javier González <javier@xxxxxxxxxxx> · Tue, 29 Jan 2019 19:23:36 +0100

> On 29 Jan 2019, at 16.49, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
> 
> On Tue, Jan 29, 2019 at 4:03 PM Javier González <javier@xxxxxxxxxxx> wrote:
>>> On 29 Jan 2019, at 13.49, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote:
>>> 
>>> On Tue, Jan 29, 2019 at 12:19 PM Javier González <javier@xxxxxxxxxxx> wrote:
>>>>> On 29 Jan 2019, at 09.47, hans@xxxxxxxxxxxxx wrote:
>>>>> 
>>>>> From: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>
>>>>> 
>>>>> pblk stripes writes of minimal write size across all non-offline chunks
>>>>> in a line, which means that the maximum write pointer delta should not
>>>>> exceed the minimal write size. Extend the line write pointer balance check
>>>>> to cover this case.
>>>>> 
>>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx>
>>>>> ---
>>>>> 
>>>>> This patch applies on top of Zhoujie's V3 of
>>>>> "lightnvm: pblk: ignore bad block wp for pblk_line_wp_is_unbalanced
>>>>> 
>>>>> drivers/lightnvm/pblk-recovery.c | 60 ++++++++++++++++++++------------
>>>>> 1 file changed, 37 insertions(+), 23 deletions(-)
>>>>> 
>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c
>>>>> index 02d466e6925e..d86f580036d3 100644
>>>>> --- a/drivers/lightnvm/pblk-recovery.c
>>>>> +++ b/drivers/lightnvm/pblk-recovery.c
>>>>> @@ -302,41 +302,55 @@ static int pblk_pad_distance(struct pblk *pblk, struct pblk_line *line)
>>>>>     return (distance > line->left_msecs) ? line->left_msecs : distance;
>>>>> }
>>>>> 
>>>>> -static int pblk_line_wp_is_unbalanced(struct pblk *pblk,
>>>>> -                                   struct pblk_line *line)
>>>>> +/* Return a chunk belonging to a line by stripe(write order) index */
>>>>> +static struct nvm_chk_meta *pblk_get_stripe_chunk(struct pblk *pblk,
>>>>> +                                               struct pblk_line *line,
>>>>> +                                               int index)
>>>>> {
>>>>>     struct nvm_tgt_dev *dev = pblk->dev;
>>>>>     struct nvm_geo *geo = &dev->geo;
>>>>> -     struct pblk_line_meta *lm = &pblk->lm;
>>>>>     struct pblk_lun *rlun;
>>>>> -     struct nvm_chk_meta *chunk;
>>>>>     struct ppa_addr ppa;
>>>>> -     u64 line_wp;
>>>>> -     int pos, i, bit;
>>>>> +     int pos;
>>>>> 
>>>>> -     bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line);
>>>>> -     if (bit >= lm->blk_per_line)
>>>>> -             return 0;
>>>>> -     rlun = &pblk->luns[bit];
>>>>> +     rlun = &pblk->luns[index];
>>>>>     ppa = rlun->bppa;
>>>>>     pos = pblk_ppa_to_pos(geo, ppa);
>>>>> -     chunk = &line->chks[pos];
>>>>> 
>>>>> -     line_wp = chunk->wp;
>>>>> +     return &line->chks[pos];
>>>>> +}
>>>>> 
>>>>> -     for (i = bit + 1; i < lm->blk_per_line; i++) {
>>>>> -             rlun = &pblk->luns[i];
>>>>> -             ppa = rlun->bppa;
>>>>> -             pos = pblk_ppa_to_pos(geo, ppa);
>>>>> -             chunk = &line->chks[pos];
>>>>> +static int pblk_line_wps_are_unbalanced(struct pblk *pblk,
>>>>> +                                   struct pblk_line *line)
>>>>> +{
>>>>> +     struct pblk_line_meta *lm = &pblk->lm;
>>>>> +     int blk_in_line = lm->blk_per_line;
>>>>> +     struct nvm_chk_meta *chunk;
>>>>> +     u64 max_wp, min_wp;
>>>>> +     int i;
>>>>> 
>>>>> -             if (chunk->state & NVM_CHK_ST_OFFLINE)
>>>>> -                     continue;
>>>>> +     i = find_first_zero_bit(line->blk_bitmap, blk_in_line);
>>>>> +
>>>>> +     /* If there is one or zero good chunks in the line,
>>>>> +      * the write pointers can't be unbalanced.
>>>>> +      */
>>>>> +     if (i >= (blk_in_line - 1))
>>>>> +             return 0;
>>>>> 
>>>>> -             if (chunk->wp > line_wp)
>>>>> +     chunk = pblk_get_stripe_chunk(pblk, line, i);
>>>>> +     max_wp = chunk->wp;
>>>>> +     if (max_wp > pblk->max_write_pgs)
>>>>> +             min_wp = max_wp - pblk->max_write_pgs;
>>>>> +     else
>>>>> +             min_wp = 0;
>>>>> +
>>>>> +     i = find_next_zero_bit(line->blk_bitmap, blk_in_line, i + 1);
>>>>> +     while (i < blk_in_line) {
>>>>> +             chunk = pblk_get_stripe_chunk(pblk, line, i);
>>>>> +             if (chunk->wp > max_wp || chunk->wp < min_wp)
>>>>>                     return 1;
>>>>> -             else if (chunk->wp < line_wp)
>>>>> -                     line_wp = chunk->wp;
>>>>> +
>>>>> +             i = find_next_zero_bit(line->blk_bitmap, blk_in_line, i + 1);
>>>>>     }
>>>>> 
>>>>>     return 0;
>>>>> @@ -362,7 +376,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line,
>>>>>     int ret;
>>>>>     u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec;
>>>>> 
>>>>> -     if (pblk_line_wp_is_unbalanced(pblk, line))
>>>>> +     if (pblk_line_wps_are_unbalanced(pblk, line))
>>>>>             pblk_warn(pblk, "recovering unbalanced line (%d)\n", line->id);
>>>>> 
>>>>>     ppa_list = p.ppa_list;
>>>>> --
>>>>> 2.17.1
>>>> 
>>>> If I am understanding correctly, you want to protect against the case
>>>> where a pfail has broken the ws_min partition of a chunk, right? I say
>>>> this because there is a guarantee that the minimal write size and pblk's
>>>> write size align with ws_min and ws_opt. Even if we have grown bad
>>>> blocks on pfail for the current line (which is a bigger problem because
>>>> we have potentially lost data), this guarantee would remain.
>>>> 
>>>> If this is the case, my first reaction would be to say that the
>>>> controller is responsible for guaranteeing atomicity for both scalar and
>>>> vector I/Os. If this is not guaranteed, we have bigger problems than
>>>> this (e.g., for the write error recovery path).
>>>> 
>>>> Are you thinking of a different case?
>>> 
>>> The write pointer check triggers a warning if something unexpected has
>>> happened to the chunks.
>>> i.e. if something else than pblk messed with the disk, or if the user
>>> tries to recover a pblk instance with an invalid lun configuration.
>> 
>> But this will only solve a very specific corner case of this, right?
>> This is, when you are writing at WP on a middle LUN exactly < ws_min.
>>> For example, If the user attempts to start pblk with a different LUN
>> configuration, the alignment is the same an pblk will actually fail
>> because it cannot find any emeta.
> 
> Well, we will try to recover a line if the emeta data cannot be read
> or the data fails the crc check, and the recovery path assumes that
> the write pointers pointers have been incremented in the pblk stripe
> order. We'll get a warning now for all cases if this assumption is not
> valid.
> 
>> For completion, the original wp_unbalanced patch attempted to protect
>> against the case where several outstanding I/Os to different PUs are
>> inflight and then you have a pfail. In this case, a write to a PU that
>> is not "next" in the line bitmap completes and we have a whole, meaning
>> that if we recover this case, we risk to overwrite valid data, as we
>> break the "sequentiality" the line, which allows for recovering by
>> replaying the P2L stored on the OOB.
> 
> For outstanding writes, we can leave no guarantees anyway. If a
> presync was attached to a bio, that bio will complete when the write
> completes.
> 
>>> This patch adds a warning if a chunk wp is too small(i.e. if a chunk
>>> was unexpectedly reset)
>> 
>> I am not arguing against the implementation, I am just trying to
>> understand what you are trying to fix. If it is the case I described
>> originally, then I do not think it is possible on the Open-Channel
>> architecture. If you want to add protection against corruptions, then
>> this needs more work as many corruption cases are missing - you would
>> need something like a OOB watermark to protect open lines and something
>> like a OOB lba CRC check in emeta to validate that no data has been
>> altered.
>> 
>> If you ask me, I do not think the latter belongs to pblk, and if it did,
>> I would suggest a whole new (optional) feature that adds this short of
>> integrity protection, ideally, reusing NVMe PI. In the original pblk, we
>> have followed the same assumption as block devices do; you can go an
>> write on the side to a block device that has a FS on top; the block
>> device will not complain at all - if the FS detects this then they will
>> try to fix it, otherwise you lost data.
>> 
>> In this context, we have discussed about a pblk tool a la fsck, which can
>> cover all this cases instead of adding more complexity to the recovery
>> path in the kernel. Here, you patch makes sense we fail if something
>> suspicious has occurred and move the burden to the pblk tool for it to
>> do the repair. But again, if the goal s adding integrity protection, it
>> needs to cover the rest of the cases.
>> 
>> Hope this makes sense.
> 
> It does! I'm not trying to do anything more than warn if the write
> pointers are not what we expect them to be.
> 
> You're right, we need some sort of superblock and some recovery tool
> to make pblk more robust against corruptions.
> Now, we get a warning at least :)

Sounds good! Now I understand.

It is good for me to apply the patch then.

Reviewed-by: Javier González <javier@xxxxxxxxxxx>

Attachment:
signature.asc

Description: Message signed with OpenPGP