> On 29 Jan 2019, at 16.49, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote: > > On Tue, Jan 29, 2019 at 4:03 PM Javier González <javier@xxxxxxxxxxx> wrote: >>> On 29 Jan 2019, at 13.49, Hans Holmberg <hans@xxxxxxxxxxxxx> wrote: >>> >>> On Tue, Jan 29, 2019 at 12:19 PM Javier González <javier@xxxxxxxxxxx> wrote: >>>>> On 29 Jan 2019, at 09.47, hans@xxxxxxxxxxxxx wrote: >>>>> >>>>> From: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx> >>>>> >>>>> pblk stripes writes of minimal write size across all non-offline chunks >>>>> in a line, which means that the maximum write pointer delta should not >>>>> exceed the minimal write size. Extend the line write pointer balance check >>>>> to cover this case. >>>>> >>>>> Signed-off-by: Hans Holmberg <hans.holmberg@xxxxxxxxxxxx> >>>>> --- >>>>> >>>>> This patch applies on top of Zhoujie's V3 of >>>>> "lightnvm: pblk: ignore bad block wp for pblk_line_wp_is_unbalanced >>>>> >>>>> drivers/lightnvm/pblk-recovery.c | 60 ++++++++++++++++++++------------ >>>>> 1 file changed, 37 insertions(+), 23 deletions(-) >>>>> >>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c >>>>> index 02d466e6925e..d86f580036d3 100644 >>>>> --- a/drivers/lightnvm/pblk-recovery.c >>>>> +++ b/drivers/lightnvm/pblk-recovery.c >>>>> @@ -302,41 +302,55 @@ static int pblk_pad_distance(struct pblk *pblk, struct pblk_line *line) >>>>> return (distance > line->left_msecs) ? line->left_msecs : distance; >>>>> } >>>>> >>>>> -static int pblk_line_wp_is_unbalanced(struct pblk *pblk, >>>>> - struct pblk_line *line) >>>>> +/* Return a chunk belonging to a line by stripe(write order) index */ >>>>> +static struct nvm_chk_meta *pblk_get_stripe_chunk(struct pblk *pblk, >>>>> + struct pblk_line *line, >>>>> + int index) >>>>> { >>>>> struct nvm_tgt_dev *dev = pblk->dev; >>>>> struct nvm_geo *geo = &dev->geo; >>>>> - struct pblk_line_meta *lm = &pblk->lm; >>>>> struct pblk_lun *rlun; >>>>> - struct nvm_chk_meta *chunk; >>>>> struct ppa_addr ppa; >>>>> - u64 line_wp; >>>>> - int pos, i, bit; >>>>> + int pos; >>>>> >>>>> - bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line); >>>>> - if (bit >= lm->blk_per_line) >>>>> - return 0; >>>>> - rlun = &pblk->luns[bit]; >>>>> + rlun = &pblk->luns[index]; >>>>> ppa = rlun->bppa; >>>>> pos = pblk_ppa_to_pos(geo, ppa); >>>>> - chunk = &line->chks[pos]; >>>>> >>>>> - line_wp = chunk->wp; >>>>> + return &line->chks[pos]; >>>>> +} >>>>> >>>>> - for (i = bit + 1; i < lm->blk_per_line; i++) { >>>>> - rlun = &pblk->luns[i]; >>>>> - ppa = rlun->bppa; >>>>> - pos = pblk_ppa_to_pos(geo, ppa); >>>>> - chunk = &line->chks[pos]; >>>>> +static int pblk_line_wps_are_unbalanced(struct pblk *pblk, >>>>> + struct pblk_line *line) >>>>> +{ >>>>> + struct pblk_line_meta *lm = &pblk->lm; >>>>> + int blk_in_line = lm->blk_per_line; >>>>> + struct nvm_chk_meta *chunk; >>>>> + u64 max_wp, min_wp; >>>>> + int i; >>>>> >>>>> - if (chunk->state & NVM_CHK_ST_OFFLINE) >>>>> - continue; >>>>> + i = find_first_zero_bit(line->blk_bitmap, blk_in_line); >>>>> + >>>>> + /* If there is one or zero good chunks in the line, >>>>> + * the write pointers can't be unbalanced. >>>>> + */ >>>>> + if (i >= (blk_in_line - 1)) >>>>> + return 0; >>>>> >>>>> - if (chunk->wp > line_wp) >>>>> + chunk = pblk_get_stripe_chunk(pblk, line, i); >>>>> + max_wp = chunk->wp; >>>>> + if (max_wp > pblk->max_write_pgs) >>>>> + min_wp = max_wp - pblk->max_write_pgs; >>>>> + else >>>>> + min_wp = 0; >>>>> + >>>>> + i = find_next_zero_bit(line->blk_bitmap, blk_in_line, i + 1); >>>>> + while (i < blk_in_line) { >>>>> + chunk = pblk_get_stripe_chunk(pblk, line, i); >>>>> + if (chunk->wp > max_wp || chunk->wp < min_wp) >>>>> return 1; >>>>> - else if (chunk->wp < line_wp) >>>>> - line_wp = chunk->wp; >>>>> + >>>>> + i = find_next_zero_bit(line->blk_bitmap, blk_in_line, i + 1); >>>>> } >>>>> >>>>> return 0; >>>>> @@ -362,7 +376,7 @@ static int pblk_recov_scan_oob(struct pblk *pblk, struct pblk_line *line, >>>>> int ret; >>>>> u64 left_ppas = pblk_sec_in_open_line(pblk, line) - lm->smeta_sec; >>>>> >>>>> - if (pblk_line_wp_is_unbalanced(pblk, line)) >>>>> + if (pblk_line_wps_are_unbalanced(pblk, line)) >>>>> pblk_warn(pblk, "recovering unbalanced line (%d)\n", line->id); >>>>> >>>>> ppa_list = p.ppa_list; >>>>> -- >>>>> 2.17.1 >>>> >>>> If I am understanding correctly, you want to protect against the case >>>> where a pfail has broken the ws_min partition of a chunk, right? I say >>>> this because there is a guarantee that the minimal write size and pblk's >>>> write size align with ws_min and ws_opt. Even if we have grown bad >>>> blocks on pfail for the current line (which is a bigger problem because >>>> we have potentially lost data), this guarantee would remain. >>>> >>>> If this is the case, my first reaction would be to say that the >>>> controller is responsible for guaranteeing atomicity for both scalar and >>>> vector I/Os. If this is not guaranteed, we have bigger problems than >>>> this (e.g., for the write error recovery path). >>>> >>>> Are you thinking of a different case? >>> >>> The write pointer check triggers a warning if something unexpected has >>> happened to the chunks. >>> i.e. if something else than pblk messed with the disk, or if the user >>> tries to recover a pblk instance with an invalid lun configuration. >> >> But this will only solve a very specific corner case of this, right? >> This is, when you are writing at WP on a middle LUN exactly < ws_min. >>> For example, If the user attempts to start pblk with a different LUN >> configuration, the alignment is the same an pblk will actually fail >> because it cannot find any emeta. > > Well, we will try to recover a line if the emeta data cannot be read > or the data fails the crc check, and the recovery path assumes that > the write pointers pointers have been incremented in the pblk stripe > order. We'll get a warning now for all cases if this assumption is not > valid. > >> For completion, the original wp_unbalanced patch attempted to protect >> against the case where several outstanding I/Os to different PUs are >> inflight and then you have a pfail. In this case, a write to a PU that >> is not "next" in the line bitmap completes and we have a whole, meaning >> that if we recover this case, we risk to overwrite valid data, as we >> break the "sequentiality" the line, which allows for recovering by >> replaying the P2L stored on the OOB. > > For outstanding writes, we can leave no guarantees anyway. If a > presync was attached to a bio, that bio will complete when the write > completes. > >>> This patch adds a warning if a chunk wp is too small(i.e. if a chunk >>> was unexpectedly reset) >> >> I am not arguing against the implementation, I am just trying to >> understand what you are trying to fix. If it is the case I described >> originally, then I do not think it is possible on the Open-Channel >> architecture. If you want to add protection against corruptions, then >> this needs more work as many corruption cases are missing - you would >> need something like a OOB watermark to protect open lines and something >> like a OOB lba CRC check in emeta to validate that no data has been >> altered. >> >> If you ask me, I do not think the latter belongs to pblk, and if it did, >> I would suggest a whole new (optional) feature that adds this short of >> integrity protection, ideally, reusing NVMe PI. In the original pblk, we >> have followed the same assumption as block devices do; you can go an >> write on the side to a block device that has a FS on top; the block >> device will not complain at all - if the FS detects this then they will >> try to fix it, otherwise you lost data. >> >> In this context, we have discussed about a pblk tool a la fsck, which can >> cover all this cases instead of adding more complexity to the recovery >> path in the kernel. Here, you patch makes sense we fail if something >> suspicious has occurred and move the burden to the pblk tool for it to >> do the repair. But again, if the goal s adding integrity protection, it >> needs to cover the rest of the cases. >> >> Hope this makes sense. > > It does! I'm not trying to do anything more than warn if the write > pointers are not what we expect them to be. > > You're right, we need some sort of superblock and some recovery tool > to make pblk more robust against corruptions. > Now, we get a warning at least :) Sounds good! Now I understand. It is good for me to apply the patch then. Reviewed-by: Javier González <javier@xxxxxxxxxxx>
Attachment:
signature.asc
Description: Message signed with OpenPGP