Hi again, while refactoring the code I realized that there is another case we are not considering: if the max and min write pointers across all non-bad chunks are off by more than the write unit, we will not be able to continue writing to the line. I'm adding a check of this in my patch. All the best, Hans On Fri, Jan 25, 2019 at 10:30 PM Javier González <javier@xxxxxxxxxxx> wrote: > > > > On 25 Jan 2019, at 21.20, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote: > > > >> On Fri, Jan 25, 2019 at 7:33 PM Javier González <javier@xxxxxxxxxxx> wrote: > >> > >> > >> > >>>> On 25 Jan 2019, at 17.46, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote: > >>>> > >>>>> On Fri, Jan 25, 2019 at 3:35 PM Matias Bjørling <mb@xxxxxxxxxxx> wrote: > >>>>> > >>>>>> On 1/25/19 3:21 PM, Hans Holmberg wrote: > >>>>>> On Fri, Jan 25, 2019 at 2:33 PM Javier González <javier@xxxxxxxxxxx> wrote: > >>>>>> > >>>>>> > >>>>>>> On 25 Jan 2019, at 13.59, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote: > >>>>>>> > >>>>>>> On Fri, Jan 25, 2019 at 10:41 AM Javier González <javier@xxxxxxxxxxx> wrote: > >>>>>>>>> On 25 Jan 2019, at 09.47, Hans Holmberg <hans.ml.holmberg@xxxxxxxxxxxxx> wrote: > >>>>>>>>> > >>>>>>>>> On Thu, Jan 24, 2019 at 8:51 PM Zhoujie Wu <zjwu@xxxxxxxxxxx> wrote: > >>>>>>>>>> The write pointer of the bad block could be 0 or undefined, ignore > >>>>>>>>>> the checking of the bad block wp for pblk_line_wp_is_unbalanced to > >>>>>>>>>> avoid fake warning. > >>>>>>>>> > >>>>>>>>> fake -> spurious? > >>>>>>>>> > >>>>>>>>>> Signed-off-by: Zhoujie Wu <zjwu@xxxxxxxxxxx> > >>>>>>>>>> --- > >>>>>>>>>> v3: return in case bit >= lm->blk_per_line. > >>>>>>>>>> v2: changed according to Javier's comments. > >>>>>>>>>> > >>>>>>>>>> drivers/lightnvm/pblk-recovery.c | 12 +++++++++--- > >>>>>>>>>> 1 file changed, 9 insertions(+), 3 deletions(-) > >>>>>>>>>> > >>>>>>>>>> diff --git a/drivers/lightnvm/pblk-recovery.c b/drivers/lightnvm/pblk-recovery.c > >>>>>>>>>> index 6761d2a..02d466e 100644 > >>>>>>>>>> --- a/drivers/lightnvm/pblk-recovery.c > >>>>>>>>>> +++ b/drivers/lightnvm/pblk-recovery.c > >>>>>>>>>> @@ -312,21 +312,27 @@ static int pblk_line_wp_is_unbalanced(struct pblk *pblk, > >>>>>>>>>> struct nvm_chk_meta *chunk; > >>>>>>>>>> struct ppa_addr ppa; > >>>>>>>>>> u64 line_wp; > >>>>>>>>>> - int pos, i; > >>>>>>>>>> + int pos, i, bit; > >>>>>>>>> > >>>>>>>>> We don't need both bit and i, one of them is enough. > >>>>>>>>> > >>>>>>>>>> - rlun = &pblk->luns[0]; > >>>>>>>>>> + bit = find_first_zero_bit(line->blk_bitmap, lm->blk_per_line); > >>>>>>>>>> + if (bit >= lm->blk_per_line) > >>>>>>>>>> + return 0; > >>>>>>>>> > >>>>>>>>> If there is only one non-offline chunk in the line, the wp can't be unbalanced, > >>>>>>>>> so it should be safe to return 0 here if bit >= lm->blk_per_line - 1 > >>>>>>>>> > >>>>>>>>> If you change this please document why using a comment, as it might > >>>>>>>>> not be obvious > >>>>>>>>> > >>>>>>>>>> + rlun = &pblk->luns[bit]; > >>>>>>>>>> ppa = rlun->bppa; > >>>>>>>>>> pos = pblk_ppa_to_pos(geo, ppa); > >>>>>>>>>> chunk = &line->chks[pos]; > >>>>>>>>> > >>>>>>>>>> line_wp = chunk->wp; > >>>>>>>>>> > >>>>>>>>>> - for (i = 1; i < lm->blk_per_line; i++) { > >>>>>>>>>> + for (i = bit + 1; i < lm->blk_per_line; i++) { > >>>>>>>>> > >>>>>>>>>> rlun = &pblk->luns[i]; > >>>>>>>>>> ppa = rlun->bppa; > >>>>>>>>>> pos = pblk_ppa_to_pos(geo, ppa); > >>>>>>>>>> chunk = &line->chks[pos]; > >>>>>>>>> > >>>>>>>>> This code is a copy of the code above, it'd be nice to refactor it > >>>>>>>>> into a helper function or just do the chunk lookups in one place. > >>>>>>>>> > >>>>>>>>>> + if (chunk->state & NVM_CHK_ST_OFFLINE) > >>>>>>>>>> + continue; > >>>>>>>>>> + > >>>>>>>>> > >>>>>>>>> Since we rely on the block bitmap anyway, we might as well just > >>>>>>>>> iterate over the zeroes in the block bitmap using find_next_zero_bit > >>>>>>>>> instead. > >>>>>>>>> We do this in lots of other places, see: git grep -n > >>>>>>>>> find_next_zero_bit -- drivers/lightnvm > >>>>>>>> > >>>>>>>> Hans, I proposed him to use the chunk->state instead. I think it is way > >>>>>>>> more robust. We introduced the block bitmap for OCSSD 1.2, because there > >>>>>>>> was no state. Now that we have state, it is better to use it instead. In > >>>>>>>> fact, we should remove the bock bitmap as we have to check for the state > >>>>>>>> either way - note that this aligns also very well with you patches > >>>>>>>> removing the other bitmaps. > >>>>>>> > >>>>>>> These are just nitpicks. > >>>>>>> > >>>>>>> Relying on two data structures(chunks, block bitmap) to be in sync in > >>>>>>> this function in stead of one does not make it more robust imho. > >>>>>>> Either or (checking chunks or the block bitmap) is fine by me. > >>>>>>> Searching the bitmap is more efficient, so that is what I proposed. > >>>>>> > >>>>>> chunk log page is the ground truth, so it is more robust. > >>>>>> > >>>>>> Also, pblk has a long way to start seeing bitmap search vs. integer > >>>>>> comparisons in profiling. > >>>>> > >>>>> Hehe, yeah, but it does not hurt to use the better alternative when available. > >>>>> > >>>>>>> > >>>>>>> If we want to remove the block bitmap, we can do that as a separate patch(set) > >>>>>>> I do agree, having two copies of the chunk state is something worth > >>>>>>> getting rid of :) > >>>>>>> > >>>>>> > >>>>>> A good start is not adding code using what we want to remove. > >>>>> > >>>>> Well, I think it's very confusing to use both copies in the same function. > >>>>> > >>>>> Now we're nitpicking nitpicks :) > >>>>> > >>>> > >>>> I look forward to a patch. Will one of you volunteer a patch? > >>> > >>> Sure! It'd be easier to Illustrate what I mean with a patch. > >> > >> Cool. Go ahead - you will see how much cleaner it is using the chunk info. > >> > >> Matias : In any case Zhoujie’s fix is orthogonal to this discussion and I think you should pick it up - in one form of bb iteration or another - as it fixes a real issue. > > > > Second that! I feel bad about the fix not making it in just because of > > this squabbling. > > No squabbling :) We both care about pblk and want to get the code right. > > If you give that first pass to that patch, it would be great. We can take it from there. > > Nos enjoy the weekend! > > Javier.