Hi Brian, On 2015-08-01 01:09, Brian Norris wrote: <snip> >> +static inline int vf610_nfc_correct_data(struct mtd_info *mtd, uint8_t *dat) >> +{ >> + struct vf610_nfc *nfc = mtd_to_nfc(mtd); >> + u8 ecc_status; >> + u8 ecc_count; >> + int flip; >> + >> + ecc_status = __raw_readb(nfc->regs + ECC_SRAM_ADDR * 8 + ECC_OFFSET); >> + ecc_count = ecc_status & ECC_ERR_COUNT; >> + if (!(ecc_status & ECC_STATUS_MASK)) >> + return ecc_count; >> + >> + /* >> + * On an erased page, bit count should be zero or at least >> + * less then half of the ECC strength >> + */ >> + flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count); > > Sorry I didn't notice this earlier, but it appears you are falling into > the same trap that almost everyone else is -- it is not sufficient to > check just the page area; you also need to check the OOB. Suppose that > a MTD user wrote mostly-0xff data to the page, then the page accumulates > bitflips in the spare area and a few in the page area, such that > eventually HW ECC can't correct them. If there are few enough zero bits > in the data area, you will mistakenly think that this is a blank page > below, and memset() it to 0xff. That would be disastrous! > > Fortunately, your code is otherwise quite well structured and looks > good. A tip below. > >> + >> + if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2)) >> + return -1; >> + >> + /* Erased page. */ >> + memset(dat, 0xff, nfc->chip.ecc.size); >> + return 0; >> +} >> + >> +static int vf610_nfc_read_page(struct mtd_info *mtd, struct nand_chip *chip, >> + uint8_t *buf, int oob_required, int page) >> +{ >> + int eccsize = chip->ecc.size; >> + int stat; >> + >> + vf610_nfc_read_buf(mtd, buf, eccsize); >> + >> + if (oob_required) >> + vf610_nfc_read_buf(mtd, chip->oob_poi, mtd->oobsize); > > To fix the bitflips issue above, you'll just want to unconditionally > read the OOB (it's fine to ignore 'oob_required') and... > >> + >> + stat = vf610_nfc_correct_data(mtd, buf); > > ...pass in chip->oob_poi as a third argument. > Hm, this probably will have an effect on performance, since we usually omit the OOB if not requested. I could fetch the OOB from the NAND controllers SRAM only if necessary (if HW ECC status is not ok...). Does this sound reasonable? >> + >> + if (stat < 0) >> + mtd->ecc_stats.failed++; >> + else >> + mtd->ecc_stats.corrected += stat; > > You've got another problem here: ecc.read_page() should be returning > 'max_bitflips' here. So, since you have a single ECC region, this block > should probably be: > > if (stat < 0) { > mtd->ecc_stats.failed++; > return 0; > } else { > mtd->ecc_stats.corrected += stat; > return stat; > } > Ok, will change that. -- Stefan -- To unsubscribe from this list: send the line "unsubscribe devicetree" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html