Re: [RESEND PATCH V2 2/2] mtd: core: NAND filling block

Boris Brezillon <bbrezillon@xxxxxxxxxx> · Fri, 18 Jan 2019 23:39:43 +0100

Subject prefix should be "mtd: rawnand: " not "mtd: core: "

On Fri, 18 Jan 2019 22:12:04 +0000
"Bean Huo (beanhuo)" <beanhuo@xxxxxxxxxx> wrote:

> On some legacy planar 2D Micron NAND devices when a
> block erase command is issued, occasionally even
> though a block erase operation successfully completes
> and returns a pass status, the flash block may not be
> completely erased. Subsequent operations to this block
> on very rare cases can result in subtle failures or
> corruption. These extremely rare cases should nevertheless
> be considered.
> 
> These rare occurrences have been observed on partially
> written blocks. Partially written blocks are not uncommon
> with UBI/UBIFS.
> 
> To avoid this rare occurrence, we make sure that at least
> 15 pages have been programmed to a block before it is erased.
> In case we find that less than 15 pages have been programmed,
> additional pages are programmed in the block. The observation
> is that additional pages rarely need to be written and most of
> the time UBI/UBIFS erases blocks that contain more programmed
> pages.
> 
> Signed-off-by: Bean Huo <beanhuo@xxxxxxxxxx>
> Reviewed-by: ZOLTAN SZUBBOCSEV <zszubbocsev@xxxxxxxxxx>
> ---
>  drivers/mtd/nand/raw/nand_micron.c | 119 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 119 insertions(+)
> 
> diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
> index b85e1c1..f52e072 100644
> --- a/drivers/mtd/nand/raw/nand_micron.c
> +++ b/drivers/mtd/nand/raw/nand_micron.c
> @@ -541,8 +541,127 @@ static void micron_fixup_onfi_param_page(struct nand_chip *chip,
>  		p->revision = cpu_to_le16(ONFI_VERSION_1_0);
>  }
>  
> +static int check_page_if_emtpy(struct nand_chip *chip, char *data)
> +{
> +	struct mtd_info *mtd = nand_to_mtd(chip);
> +	int ret, i;
> +	void *databuf, *eccbuf;
> +	int max_bitflips;
> +	struct mtd_oob_region oobregion;
> +
> +	mtd_ooblayout_ecc(mtd, 0, &oobregion);
> +	eccbuf = chip->oob_poi + oobregion.offset;
> +	databuf = data;
> +	max_bitflips = 0;
> +
> +	for (i = 0; i < chip->ecc.steps; i++) {
> +		ret = nand_check_erased_ecc_chunk(data,
> +						  chip->ecc.size,
> +						  eccbuf,
> +						  chip->ecc.bytes,
> +						  NULL, 0,
> +						  chip->ecc.strength);
> +		if (ret >= 0)
> +			max_bitflips = max(ret, max_bitflips);
> +		else
> +			return false;
> +
> +		databuf += chip->ecc.size;
> +		eccbuf += chip->ecc.bytes;
> +
> +	}
> +	/*
> +	 * As for the empty/erased page, bitflips number should be zero or
> +	 * at least less than the bitflip_threshold.
> +	 */
> +	if (max_bitflips > mtd->bitflip_threshold)
> +		return false;
> +	else
> +		return true;
> +}
> +
> +#define LAST_CHECKPU_PAGE 13
> +
> +static int before_erase_checkup(struct nand_chip *chip, int page)
> +{
> +	struct mtd_info *mtd = nand_to_mtd(chip);
> +	u8 *data_buf;
> +	int ret;
> +	uint32_t empty_page_mask = 0;
> +
> +	 /* Only for legacy planar 2D parallels NAND. */
> +	if ((mtd->type != MTD_NANDFLASH) || (chip->bits_per_cell != 1))
> +		return 0;

Are you implying that all your SLC chips are impacted by this issue,
really?

> +
> +	data_buf =  kmalloc(mtd->writesize, GFP_KERNEL);
> +	if (!data_buf)
> +		return -ENOMEM;
> +
> +	memset(data_buf, 0xFF, mtd->writesize);
> +	/*
> +	 * pgs[] contains pages need to be checked. We firstly check
> +	 * page13,because, for the most of cases, PEB being erased is
> +	 * not partially programmed. If page13 contains data, we directly
> +	 * return since it is not a partially programmed PEB.

So now you say the minimum amount of written pages is 14 not 15? In
your previous patch, the magic number was 15 (and the commit log
says 15 too). 

> Otherwise,
> +	 * then we check page0. And if page0 is programmed, and page13
> +	 * is not programmed, then we start to check from page11, page9,
> +	 * page7, page5, page3 respectively since the pages of PEB are
> +	 * programmed sequentially.

Looks overly complex for only a small gain. Did you try writing X (X
being 14 in this case) pages all the time? If you did, how does it
compare to this version (perf-wise). I suspect that reading pages
before potentially overwriting them will actually take more time than
blindly overwriting 14 pages with 0x00.

> We olny check odd page.

Why?!

> +	 */
> +
> +	char pgs[] = {13, 0, 11, 9, 7, 5, 3};
> +	int  c = ARRAY_SIZE(pgs) - 1;
> +	int i;
> +
> +	for (i = 0 ; i <=  c; i++) {
> +		ret = nand_read_page_raw(chip, data_buf, 1, page + pgs[i]);
> +		if (ret)
> +			continue; /* Read error, we just skip it now. */
> +		if (check_page_if_emtpy(chip, data_buf) == true) {
> +			if (pgs[i] == 0)
> +			/* page0 is not programmed. */
> +				goto free_buf;
> +
> +			/* Mark page need to program later */
> +			empty_page_mask |= (1 << pgs[i]);
> +		} else {
> +			if (pgs[i] == LAST_CHECKPU_PAGE)
> +			/*
> +			 * If page13 is programmed, this block has
> +			 * met the requirement of robust erase.
> +			 */
> +				goto free_buf;
> +		}
> +	}
> +	/* Corrupt page0 and page1, in order to simulate an
> +	 * uncompleted eraseing scenario. Just for case of
> +	 * power loss hits while below programming. in this
> +	 * way, the PEB will be re-erased again.

Looks like the "erase the first 2 pages" thing is here to corrupt
UBI's VID and EC headers. Is this really needed? We should definitely
not assume the NAND user is UBI, and that's exactly what you're doing
here.

> +	 */
> +	empty_page_mask |= 0x3;
> +	memset(data_buf, 0xAA,  mtd->writesize);
> +
> +	for (i = 0; i <= LAST_CHECKPU_PAGE; ) {
> +
> +	/* Start to program empty page, we only program the odd page. */
> +		if (empty_page_mask & (1 << i))
> +			nand_write_page_raw(chip, data_buf, 1, page + i);
> +
> +
> +		if (i == 0)
> +			i = 1;
> +		else
> +			i += 2;
> +	}
> +
> +free_buf:
> +	kfree(data_buf);
> +	return 0;
> +}
> +
>  const struct nand_manufacturer_ops micron_nand_manuf_ops = {
>  	.init = micron_nand_init,
>  	.cleanup = micron_nand_cleanup,
>  	.fixup_onfi_param_page = micron_fixup_onfi_param_page,
> +	.erase_pre = before_erase_checkup,
>  };

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/