Re: [PATCH v7 2/3] mtd: nand: Qualcomm NAND controller driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 21 Jan 2016 15:22:39 +0530
Archit Taneja <architt@xxxxxxxxxxxxxx> wrote:

> 
> 
> On 01/21/2016 02:21 PM, Boris Brezillon wrote:
> > Hi Archit,
> >
> > On Thu, 21 Jan 2016 12:43:18 +0530
> > Archit Taneja <architt@xxxxxxxxxxxxxx> wrote:
> >
> >> The Qualcomm NAND controller is found in SoCs like IPQ806x, MSM7xx,
> >> MDM9x15 series.
> >>
> >> It exists as a sub block inside the IPs EBI2 (External Bus Interface 2)
> >> and QPIC (Qualcomm Parallel Interface Controller). These IPs provide a
> >> broader interface for external slow peripheral devices such as LCD and
> >> NAND/NOR flash memory or SRAM like interfaces.
> >>
> >> We add support for the NAND controller found within EBI2. For the SoCs
> >> of our interest, we only use the NAND controller within EBI2. Therefore,
> >> it's safe for us to assume that the NAND controller is a standalone block
> >> within the SoC.
> >>
> >> The controller supports 512B, 2kB, 4kB and 8kB page 8-bit and 16-bit NAND
> >> flash devices. It contains a HW ECC block that supports BCH ECC (4, 8 and
> >> 16 bit correction/step) and RS ECC(4 bit correction/step) that covers main
> >> and spare data. The controller contains an internal 512 byte page buffer
> >> to which we read/write via DMA. The EBI2 type NAND controller uses ADM DMA
> >> for register read/write and data transfers. The controller performs page
> >> reads and writes at a codeword/step level of 512 bytes. It can support up
> >> to 2 external chips of different configurations.
> >>
> >> The driver prepares register read and write configuration descriptors for
> >> each codeword, followed by data descriptors to read or write data from the
> >> controller's internal buffer. It uses a single ADM DMA channel that we get
> >> via dmaengine API. The controller requires 2 ADM CRCIs for command and
> >> data flow control. These are passed via DT.
> >>
> >> The ecc layout used by the controller is syndrome like, but we can't use
> >> the standard syndrome ecc ops because of several reasons. First, the amount
> >> of data bytes covered by ecc isn't same in each step. Second, writing to
> >> free oob space requires us writing to the entire step in which the oob
> >> lies. This forces us to create our own ecc ops.
> >>
> >> One more difference is how the controller accesses the bad block marker.
> >> The controller ignores reading the marker when ECC is enabled. ECC needs
> >> to be explicity disabled to read or write to the bad block marker. The
> >> nand_bbt helpers library hence can't access BBMs for the controller.
> >> For now, we skip the creation of BBT and populate chip->block_bad and
> >> chip->block_markbad helpers instead.
> >>
> >> Reviewed-by: Andy Gross <agross@xxxxxxxxxxxxxx>
> >> Signed-off-by: Stephen Boyd <sboyd@xxxxxxxxxxxxxx>
> >> Signed-off-by: Archit Taneja <architt@xxxxxxxxxxxxxx>
> >
> > Sorry, I noticed one more thing in your "bitflips in erased pages"
> > handling. Once this is addressed (or explained) you can add my
> >
> > Reviewed-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxxxxxxx>
> 
> Thanks! I've given an explanation below.
> 
> >
> >> ---
> >> v7:
> >>    - Incorporated missing/new comments by Boris
> >>    - Cleaned up some strict checkpatch warnings
> >>
> >> v6:
> >>    - Fix up erased page parsing. Use nand_check_erased_ecc_chunk to
> >>      return corrected bitflips in an erased page.
> >>    - Fix whitespace issues
> >>    - Update compatible tring to something more specific
> >>
> >> v5:
> >>    - split chip/controller structs
> >>    - simplify layout by considering reserved bytes as part of ECC
> >>    - create ecc layouts automatically
> >>    - implement block_bad and block_markbad chip ops instead of
> >>    - read_oob_raw/write_oob_raw ecc ops to access BBMs.
> >>    - Add NAND_SKIP_BBTSCAN flag until we get badblockbits support.
> >>    - misc clean ups
> >>
> >> v4:
> >>    - Shrink submit_descs
> >>    - add desc list node at the end of dma_prep_desc
> >>    - Endianness and warning fixes
> >>    - Add Stephen's Signed-off since he provided a patch to fix
> >>      endianness problems
> >>
> >> v3:
> >>    - Refactor dma functions for maximum reuse
> >>    - Use dma_slave_confing on stack
> >>    - optimize and clean upempty_page_fixup using memchr_inv
> >>    - ensure portability with dma register reads using le32_* funcs
> >>    - use NAND_USE_BOUNCE_BUFFER instead of doing it ourselves
> >>    - fix handling of return values of dmaengine funcs
> >>    - constify wherever possible
> >>    - Remove dependency on ADM DMA in Kconfig
> >>    - Misc fixes and clean ups
> >>
> >> v2:
> >>    - Use new BBT flag that allows us to read BBM in raw mode
> >>    - reduce memcpy-s in the driver
> >>    - some refactor and clean ups because of above changes
> >>
> >>   drivers/mtd/nand/Kconfig      |    7 +
> >>   drivers/mtd/nand/Makefile     |    1 +
> >>   drivers/mtd/nand/qcom_nandc.c | 2024 +++++++++++++++++++++++++++++++++++++++++
> >>   3 files changed, 2032 insertions(+)
> >>   create mode 100644 drivers/mtd/nand/qcom_nandc.c
> >>
> >
> > [...]
> >
> >> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
> >> new file mode 100644
> >> index 0000000..269d388
> >> --- /dev/null
> >> +++ b/drivers/mtd/nand/qcom_nandc.c
> >
> > [...]
> >
> >> +/*
> >> + * when using BCH ECC, the HW flags an error in NAND_FLASH_STATUS if it read
> >> + * an erased CW, and reports an erased CW in NAND_ERASED_CW_DETECT_STATUS.
> >> + *
> >> + * when using RS ECC, the HW reports the same erros when reading an erased CW,
> >> + * but it notifies that it is an erased CW by placing special characters at
> >> + * certain offsets in the buffer.
> >> + *
> >> + * verify if the page is erased or not, and fix up the page for RS ECC by
> >> + * replacing the special characters with 0xff
> >> + */
> >> +static bool empty_page_fixup(struct qcom_nand_host *host, u8 *data_buf)
> >> +{
> >> +	struct nand_chip *chip = &host->chip;
> >> +	struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> >> +	struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> +	struct read_stats *buf;
> >> +	int i;
> >> +
> >> +	buf = (struct read_stats *)nandc->reg_read_buf;
> >> +
> >> +	for (i = 0; i < ecc->steps; i++, buf++) {
> >> +		u32 flash, erased_cw;
> >> +		u8 empty1, empty2;
> >> +
> >> +		flash = le32_to_cpu(buf->flash);
> >> +		erased_cw = le32_to_cpu(buf->erased_cw);
> >> +
> >> +		/*
> >> +		 * an erased page flags an error in NAND_FLASH_STATUS, if there
> >> +		 * isn't any error, bail out early and report a non-erased
> >> +		 * page
> >> +		 */
> >> +		if (!(flash & FS_OP_ERR))
> >> +			break;
> >> +
> >> +		/*
> >> +		 * if BCH is enabled, HW will take care of detecting erased
> >> +		 * pages
> >> +		 */
> >> +		if (host->bch_enabled) {
> >> +			/* bail out if we didn't detect an erased CW */
> >> +			if ((erased_cw & ERASED_CW) != ERASED_CW)
> >> +				break;
> >> +		} else {
> >> +			/*
> >> +			 * if RS ECC is enabled, check if the CW is erased by
> >> +			 * looking for 0x54s at offsets 3 and 175
> >> +			 */
> >> +			empty1 = data_buf[3 + i * host->cw_data];
> >> +			empty2 = data_buf[175 + i * host->cw_data];
> >> +
> >> +			/* bail out if the CW isn't erased */
> >> +			if (!(empty1 == 0x54 && empty2 == 0xff) &&
> >> +			    !(empty1 == 0xff && empty2 == 0x54))
> >> +				break;
> >> +		}
> >> +	}
> >> +
> >> +	if (i < ecc->steps)
> >> +		return false;
> >> +
> >> +	if (!host->bch_enabled) {
> >> +		/*
> >> +		 * fix up the buffer by replacing the magic offsets with
> >> +		 * 0xff
> >> +		 */
> >> +		for (i = 0; i < ecc->steps; i++) {
> >> +			data_buf[3 + i * host->cw_data] = 0xff;
> >> +			data_buf[175 + i * host->cw_data] = 0xff;
> >> +		}
> >> +	}
> >> +
> >> +	return true;
> >> +}
> >> +
> >> +/*
> >> + * reads back status registers set by the controller to notify page read
> >> + * errors. this is equivalent to what 'ecc->correct()' would do.
> >> + */
> >> +static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
> >> +			     u8 *oob_buf, bool erased_page)
> >> +{
> >> +	struct nand_chip *chip = &host->chip;
> >> +	struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
> >> +	struct mtd_info *mtd = nand_to_mtd(chip);
> >> +	struct nand_ecc_ctrl *ecc = &chip->ecc;
> >> +	unsigned int max_bitflips = 0;
> >> +	struct read_stats *buf;
> >> +	int i;
> >> +
> >> +	buf = (struct read_stats *)nandc->reg_read_buf;
> >> +
> >> +	for (i = 0; i < ecc->steps; i++, buf++) {
> >> +		u32 flash, buffer;
> >> +		int data_len, oob_len;
> >> +
> >> +		if (i == (ecc->steps - 1)) {
> >> +			data_len = ecc->size - ((ecc->steps - 1) << 2);
> >> +			oob_len = ecc->steps << 2;
> >> +		} else {
> >> +			data_len = host->cw_data;
> >> +			oob_len = 0;
> >> +		}
> >> +
> >> +		flash = le32_to_cpu(buf->flash);
> >> +		buffer = le32_to_cpu(buf->buffer);
> >> +
> >> +		if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
> >> +			if (erased_page) {
> >> +				int ret, ecclen, extraooblen;
> >> +				void *eccbuf;
> >> +
> >> +				eccbuf = oob_buf ? oob_buf + oob_len : NULL;
> >> +				ecclen = oob_buf ? host->ecc_bytes_hw : 0;
> >> +				extraooblen = oob_buf ? oob_len : 0;
> >> +
> >> +				ret = nand_check_erased_ecc_chunk(data_buf,
> >> +					data_len, eccbuf, ecclen, oob_buf,
> >> +					extraooblen, ecc->strength);
> >
> > IIUC, the erased_page info is returned by empty_page_fixup() and is
> > only set if the page is detected as empty (filled with ff).
> > If that's the case, then you don't have to use
> > nand_check_erased_ecc_chunk() to check it again...
> 
> empty_page_fixup now doesn't parse the entire page for 0xffs, it just
> checks if the correct flags have been raised by the controller hardware,
> and replaces the 'special offsets' with 0xffs instead of 0x54s.

But didn't you say that those pattern assignment were guaranteeing that
the tested chunk is empty? Or am I missing something else?

> 
>  From what I understood, we still need to parse the chunks to try to
> fix 'ecc->strength' number of bitflips.

No, if the controller tests and guarantees that a specific page is empty
(filled with ff), then we should trust it.
I suggested to use nand_check_erased_ecc_chunk() in one of my
previous review because I thought the 0x54 detection scheme was not
sufficient to detect empty pages, but you said it was, so I trust
you ;). And if it's really safe, then we don't need to check again with 
nand_check_erased_ecc_chunk() here.

> 
> >
> >> +				if (ret < 0) {
> >> +					mtd->ecc_stats.failed++;
> >> +				} else {
> >> +					mtd->ecc_stats.corrected += ret;
> >> +					max_bitflips =
> >> +						max_t(unsigned int, max_bitflips, ret);
> >> +				}
> >> +			} else {
> >> +				if (buffer & BS_UNCORRECTABLE_BIT) {
> >
> > ... here is where you should check if what was detected as
> > uncorrectable errors is not in fact some bitflips in an erased page.
> 
> This path will never hit for a page reported as erased by the HW. The
> 'else' branch happens only for pages that were reported as 'not erased'
> by empty page fixup.
> 
> In other words, the BS_UNCORRECTABLE_BIT register bitfield is never
> checked for an erased page. In some experiments I performed, I noticed
> that this bitfield was almost always set for an erased page. There is
> not point in even checking this field for an erased page.

No, my point was that, if you have one or several bitflips in an erased
page (which can happen), then your NAND controller will first detect
that it's not an empty page (because you have some bits set to zero),
and then try to correct the errors with its ECC engine (which will
probably fail, unless your controller generate 0xff ECC bytes for an
empty page). nand_check_erased_ecc_chunk() has been created exactly for
this use case: manually check if a page is 'almost' empty when the ECC
engine fails to correct errors.



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [Linux for Sparc]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux