On Thu, 21 Jan 2016 18:38:05 +0530 Archit Taneja <architt@xxxxxxxxxxxxxx> wrote: > > > On 01/21/2016 06:06 PM, Boris Brezillon wrote: > > On Thu, 21 Jan 2016 16:30:48 +0530 > > Archit Taneja <architt@xxxxxxxxxxxxxx> wrote: > > > >> > >> > >> On 01/21/2016 03:43 PM, Boris Brezillon wrote: > >>> On Thu, 21 Jan 2016 15:22:39 +0530 > >>> Archit Taneja <architt@xxxxxxxxxxxxxx> wrote: > >>> > >>>> > >>>> > >>>> On 01/21/2016 02:21 PM, Boris Brezillon wrote: > >>>>> Hi Archit, > >>>>> > >>>>> On Thu, 21 Jan 2016 12:43:18 +0530 > >>>>> Archit Taneja <architt@xxxxxxxxxxxxxx> wrote: > >>>>> > >>>>>> The Qualcomm NAND controller is found in SoCs like IPQ806x, MSM7xx, > >>>>>> MDM9x15 series. > >>>>>> > >>>>>> It exists as a sub block inside the IPs EBI2 (External Bus Interface 2) > >>>>>> and QPIC (Qualcomm Parallel Interface Controller). These IPs provide a > >>>>>> broader interface for external slow peripheral devices such as LCD and > >>>>>> NAND/NOR flash memory or SRAM like interfaces. > >>>>>> > >>>>>> We add support for the NAND controller found within EBI2. For the SoCs > >>>>>> of our interest, we only use the NAND controller within EBI2. Therefore, > >>>>>> it's safe for us to assume that the NAND controller is a standalone block > >>>>>> within the SoC. > >>>>>> > >>>>>> The controller supports 512B, 2kB, 4kB and 8kB page 8-bit and 16-bit NAND > >>>>>> flash devices. It contains a HW ECC block that supports BCH ECC (4, 8 and > >>>>>> 16 bit correction/step) and RS ECC(4 bit correction/step) that covers main > >>>>>> and spare data. The controller contains an internal 512 byte page buffer > >>>>>> to which we read/write via DMA. The EBI2 type NAND controller uses ADM DMA > >>>>>> for register read/write and data transfers. The controller performs page > >>>>>> reads and writes at a codeword/step level of 512 bytes. It can support up > >>>>>> to 2 external chips of different configurations. > >>>>>> > >>>>>> The driver prepares register read and write configuration descriptors for > >>>>>> each codeword, followed by data descriptors to read or write data from the > >>>>>> controller's internal buffer. It uses a single ADM DMA channel that we get > >>>>>> via dmaengine API. The controller requires 2 ADM CRCIs for command and > >>>>>> data flow control. These are passed via DT. > >>>>>> > >>>>>> The ecc layout used by the controller is syndrome like, but we can't use > >>>>>> the standard syndrome ecc ops because of several reasons. First, the amount > >>>>>> of data bytes covered by ecc isn't same in each step. Second, writing to > >>>>>> free oob space requires us writing to the entire step in which the oob > >>>>>> lies. This forces us to create our own ecc ops. > >>>>>> > >>>>>> One more difference is how the controller accesses the bad block marker. > >>>>>> The controller ignores reading the marker when ECC is enabled. ECC needs > >>>>>> to be explicity disabled to read or write to the bad block marker. The > >>>>>> nand_bbt helpers library hence can't access BBMs for the controller. > >>>>>> For now, we skip the creation of BBT and populate chip->block_bad and > >>>>>> chip->block_markbad helpers instead. > >>>>>> > >>>>>> Reviewed-by: Andy Gross <agross@xxxxxxxxxxxxxx> > >>>>>> Signed-off-by: Stephen Boyd <sboyd@xxxxxxxxxxxxxx> > >>>>>> Signed-off-by: Archit Taneja <architt@xxxxxxxxxxxxxx> > >>>>> > >>>>> Sorry, I noticed one more thing in your "bitflips in erased pages" > >>>>> handling. Once this is addressed (or explained) you can add my > >>>>> > >>>>> Reviewed-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxxxxxxx> > >>>> > >>>> Thanks! I've given an explanation below. > >>>> > >>>>> > >>>>>> --- > >>>>>> v7: > >>>>>> - Incorporated missing/new comments by Boris > >>>>>> - Cleaned up some strict checkpatch warnings > >>>>>> > >>>>>> v6: > >>>>>> - Fix up erased page parsing. Use nand_check_erased_ecc_chunk to > >>>>>> return corrected bitflips in an erased page. > >>>>>> - Fix whitespace issues > >>>>>> - Update compatible tring to something more specific > >>>>>> > >>>>>> v5: > >>>>>> - split chip/controller structs > >>>>>> - simplify layout by considering reserved bytes as part of ECC > >>>>>> - create ecc layouts automatically > >>>>>> - implement block_bad and block_markbad chip ops instead of > >>>>>> - read_oob_raw/write_oob_raw ecc ops to access BBMs. > >>>>>> - Add NAND_SKIP_BBTSCAN flag until we get badblockbits support. > >>>>>> - misc clean ups > >>>>>> > >>>>>> v4: > >>>>>> - Shrink submit_descs > >>>>>> - add desc list node at the end of dma_prep_desc > >>>>>> - Endianness and warning fixes > >>>>>> - Add Stephen's Signed-off since he provided a patch to fix > >>>>>> endianness problems > >>>>>> > >>>>>> v3: > >>>>>> - Refactor dma functions for maximum reuse > >>>>>> - Use dma_slave_confing on stack > >>>>>> - optimize and clean upempty_page_fixup using memchr_inv > >>>>>> - ensure portability with dma register reads using le32_* funcs > >>>>>> - use NAND_USE_BOUNCE_BUFFER instead of doing it ourselves > >>>>>> - fix handling of return values of dmaengine funcs > >>>>>> - constify wherever possible > >>>>>> - Remove dependency on ADM DMA in Kconfig > >>>>>> - Misc fixes and clean ups > >>>>>> > >>>>>> v2: > >>>>>> - Use new BBT flag that allows us to read BBM in raw mode > >>>>>> - reduce memcpy-s in the driver > >>>>>> - some refactor and clean ups because of above changes > >>>>>> > >>>>>> drivers/mtd/nand/Kconfig | 7 + > >>>>>> drivers/mtd/nand/Makefile | 1 + > >>>>>> drivers/mtd/nand/qcom_nandc.c | 2024 +++++++++++++++++++++++++++++++++++++++++ > >>>>>> 3 files changed, 2032 insertions(+) > >>>>>> create mode 100644 drivers/mtd/nand/qcom_nandc.c > >>>>>> > >>>>> > >>>>> [...] > >>>>> > >>>>>> diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c > >>>>>> new file mode 100644 > >>>>>> index 0000000..269d388 > >>>>>> --- /dev/null > >>>>>> +++ b/drivers/mtd/nand/qcom_nandc.c > >>>>> > >>>>> [...] > >>>>> > >>>>>> +/* > >>>>>> + * when using BCH ECC, the HW flags an error in NAND_FLASH_STATUS if it read > >>>>>> + * an erased CW, and reports an erased CW in NAND_ERASED_CW_DETECT_STATUS. > >>>>>> + * > >>>>>> + * when using RS ECC, the HW reports the same erros when reading an erased CW, > >>>>>> + * but it notifies that it is an erased CW by placing special characters at > >>>>>> + * certain offsets in the buffer. > >>>>>> + * > >>>>>> + * verify if the page is erased or not, and fix up the page for RS ECC by > >>>>>> + * replacing the special characters with 0xff > >>>>>> + */ > >>>>>> +static bool empty_page_fixup(struct qcom_nand_host *host, u8 *data_buf) > >>>>>> +{ > >>>>>> + struct nand_chip *chip = &host->chip; > >>>>>> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip); > >>>>>> + struct nand_ecc_ctrl *ecc = &chip->ecc; > >>>>>> + struct read_stats *buf; > >>>>>> + int i; > >>>>>> + > >>>>>> + buf = (struct read_stats *)nandc->reg_read_buf; > >>>>>> + > >>>>>> + for (i = 0; i < ecc->steps; i++, buf++) { > >>>>>> + u32 flash, erased_cw; > >>>>>> + u8 empty1, empty2; > >>>>>> + > >>>>>> + flash = le32_to_cpu(buf->flash); > >>>>>> + erased_cw = le32_to_cpu(buf->erased_cw); > >>>>>> + > >>>>>> + /* > >>>>>> + * an erased page flags an error in NAND_FLASH_STATUS, if there > >>>>>> + * isn't any error, bail out early and report a non-erased > >>>>>> + * page > >>>>>> + */ > >>>>>> + if (!(flash & FS_OP_ERR)) > >>>>>> + break; > >>>>>> + > >>>>>> + /* > >>>>>> + * if BCH is enabled, HW will take care of detecting erased > >>>>>> + * pages > >>>>>> + */ > >>>>>> + if (host->bch_enabled) { > >>>>>> + /* bail out if we didn't detect an erased CW */ > >>>>>> + if ((erased_cw & ERASED_CW) != ERASED_CW) > >>>>>> + break; > >>>>>> + } else { > >>>>>> + /* > >>>>>> + * if RS ECC is enabled, check if the CW is erased by > >>>>>> + * looking for 0x54s at offsets 3 and 175 > >>>>>> + */ > >>>>>> + empty1 = data_buf[3 + i * host->cw_data]; > >>>>>> + empty2 = data_buf[175 + i * host->cw_data]; > >>>>>> + > >>>>>> + /* bail out if the CW isn't erased */ > >>>>>> + if (!(empty1 == 0x54 && empty2 == 0xff) && > >>>>>> + !(empty1 == 0xff && empty2 == 0x54)) > >>>>>> + break; > >>>>>> + } > >>>>>> + } > >>>>>> + > >>>>>> + if (i < ecc->steps) > >>>>>> + return false; > >>>>>> + > >>>>>> + if (!host->bch_enabled) { > >>>>>> + /* > >>>>>> + * fix up the buffer by replacing the magic offsets with > >>>>>> + * 0xff > >>>>>> + */ > >>>>>> + for (i = 0; i < ecc->steps; i++) { > >>>>>> + data_buf[3 + i * host->cw_data] = 0xff; > >>>>>> + data_buf[175 + i * host->cw_data] = 0xff; > >>>>>> + } > >>>>>> + } > >>>>>> + > >>>>>> + return true; > >>>>>> +} > >>>>>> + > >>>>>> +/* > >>>>>> + * reads back status registers set by the controller to notify page read > >>>>>> + * errors. this is equivalent to what 'ecc->correct()' would do. > >>>>>> + */ > >>>>>> +static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf, > >>>>>> + u8 *oob_buf, bool erased_page) > >>>>>> +{ > >>>>>> + struct nand_chip *chip = &host->chip; > >>>>>> + struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip); > >>>>>> + struct mtd_info *mtd = nand_to_mtd(chip); > >>>>>> + struct nand_ecc_ctrl *ecc = &chip->ecc; > >>>>>> + unsigned int max_bitflips = 0; > >>>>>> + struct read_stats *buf; > >>>>>> + int i; > >>>>>> + > >>>>>> + buf = (struct read_stats *)nandc->reg_read_buf; > >>>>>> + > >>>>>> + for (i = 0; i < ecc->steps; i++, buf++) { > >>>>>> + u32 flash, buffer; > >>>>>> + int data_len, oob_len; > >>>>>> + > >>>>>> + if (i == (ecc->steps - 1)) { > >>>>>> + data_len = ecc->size - ((ecc->steps - 1) << 2); > >>>>>> + oob_len = ecc->steps << 2; > >>>>>> + } else { > >>>>>> + data_len = host->cw_data; > >>>>>> + oob_len = 0; > >>>>>> + } > >>>>>> + > >>>>>> + flash = le32_to_cpu(buf->flash); > >>>>>> + buffer = le32_to_cpu(buf->buffer); > >>>>>> + > >>>>>> + if (flash & (FS_OP_ERR | FS_MPU_ERR)) { > >>>>>> + if (erased_page) { > >>>>>> + int ret, ecclen, extraooblen; > >>>>>> + void *eccbuf; > >>>>>> + > >>>>>> + eccbuf = oob_buf ? oob_buf + oob_len : NULL; > >>>>>> + ecclen = oob_buf ? host->ecc_bytes_hw : 0; > >>>>>> + extraooblen = oob_buf ? oob_len : 0; > >>>>>> + > >>>>>> + ret = nand_check_erased_ecc_chunk(data_buf, > >>>>>> + data_len, eccbuf, ecclen, oob_buf, > >>>>>> + extraooblen, ecc->strength); > >>>>> > >>>>> IIUC, the erased_page info is returned by empty_page_fixup() and is > >>>>> only set if the page is detected as empty (filled with ff). > >>>>> If that's the case, then you don't have to use > >>>>> nand_check_erased_ecc_chunk() to check it again... > >>>> > >>>> empty_page_fixup now doesn't parse the entire page for 0xffs, it just > >>>> checks if the correct flags have been raised by the controller hardware, > >>>> and replaces the 'special offsets' with 0xffs instead of 0x54s. > >>> > >>> But didn't you say that those pattern assignment were guaranteeing that > >>> the tested chunk is empty? Or am I missing something else? > >> > >> No, the controller reports 0x54s at special offsets, but we still need > >> to parse the entire buffer for 0xffs because the flash user might have > >> intentionally placed 0x54 in that offsets. > >> > >> The previous revisions of the patchset first the changed the > >> 0x54s to 0xffs at the special offsets, and then checked if the > >> entire page for 0xffs. If any single byte wasn't 0xff, it reported it > >> as not empty and replaced the offsets back with 0x54s. For the newer > >> IPs, we don't need to read the entirepage, we only need to read a > >> bitfield per chunk to be sure. > > > > Oh, I thought the FS_OP_ERR + 0X54 pattern @3 and 175 were enough to > > detect an empty page, but I must have misunderstood your previous > > explanations. Anyway, adding an extra nand_check_erased_ecc_chunk() > > here shouldn't hurt, so I'm fine with that one. > > Yeah, we need to check manually too, sadly. Although, since we are sure > that it is always 0xffs, I can put the faster memchr_inv func to check > if the page is erased. > > > > >> > >>> > >>>> > >>>> From what I understood, we still need to parse the chunks to try to > >>>> fix 'ecc->strength' number of bitflips. > >>> > >>> No, if the controller tests and guarantees that a specific page is empty > >>> (filled with ff), then we should trust it. > >>> I suggested to use nand_check_erased_ecc_chunk() in one of my > >>> previous review because I thought the 0x54 detection scheme was not > >>> sufficient to detect empty pages, but you said it was, so I trust > >>> you ;). And if it's really safe, then we don't need to check again with > >>> nand_check_erased_ecc_chunk() here. > >> > >> Okay. I thought that when NAND controllers report an empty page, there > >> can still be bitflips in them once we read it, and that we need to > >> use nand_check_erased_ecc_chunk to set those bits back to 1. > > > > It's really dependent on your NAND controller, so I can't answer that > > question for your specific case, but usually NAND controller are able > > to detect pages filled with 0xff, but as soon as you have a single > > bitflip, the control is passed to the ECC engine which tries to correct > > errors (and fails to do it in most cases). > > Yes, that seems to be the case for this controller. > > > Some ECC engines are smarter and you can pass them an 'acceptable' > > number of bitflips that is used by the "erased page detection" logic. > > I haven't seen such an 'acceptable' bitfield for this controller. > > > And other ECC engines take care of xoring the ECC bytes so that it > > generates 0xff bytes for empty pages (this solution is the ideal one, > > since you're guaranteed to fix bitflips even for the empty/erased page > > case). > > Okay, I doubt that the controller does that here, but I'll go through > the docs and verify. Thanks for the explanation. > > > > >> > >>> > >>>> > >>>>> > >>>>>> + if (ret < 0) { > >>>>>> + mtd->ecc_stats.failed++; > >>>>>> + } else { > >>>>>> + mtd->ecc_stats.corrected += ret; > >>>>>> + max_bitflips = > >>>>>> + max_t(unsigned int, max_bitflips, ret); > >>>>>> + } > >>>>>> + } else { > >>>>>> + if (buffer & BS_UNCORRECTABLE_BIT) { > >>>>> > >>>>> ... here is where you should check if what was detected as > >>>>> uncorrectable errors is not in fact some bitflips in an erased page. > >>>> > >>>> This path will never hit for a page reported as erased by the HW. The > >>>> 'else' branch happens only for pages that were reported as 'not erased' > >>>> by empty page fixup. > >>>> > >>>> In other words, the BS_UNCORRECTABLE_BIT register bitfield is never > >>>> checked for an erased page. In some experiments I performed, I noticed > >>>> that this bitfield was almost always set for an erased page. There is > >>>> not point in even checking this field for an erased page. > >>> > >>> No, my point was that, if you have one or several bitflips in an erased > >>> page (which can happen), then your NAND controller will first detect > >>> that it's not an empty page (because you have some bits set to zero), > >>> and then try to correct the errors with its ECC engine (which will > >>> probably fail, unless your controller generate 0xff ECC bytes for an > >>> empty page). nand_check_erased_ecc_chunk() has been created exactly for > >>> this use case: manually check if a page is 'almost' empty when the ECC > >>> engine fails to correct errors. > >> > >> Okay. I think I understand now :). I thought nand_check_erased_ecc_chunk > >> had to be used for pages that were reported as erased, but it is used > >> for pages that aren't detected as erased because of some bitflips, and > >> we just make sure if it is erased or not. > >> > >> Should the pseudo code look something like this? > >> > >> /* ecc->read_page */ > >> int qcom_nand_read_page(...) > >> { > >> /* read the page */ > >> ... > >> > >> erased = empty_page_fixup(); > >> /* we make sure above that the entire page is 0xffs or not */ > >> > >> return parse_read_errors(host, erased); > >> } > >> > >> int parse_read_errors(host, erased) > >> { > >> for each chunk { > >> if (!erased) { > >> if (uncorrectable errors) { > >> ret = nand_check_erased_ecc_chunk(); > >> if (ret < 0) { > >> /* not an erased page, report */ > >> stats.failed++ > >> } else { > >> /* 'almost' empty page which > >> HW couldn't detect as erased */ > >> stats.corrected += ret; > >> } > >> } else { > >> stats.corrected += stat; > >> } > >> } > >> } > >> } > > > > Yep, exactly. > > Thanks. I'll fix this and test it out. If you want to check that your "bitflips in erased pages" handling is correct, you can try this tool to artificially flip some bits [1]. # flash_erase /dev/mtdX Y 1 # nandflipbits /dev/mtdX 2@<Y>:3@<Y+1> Best Regards, Boris [1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com -- To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html