On 1/21/2016 6:55 PM, Boris Brezillon wrote:
On Thu, 21 Jan 2016 18:38:05 +0530
Archit Taneja <architt@xxxxxxxxxxxxxx> wrote:
On 01/21/2016 06:06 PM, Boris Brezillon wrote:
On Thu, 21 Jan 2016 16:30:48 +0530
Archit Taneja <architt@xxxxxxxxxxxxxx> wrote:
On 01/21/2016 03:43 PM, Boris Brezillon wrote:
On Thu, 21 Jan 2016 15:22:39 +0530
Archit Taneja <architt@xxxxxxxxxxxxxx> wrote:
On 01/21/2016 02:21 PM, Boris Brezillon wrote:
Hi Archit,
On Thu, 21 Jan 2016 12:43:18 +0530
Archit Taneja <architt@xxxxxxxxxxxxxx> wrote:
The Qualcomm NAND controller is found in SoCs like IPQ806x, MSM7xx,
MDM9x15 series.
It exists as a sub block inside the IPs EBI2 (External Bus Interface 2)
and QPIC (Qualcomm Parallel Interface Controller). These IPs provide a
broader interface for external slow peripheral devices such as LCD and
NAND/NOR flash memory or SRAM like interfaces.
We add support for the NAND controller found within EBI2. For the SoCs
of our interest, we only use the NAND controller within EBI2. Therefore,
it's safe for us to assume that the NAND controller is a standalone block
within the SoC.
The controller supports 512B, 2kB, 4kB and 8kB page 8-bit and 16-bit NAND
flash devices. It contains a HW ECC block that supports BCH ECC (4, 8 and
16 bit correction/step) and RS ECC(4 bit correction/step) that covers main
and spare data. The controller contains an internal 512 byte page buffer
to which we read/write via DMA. The EBI2 type NAND controller uses ADM DMA
for register read/write and data transfers. The controller performs page
reads and writes at a codeword/step level of 512 bytes. It can support up
to 2 external chips of different configurations.
The driver prepares register read and write configuration descriptors for
each codeword, followed by data descriptors to read or write data from the
controller's internal buffer. It uses a single ADM DMA channel that we get
via dmaengine API. The controller requires 2 ADM CRCIs for command and
data flow control. These are passed via DT.
The ecc layout used by the controller is syndrome like, but we can't use
the standard syndrome ecc ops because of several reasons. First, the amount
of data bytes covered by ecc isn't same in each step. Second, writing to
free oob space requires us writing to the entire step in which the oob
lies. This forces us to create our own ecc ops.
One more difference is how the controller accesses the bad block marker.
The controller ignores reading the marker when ECC is enabled. ECC needs
to be explicity disabled to read or write to the bad block marker. The
nand_bbt helpers library hence can't access BBMs for the controller.
For now, we skip the creation of BBT and populate chip->block_bad and
chip->block_markbad helpers instead.
Reviewed-by: Andy Gross <agross@xxxxxxxxxxxxxx>
Signed-off-by: Stephen Boyd <sboyd@xxxxxxxxxxxxxx>
Signed-off-by: Archit Taneja <architt@xxxxxxxxxxxxxx>
Sorry, I noticed one more thing in your "bitflips in erased pages"
handling. Once this is addressed (or explained) you can add my
Reviewed-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxxxxxxx>
Thanks! I've given an explanation below.
---
v7:
- Incorporated missing/new comments by Boris
- Cleaned up some strict checkpatch warnings
v6:
- Fix up erased page parsing. Use nand_check_erased_ecc_chunk to
return corrected bitflips in an erased page.
- Fix whitespace issues
- Update compatible tring to something more specific
v5:
- split chip/controller structs
- simplify layout by considering reserved bytes as part of ECC
- create ecc layouts automatically
- implement block_bad and block_markbad chip ops instead of
- read_oob_raw/write_oob_raw ecc ops to access BBMs.
- Add NAND_SKIP_BBTSCAN flag until we get badblockbits support.
- misc clean ups
v4:
- Shrink submit_descs
- add desc list node at the end of dma_prep_desc
- Endianness and warning fixes
- Add Stephen's Signed-off since he provided a patch to fix
endianness problems
v3:
- Refactor dma functions for maximum reuse
- Use dma_slave_confing on stack
- optimize and clean upempty_page_fixup using memchr_inv
- ensure portability with dma register reads using le32_* funcs
- use NAND_USE_BOUNCE_BUFFER instead of doing it ourselves
- fix handling of return values of dmaengine funcs
- constify wherever possible
- Remove dependency on ADM DMA in Kconfig
- Misc fixes and clean ups
v2:
- Use new BBT flag that allows us to read BBM in raw mode
- reduce memcpy-s in the driver
- some refactor and clean ups because of above changes
drivers/mtd/nand/Kconfig | 7 +
drivers/mtd/nand/Makefile | 1 +
drivers/mtd/nand/qcom_nandc.c | 2024 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 2032 insertions(+)
create mode 100644 drivers/mtd/nand/qcom_nandc.c
[...]
diff --git a/drivers/mtd/nand/qcom_nandc.c b/drivers/mtd/nand/qcom_nandc.c
new file mode 100644
index 0000000..269d388
--- /dev/null
+++ b/drivers/mtd/nand/qcom_nandc.c
[...]
+/*
+ * when using BCH ECC, the HW flags an error in NAND_FLASH_STATUS if it read
+ * an erased CW, and reports an erased CW in NAND_ERASED_CW_DETECT_STATUS.
+ *
+ * when using RS ECC, the HW reports the same erros when reading an erased CW,
+ * but it notifies that it is an erased CW by placing special characters at
+ * certain offsets in the buffer.
+ *
+ * verify if the page is erased or not, and fix up the page for RS ECC by
+ * replacing the special characters with 0xff
+ */
+static bool empty_page_fixup(struct qcom_nand_host *host, u8 *data_buf)
+{
+ struct nand_chip *chip = &host->chip;
+ struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
+ struct nand_ecc_ctrl *ecc = &chip->ecc;
+ struct read_stats *buf;
+ int i;
+
+ buf = (struct read_stats *)nandc->reg_read_buf;
+
+ for (i = 0; i < ecc->steps; i++, buf++) {
+ u32 flash, erased_cw;
+ u8 empty1, empty2;
+
+ flash = le32_to_cpu(buf->flash);
+ erased_cw = le32_to_cpu(buf->erased_cw);
+
+ /*
+ * an erased page flags an error in NAND_FLASH_STATUS, if there
+ * isn't any error, bail out early and report a non-erased
+ * page
+ */
+ if (!(flash & FS_OP_ERR))
+ break;
+
+ /*
+ * if BCH is enabled, HW will take care of detecting erased
+ * pages
+ */
+ if (host->bch_enabled) {
+ /* bail out if we didn't detect an erased CW */
+ if ((erased_cw & ERASED_CW) != ERASED_CW)
+ break;
+ } else {
+ /*
+ * if RS ECC is enabled, check if the CW is erased by
+ * looking for 0x54s at offsets 3 and 175
+ */
+ empty1 = data_buf[3 + i * host->cw_data];
+ empty2 = data_buf[175 + i * host->cw_data];
+
+ /* bail out if the CW isn't erased */
+ if (!(empty1 == 0x54 && empty2 == 0xff) &&
+ !(empty1 == 0xff && empty2 == 0x54))
+ break;
+ }
+ }
+
+ if (i < ecc->steps)
+ return false;
+
+ if (!host->bch_enabled) {
+ /*
+ * fix up the buffer by replacing the magic offsets with
+ * 0xff
+ */
+ for (i = 0; i < ecc->steps; i++) {
+ data_buf[3 + i * host->cw_data] = 0xff;
+ data_buf[175 + i * host->cw_data] = 0xff;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * reads back status registers set by the controller to notify page read
+ * errors. this is equivalent to what 'ecc->correct()' would do.
+ */
+static int parse_read_errors(struct qcom_nand_host *host, u8 *data_buf,
+ u8 *oob_buf, bool erased_page)
+{
+ struct nand_chip *chip = &host->chip;
+ struct qcom_nand_controller *nandc = get_qcom_nand_controller(chip);
+ struct mtd_info *mtd = nand_to_mtd(chip);
+ struct nand_ecc_ctrl *ecc = &chip->ecc;
+ unsigned int max_bitflips = 0;
+ struct read_stats *buf;
+ int i;
+
+ buf = (struct read_stats *)nandc->reg_read_buf;
+
+ for (i = 0; i < ecc->steps; i++, buf++) {
+ u32 flash, buffer;
+ int data_len, oob_len;
+
+ if (i == (ecc->steps - 1)) {
+ data_len = ecc->size - ((ecc->steps - 1) << 2);
+ oob_len = ecc->steps << 2;
+ } else {
+ data_len = host->cw_data;
+ oob_len = 0;
+ }
+
+ flash = le32_to_cpu(buf->flash);
+ buffer = le32_to_cpu(buf->buffer);
+
+ if (flash & (FS_OP_ERR | FS_MPU_ERR)) {
+ if (erased_page) {
+ int ret, ecclen, extraooblen;
+ void *eccbuf;
+
+ eccbuf = oob_buf ? oob_buf + oob_len : NULL;
+ ecclen = oob_buf ? host->ecc_bytes_hw : 0;
+ extraooblen = oob_buf ? oob_len : 0;
+
+ ret = nand_check_erased_ecc_chunk(data_buf,
+ data_len, eccbuf, ecclen, oob_buf,
+ extraooblen, ecc->strength);
IIUC, the erased_page info is returned by empty_page_fixup() and is
only set if the page is detected as empty (filled with ff).
If that's the case, then you don't have to use
nand_check_erased_ecc_chunk() to check it again...
empty_page_fixup now doesn't parse the entire page for 0xffs, it just
checks if the correct flags have been raised by the controller hardware,
and replaces the 'special offsets' with 0xffs instead of 0x54s.
But didn't you say that those pattern assignment were guaranteeing that
the tested chunk is empty? Or am I missing something else?
No, the controller reports 0x54s at special offsets, but we still need
to parse the entire buffer for 0xffs because the flash user might have
intentionally placed 0x54 in that offsets.
The previous revisions of the patchset first the changed the
0x54s to 0xffs at the special offsets, and then checked if the
entire page for 0xffs. If any single byte wasn't 0xff, it reported it
as not empty and replaced the offsets back with 0x54s. For the newer
IPs, we don't need to read the entirepage, we only need to read a
bitfield per chunk to be sure.
Oh, I thought the FS_OP_ERR + 0X54 pattern @3 and 175 were enough to
detect an empty page, but I must have misunderstood your previous
explanations. Anyway, adding an extra nand_check_erased_ecc_chunk()
here shouldn't hurt, so I'm fine with that one.
Yeah, we need to check manually too, sadly. Although, since we are sure
that it is always 0xffs, I can put the faster memchr_inv func to check
if the page is erased.
From what I understood, we still need to parse the chunks to try to
fix 'ecc->strength' number of bitflips.
No, if the controller tests and guarantees that a specific page is empty
(filled with ff), then we should trust it.
I suggested to use nand_check_erased_ecc_chunk() in one of my
previous review because I thought the 0x54 detection scheme was not
sufficient to detect empty pages, but you said it was, so I trust
you ;). And if it's really safe, then we don't need to check again with
nand_check_erased_ecc_chunk() here.
Okay. I thought that when NAND controllers report an empty page, there
can still be bitflips in them once we read it, and that we need to
use nand_check_erased_ecc_chunk to set those bits back to 1.
It's really dependent on your NAND controller, so I can't answer that
question for your specific case, but usually NAND controller are able
to detect pages filled with 0xff, but as soon as you have a single
bitflip, the control is passed to the ECC engine which tries to correct
errors (and fails to do it in most cases).
Yes, that seems to be the case for this controller.
Some ECC engines are smarter and you can pass them an 'acceptable'
number of bitflips that is used by the "erased page detection" logic.
I haven't seen such an 'acceptable' bitfield for this controller.
And other ECC engines take care of xoring the ECC bytes so that it
generates 0xff bytes for empty pages (this solution is the ideal one,
since you're guaranteed to fix bitflips even for the empty/erased page
case).
Okay, I doubt that the controller does that here, but I'll go through
the docs and verify. Thanks for the explanation.
+ if (ret < 0) {
+ mtd->ecc_stats.failed++;
+ } else {
+ mtd->ecc_stats.corrected += ret;
+ max_bitflips =
+ max_t(unsigned int, max_bitflips, ret);
+ }
+ } else {
+ if (buffer & BS_UNCORRECTABLE_BIT) {
... here is where you should check if what was detected as
uncorrectable errors is not in fact some bitflips in an erased page.
This path will never hit for a page reported as erased by the HW. The
'else' branch happens only for pages that were reported as 'not erased'
by empty page fixup.
In other words, the BS_UNCORRECTABLE_BIT register bitfield is never
checked for an erased page. In some experiments I performed, I noticed
that this bitfield was almost always set for an erased page. There is
not point in even checking this field for an erased page.
No, my point was that, if you have one or several bitflips in an erased
page (which can happen), then your NAND controller will first detect
that it's not an empty page (because you have some bits set to zero),
and then try to correct the errors with its ECC engine (which will
probably fail, unless your controller generate 0xff ECC bytes for an
empty page). nand_check_erased_ecc_chunk() has been created exactly for
this use case: manually check if a page is 'almost' empty when the ECC
engine fails to correct errors.
Okay. I think I understand now :). I thought nand_check_erased_ecc_chunk
had to be used for pages that were reported as erased, but it is used
for pages that aren't detected as erased because of some bitflips, and
we just make sure if it is erased or not.
Should the pseudo code look something like this?
/* ecc->read_page */
int qcom_nand_read_page(...)
{
/* read the page */
...
erased = empty_page_fixup();
/* we make sure above that the entire page is 0xffs or not */
return parse_read_errors(host, erased);
}
int parse_read_errors(host, erased)
{
for each chunk {
if (!erased) {
if (uncorrectable errors) {
ret = nand_check_erased_ecc_chunk();
if (ret < 0) {
/* not an erased page, report */
stats.failed++
} else {
/* 'almost' empty page which
HW couldn't detect as erased */
stats.corrected += ret;
}
} else {
stats.corrected += stat;
}
}
}
}
Yep, exactly.
Thanks. I'll fix this and test it out.
If you want to check that your "bitflips in erased pages" handling
is correct, you can try this tool to artificially flip some bits
[1].
# flash_erase /dev/mtdX Y 1
# nandflipbits /dev/mtdX 2@<Y>:3@<Y+1>
Thanks for the suggestion. I was thinking of running mtd_torturetest
on an eraseblock for a few days and hope for a few bitflips. This
seems much easier :)
Archit
Best Regards,
Boris
[1]http://lists.infradead.org/pipermail/linux-mtd/2014-November/056634.html
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
--
To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html