Re: [PATCH] mtd: spinand: Add support for GigaDevice GD5F1GQ4UC

On Wed, 23 Jan 2019 12:37:57 +0100
Stefan Roese <sr@xxxxxxx> wrote:

> On 23.01.19 12:25, Boris Brezillon wrote:
> > On Wed, 23 Jan 2019 11:04:36 +0100
> > Stefan Roese <sr@xxxxxxx> wrote:
> >   
> >> On 23.01.19 10:35, Boris Brezillon wrote:  
> >>> On Wed, 23 Jan 2019 10:06:59 +0100
> >>> Stefan Roese <sr@xxxxxxx> wrote:
> >>>      
> >>>> On 23.01.19 09:55, Boris Brezillon wrote:  
> >>>>> On Wed, 23 Jan 2019 09:23:47 +0100
> >>>>> Stefan Roese <sr@xxxxxxx> wrote:
> >>>>>         
> >>>>>>> This one doesn't, incremental mode (-i) should.  
> >>>>>>
> >>>>>> Here you go:
> >>>>>>
> >>>>>> # ./nandbiterrs /dev/mtd5 -k -i
> >>>>>> incremental biterrors test
> >>>>>> Failed to recover 1 bitflips
> >>>>>> Read error after 0 bit errors per page
> >>>>>>
> >>>>>> I'm still unsure how this helps here.  
> >>>>>
> >>>>> It helps, it tells us the ECC doesn't work properly (fails to recover
> >>>>> one bitflip), or maybe it's the raw accessors that don't work.
> >>>>>         
> >>>>>> Is there anything else I should test?  
> >>>>>
> >>>>> Add traces to the get_ecc_status() func and print the status value.  
> >>>>
> >>>> # ./nandbiterrs /dev/mtd5 -k -i
> >>>> [   22.098436] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >>>> [   22.117184] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >>>>
> >>>> <snip many identical lines>
> >>>>
> >>>> [   23.085412] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >>>> incremental biterrors test
> >>>> [   23.102973] gd5f1gq4u_ecc_get_status (124): status=0x20 status2=0x00
> >>>> Failed to recover 1 bitflips  
> >>>
> >>> Hm, looks like the ECC reports an error as soon as you start writing
> >>> to the NAND. Maybe we have a problem in the write path...
> >>>      
> >>>> Read error after 0 bit errors per page
> >>>>
> >>>> Strange, this does not seem to match what the datasheet tells us. Any
> >>>> further ideas what I should test?  
> >>>
> >>> Erase a block (save data before if you need to), write random data with
> >>> the ECC enabled and dump it back (once in raw mode, once with ECC
> >>> enabled):
> >>>
> >>> # flash_erase /dev/mtdX 0 1
> >>> # nandwrite --input-size=<pagesize> /dev/mtdX /dev/urandom
> >>> # nanddump -f /tmp/dump-ecc -l <pagesize> -o /dev/mtdX
> >>> # nanddump -f /tmp/dump-raw -l <pagesize> -o -n /dev/mtdX
> >>>
> >>> Send me both dumps (plus the console output), and we'll see how it
> >>> looks.  
> >>
> >> Here you go:
> >>
> >> root@mt7688:~# flash_erase /dev/mtd5 0 1
> >> Erasing 128 Kibyte @ 0 -- 100 % complete
> >> root@mt7688:~# nandwrite --input-size=2048 /dev/mtd5 /dev/urandom
> >> Writing data to block 0 at offset 0x0
> >> root@mt7688:~# nanddump -f /tmp/dump-ecc -l 2048 -o /dev/mtd5
> >> ECC failed: 0
> >> ECC corrected:[  100.171120] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >>    0
> >> Number of ba[  100.178436] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >> d blocks: 2
> >> Number of bbt blocks: 0
> >> Block size 131072, page size 2048, OOB size 128
> >> Dumping data starting at 0x00000000 and ending at 0x00000800...
> >> root@mt7688:~# dmesg -c
> >> [  100.171120] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >> [  100.178436] gd5f1gq4u_ecc_get_status (124): status=0x00 status2=0x00
> >> root@mt7688:~# nanddump -f /tmp/dump-raw -l 2048 -o -n /dev/mtd5
> >> Block size 131072, page size 2048, OOB size 128
> >> Dumping data starting at 0x00000000 and ending at 0x00000800...
> >> root@mt7688:~# dmesg -c
> >> root@mt7688:~#
> >>
> >> The attached files are identical. Thanks for looking into this.  
> > 
> > First weird thing, the first portion of OOB (bytes 0x800 to 0x83F) are
> > set to 0x0, and I'd expect to have 0xff in there. BTW, can you try
> > nandbiterrs again without the '-k'?  
> 
> Same result:
> 
> root@mt7688:~# ./nandbiterrs /dev/mtd5 -i
> incremental biterrors test
> [ 5748.988596] gd5f1gq4u_ecc_get_status (124): status=0x20 status2=0x00
> Failed to recover 1 bitflips
> Read error after 0 bit errors per page

Okay. There's something interesting in section "10.1 Page Program" of
the datasheet:

"
Note:
1. The contents of Cache Register doesn’t reset when Program Load (02h)
command, Program Random Load (84h) command and RESET (FFh) command.
2. When Program Execute (10h) command was issued just after Program Load
(02h) command, SPI-NAND controller outputs 0xFF data to the NAND for the
address that data was not loaded by Program Load (02h) command.
3. When Program Execute (10h) command was issued just after Program Load
Random Data (84h) command, SPI-NAND controller outputs contents of Cache
Register to the NAND.
"

Until now, I assumed that a "Program Load" would reset the page cache
content to 0xff (as is done on the NANDs I had tested on), but it seems
some vendors decided to implement it differently (keep the cache in its
previous state and send 0xff at execute time if the previous command
was a Program Load and some bytes were left uninitialized in the cache).

This forces us to fill the whole cache if we want the logic to work on
all NANDs, otherwise we might corrupt things in the OOB area. It might
also explain why nandbiterrs does not work properly. Can you try to
apply the following diff and run nandbiterrs -i again?

> 
> And from your other mail:
> 
> > BTW, which version of the mtd-utils are you using?  
> 
> I'm currently using the one provided with my Yocto build:
> 
> root@mt7688:~# mtdinfo --version
> mtdinfo (mtd-utils) 2.0.1
> 
> I hope that is recent enough.

Should be good.

--->8---
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 479c2f2cf17f..10c92cc48428 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -313,15 +313,9 @@ static int spinand_write_to_cache_op(struct spinand_device *spinand,
               nanddev_page_size(nand) +
               nanddev_per_page_oobsize(nand));
 
-       if (req->datalen) {
+       if (req->datalen)
                memcpy(spinand->databuf + req->dataoffs, req->databuf.out,
                       req->datalen);
-               adjreq.dataoffs = 0;
-               adjreq.datalen = nanddev_page_size(nand);
-               adjreq.databuf.out = spinand->databuf;
-               nbytes = adjreq.datalen;
-               buf = spinand->databuf;
-       }
 
        if (req->ooblen) {
                if (req->mode == MTD_OPS_AUTO_OOB)
@@ -332,16 +326,23 @@ static int spinand_write_to_cache_op(struct spinand_device *spinand,
                else
                        memcpy(spinand->oobbuf + req->ooboffs, req->oobbuf.out,
                               req->ooblen);
-
-               adjreq.ooblen = nanddev_per_page_oobsize(nand);
-               adjreq.ooboffs = 0;
-               nbytes += nanddev_per_page_oobsize(nand);
-               if (!buf) {
-                       buf = spinand->oobbuf;
-                       column = nanddev_page_size(nand);
-               }
        }
 
+       /*
+        * Looks like PROGRAM LOAD (AKA write cache) does not necessarily reset
+        * the cache content to 0xFF (depends on vendor implementation), so we
+        * must fill the page cache entirely even if we only want to program
+        * the data portion of the page, otherwise we might corrupt the BBM or
+        * user data previously programmed in OOB area.
+        */
+       adjreq.dataoffs = 0;
+       adjreq.datalen = nanddev_page_size(nand);
+       adjreq.databuf.out = spinand->databuf;
+       adjreq.ooblen = nanddev_per_page_oobsize(nand);
+       adjreq.ooboffs = 0;
+       nbytes = nanddev_page_size(nand) + nanddev_per_page_oobsize(nand);
+       buf = spinand->databuf;
+
        spinand_cache_op_adjust_colum(spinand, &adjreq, &column);
 
        op = *spinand->op_templates.write_cache;

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/



