Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change do_write_oneword() to use chip_good()

Joakim Tjernlund <Joakim.Tjernlund@xxxxxxxxxxxx> · Wed, 23 Jan 2019 12:13:06 +0000

On Wed, 2019-01-23 at 20:56 +0900, Tokunori Ikegami wrote:
> 
> 
> Hi Jocke-san,
> 
> Thanks for your advice.
> 
> To make sure let me confirm below.
> 
> The OpenWrt code includes your patch below.
> 
>   f69cd2d30a80 2018-05-01 12:58:18 -0700 mtd: cfi: cmdset_0002: Do not allow read/write to suspend erase block.  [Joakim Tjernlund]
> 
> Do you mean that it is possible to be needed an additional change more based on this?

That patch resolves another problem. I have not sent a patch for problem I mentioned in this mail. 

> Or is it not related to the patch fixed by you?
>   Note: Sorry now I am not able to check the patches to try sent by you before.

NP
   Jocke

> 
> > I have lost track of all the details regarding this issue. I just want to
> > add:
> > 
> > There is a max number of suspend/resume cycles one can do during an
> > erase(possibly also for write)
> > and once that number is hit you get an error. One way to avoid this could
> > be to
> > wait after each resume until the erase has started again(look in status
> > register)
> > before continuing.
> > 
> > If this has anything to do with this problem, I do not know.
> > 
> >  Jocke
> 
> Regards,
> Ikegami
> 
> > -----Original Message-----
> > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@xxxxxxxxxxxx]
> > Sent: Wednesday, January 23, 2019 1:58 AM
> > To: boris.brezillon@xxxxxxxxxxx; ikegami_to@xxxxxxxxxxx
> > Cc: linux-mtd@xxxxxxxxxxxxxxxxxxx; chris.packham@xxxxxxxxxxxxxxxxxxx;
> > fbettoni@xxxxxxxxx; nbd@xxxxxxxx; stable@xxxxxxxxxxxxxxx;
> > hauke@xxxxxxxxxx; koen.vandeputte@xxxxxxxxxxxx;
> > boris.brezillon@xxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > do_write_oneword() to use chip_good()
> > 
> > On Wed, 2019-01-23 at 00:49 +0900, Tokunori Ikegami wrote:
> > > 
> > > Hi Boris-san,
> > > 
> > > Very sorry for too late to update about this.
> > > But could you please let me consult below about this patch?
> > > 
> > > I have tried to investigate the issue root cause and confirmed below but
> > > still the root cause seems not clear.
> > > 
> > >   1. Without the change the write oneword is retried and recovered by
> > the
> > > current existing chip_good() checking.
> > >      But after the 1,001 times recovery it was continued to fail recovery
> > > with the 3 times retry.
> > 
> > I have lost track of all the details regarding this issue. I just want to
> > add:
> > 
> > There is a max number of suspend/resume cycles one can do during an
> > erase(possibly also for write)
> > and once that number is hit you get an error. One way to avoid this could
> > be to
> > wait after each resume until the erase has started again(look in status
> > register)
> > before continuing.
> > 
> > If this has anything to do with this problem, I do not know.
> > 
> >  Jocke
> > 
> > >   2. By the patch change the recovery failure can be avoided and the write
> > > oneword works correctly without any failure.
> > >      There are different from the original chip_good() checking as the
> > > original code resets the chip before the retry.
> > >      The patch change wait the chip_good() status until the timeout expiry
> > > without the chip reset.
> > >        Note: There is a different from the original OpenWrt patch and
> > needed
> > > to be changed as same and it will be done by the next v4 patch.
> > > 
> > >   3. To narrow down the cause I have added some delays into the original
> > > code to check the chip ready and good status.
> > >      But the failure behavior was not changed so it seems that the issue
> > is
> > > not depended to the timing. (But not sure)
> > > 
> > >   4. On the OpenWrt the write buffer is disabled but to narrow down the
> > > issue I have changed to enable the write buffer.
> > >      Then the flash error was not happened by the write buffer operation
> > so
> > > it seems that the flash driver works correctly.
> > >      But another issue was caused and it is similar issue with the original
> > > OpenWrt behavior with the patch change.
> > >        Note: On the original OpenWrt needs to wait the file system
> > > completion to build but it was not finished with the write buffer. (But
> > not
> > > sure about this behavior)
> > > 
> > > Do you have any comment about this result?
> > > 
> > > If you can agree about the patch change basically with the current
> > situation
> > > I will do send the v4 patch set later with fix for the comments.
> > > 
> > > But it seems that it is difficult to investigate the root cause more at
> > this
> > > moment to me.
> > > Since but the behavior looks depended on the flash chip hardware behavior
> > > and I cannot debug the hardware behavior more I think.
> > >   Note: Now I can reproduce the flash error issue behavior on the OpenWrt
> > > unit.
> > > 
> > > > > >     It is depended on the actual flash chip behavior so the root
> > cause
> > > > > is
> > > > > >     unknown.
> > > > > 
> > > > > Yes, and that's what I'd like you to figure out, or at least have
> > a
> > > > > good idea why this doesn't work on some chips but works on others.
> > > > 
> > > > I see.
> > > > But it is a little bit difficult situation since I do not have the failure
> > > > environment locally at this moment.
> > > > But if needed I may ask to get the help for this to Fabio-san.
> > > 
> > > Regards,
> > > Ikegami
> > > 
> > > > -----Original Message-----
> > > > From: IKEGAMI Tokunori [mailto:ikegami@xxxxxxxxxxxxxxxxxxxx]
> > > > Sent: Tuesday, November 6, 2018 6:47 PM
> > > > To: Boris Brezillon; ikegami_to@xxxxxxxxxxx
> > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke Mehrtens;
> > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris;
> > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni
> > > > Subject: RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > > do_write_oneword() to use chip_good()
> > > > 
> > > > Sorry let me resend the mail below by changing the email address of
> > > > Felix-san.
> > > > 
> > > > -----Original Message-----
> > > > From: IKEGAMI Tokunori
> > > > Sent: Tuesday, November 06, 2018 6:37 PM
> > > > To: 'Boris Brezillon'; 'ikegami_to@xxxxxxxxxxx'
> > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke Mehrtens;
> > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris;
> > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni
> > > > Subject: RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > > do_write_oneword() to use chip_good()
> > > > 
> > > > Hi Boris-san,
> > > > 
> > > > > -----Original Message-----
> > > > > From: stable-owner@xxxxxxxxxxxxxxx
> > > > > [mailto:stable-owner@xxxxxxxxxxxxxxx] On Behalf Of Boris Brezillon
> > > > > Sent: Tuesday, November 06, 2018 5:34 PM
> > > > > To: IKEGAMI Tokunori
> > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke
> > Mehrtens;
> > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris;
> > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni
> > > > > Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > > > do_write_oneword() to use chip_good()
> > > > > 
> > > > > Hi IKEGAMI,
> > > > > 
> > > > > On Tue, 6 Nov 2018 00:25:43 +0000
> > > > > IKEGAMI Tokunori <ikegami@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > > > > Also the issue can be fixed by using chip_good() instead of
> > > > > chip_ready().
> > > > > > > > The chip_ready() just checks the value from flash memory twice.
> > > > > > > > And the chip_good() checks the value with the expected value.
> > > > > > > > Probably the issue can be fixed as checked correctly by the
> > > > chip_good().
> > > > > > > > So change to use chip_good() instead of chip_ready().
> > > > > > > 
> > > > > > > Well, that's not really explaining why you think chip_good() should
> > > > > be
> > > > > > > used instead of chip_ready(). So I went on and looked at the
> > > > > > > chip_good(), chip_ready() and do_write_oneword() implementation,
> > and
> > > > > > > also looked at users of do_write_oneword(). It seems this function
> > > > is
> > > > > > > used to write data to the flash, and apparently the "one bit should
> > > > > > > toggle to reflect a busy state" does not apply when writing things
> > > > to
> > > > > > > the memory array (it's probably working for other CFI commands,
> > but
> > > > > I
> > > > > > > guess it takes more time to actually change the level of a NOR
> > cell,
> > > > > > > hence the result of 2 identical reads does not mean that the write
> > > > is
> > > > > > > done).
> > > > > > > 
> > > > > > > Also, it seems that cmdset_0001 is not implementing chip_ready()
> > the
> > > > > > > same way, and I wonder if cmdset_0002 implementation is correct
> > to
> > > > > > > start with. Or maybe I don't get what chip_ready() is for.
> > > > > > > 
> > > > > > > Anyway, this is the sort of clarification I'd like to have.
> > > > > > 
> > > > > > I am thinking to update the commit message as below.
> > > > > > 
> > > > > >     mtd: cfi_cmdset_0002: Use chip_good() to retry in
> > > > do_write_oneword()
> > > > > >     As reported by the OpenWRT team, write requests sometimes fail
> > on
> > > > > some
> > > > > >     platforms.
> > > > > >     Currently to check the state chip_ready() is used correctly
> > as
> > > > > described by
> > > > > >     the flash memory S29GL256P11TFI01 datasheet.
> > > > > 
> > > > > I had a look at the S29GL256P datasheet here [1], and if I'm correct,
> > > > > it's using cmdset 0001.
> > > > 
> > > > No actually the cmdset 0002 is used on the flash chip.
> > > > The manufacturer ID xx01h and Device ID 2201h are used to decide.
> > > > 
> > > > There is information from Fobis-san below also about this.
> > > > 
> > > > On forum thread musashino posted picture of flash chip:
> > > > 
> > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fforum
> > .openwrt.org%2Ft%2Fimpossible-to-install-update-any-packages-&amp;data
> > =02%7C01%7Cjoakim.tjernlund%40infinera.com%7C916af968b27a402da8cf08d68
> > 0812fce%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C1%7C63683768968012655
> > 7&amp;sdata=NNGSYgq1VTuofPPMMlyKIm9W1DJHQFw0s94Ernq5cts%3D&amp;reserve
> > d=0
> > > > on-wzr-hp-g300nh-18-06-1
> > > > 
> > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.cy
> > press.com%2Fpart%2Fs29gl256p11tfi010&amp;data=02%7C01%7Cjoakim.tjernlu
> > nd%40infinera.com%7C916af968b27a402da8cf08d680812fce%7C285643de5f5b4b0
> > 3a1530ae2dc8aaf77%7C1%7C1%7C636837689680126557&amp;sdata=Twk1VUEESz14U
> > pdJjU4ohuhiQ5jN1uHLh0cAhlAznW0%3D&amp;reserved=0
> > > > [    0.862264] physmap platform flash device: 02000000 at 1e000000
> > > > [    0.868331] physmap-flash: Found 1 x16 devices at 0x0 in 16-bit
> > > > bank. Manufacturer ID 0x000001 Chip ID 0x002201
> > > > [    0.878493] Amd/Fujitsu Extended Query Table at 0x0040
> > > > [    0.883668]   Amd/Fujitsu Extended Query version 1.3.
> > > > [    0.888768] number of CFI chips: 1
> > > > [    0.894557] Searching for RedBoot partition table in physmap-flash
> > > > at offset 0x1fc0000
> > > > [    0.918009] Searching for RedBoot partition table in physmap-flash
> > > > at offset 0x1fe0000
> > > > [    0.941464] No RedBoot partition table detected in physmap-flash
> > > > [    0.947926] Creating 5 MTD partitions on "physmap-flash":
> > > > [    0.953384] 0x000000000000-0x000000040000 : "u-boot"
> > > > [    0.960853] 0x000000040000-0x000000060000 : "u-boot-env"
> > > > [    0.968803] 0x000000060000-0x000001fc0000 : "firmware"
> > > > [    0.981859] 2 uimage-fw partitions found on MTD device firmware
> > > > [    0.987900] 0x000000060000-0x0000001b5706 : "kernel"
> > > > [    0.994916] 0x0000001b5706-0x000001fc0000 : "rootfs"
> > > > [    1.001986] mtd: device 4 (rootfs) set to be root filesystem
> > > > [    1.007789] 1 squashfs-split partitions found on MTD device rootfs
> > > > [    1.014014] 0x0000003c0000-0x000001fc0000 : "rootfs_data"
> > > > [    1.022093] 0x000001fc0000-0x000001fe0000 : "user_property"
> > > > [    1.030283] 0x000001fe0000-0x000002000000 : "art"
> > > > 
> > > > Maybe you could post links to forum thread, and data sheet.
> > > > 
> > > > > >     Also chip_good() is used to check if the write is succeeded
> > and
> > > > it
> > > > > was
> > > > > >     implemented by the commit fb4a90bfcd6d8 ("[MTD] CFI-0002 -
> > Improve
> > > > > error
> > > > > >     checking").
> > > > > >     But actually the write failure is caused on some platforms and
> > > also
> > > > > it can
> > > > > >     be fixed by using chip_good() to check the state and retry
> > > instead.
> > > > > Do you know on which NOR chips this happens? Do you have access to
> > the
> > > > > datasheet?
> > > > 
> > > > But it looks SST49LF008A [3] from the changes below but I am not sure
> > at
> > > > this moment and probably it should be confirmed to the authr Eric W.
> > > > Biedermann <ebiederman@xxxxxxxx> to make sure.
> > > > 
> > > > +#define SST49LF008A            0x005a
> > > > 
> > > >  static int cfi_amdstd_read (struct mtd_info *, loff_t, size_t, size_t
> > *,
> > > > u_char *);
> > > >  static int cfi_amdstd_write_words(struct mtd_info *, loff_t, size_t,
> > > > size_t *, const u_char *);
> > > > @@ -191,6 +192,7 @@ static struct cfi_fixup cfi_fixup_table[] = {
> > > >  };
> > > >  static struct cfi_fixup jedec_fixup_table[] = {
> > > >         { MANUFACTURER_SST, SST49LF004B, fixup_use_fwh_lock, NULL, },
> > > > +       { MANUFACTURER_SST, SST49LF008A, fixup_use_fwh_lock, NULL, },
> > > > 
> > > > > >     It is depended on the actual flash chip behavior so the root
> > cause
> > > > > is
> > > > > >     unknown.
> > > > > 
> > > > > Yes, and that's what I'd like you to figure out, or at least have
> > a
> > > > > good idea why this doesn't work on some chips but works on others.
> > > > 
> > > > I see.
> > > > But it is a little bit difficult situation since I do not have the failure
> > > > environment locally at this moment.
> > > > But if needed I may ask to get the help for this to Fabio-san.
> > > > 
> > > > > > If any comment please let me know.
> > > > > > 
> > > > > > > > Signed-off-by: Tokunori Ikegami <ikegami@xxxxxxxxxxxxxxxxxxxx>
> > > > > > > > Signed-off-by: Hauke Mehrtens <hauke@xxxxxxxxxx>
> > > > > > > > Signed-off-by: Koen Vandeputte <koen.vandeputte@xxxxxxxxxxxx>
> > > > > > > > Signed-off-by: Fabio Bettoni <fbettoni@xxxxxxxxx>
> > > > > > > 
> > > > > > > Has the patch really gone through all those people? SoB is used
> > when
> > > > > you
> > > > > > > apply a patch in your tree or when you're the original author.
> > > > > > 
> > > > > > I have just checked the OpenWRT git log again and it looks that
> > it was
> > > > > originally
> > > > > > implemented by Felix Fietkau <nbd@xxxxxxxxxxx> by the patch below
> > so
> > > > I
> > > > > will update the Signed-off-by tag as so.
> > <https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.
> > openwrt.org%2F%3Fp%3Dopenwrt%2Fopenwrt.git%3Ba%3Dcommitdiff%3Bh%3D2530
> > 640&amp;data=02%7C01%7Cjoakim.tjernlund%40infinera.com%7C916af968b27a4
> > 02da8cf08d680812fce%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C1%7C63683
> > 7689680126557&amp;sdata=w13ZTKwD1NiUQzxQfUou92KVDlW80qGUiZVIcjU%2BGPA%
> > 3D&amp;reserved=0
> > > > > f07cd2b3b14fe9ec03fa63a586452cc5f>
> > > > > > > > Co-Developed-by: Hauke Mehrtens <hauke@xxxxxxxxxx>
> > > > > > > > Co-Developed-by: Koen Vandeputte <koen.vandeputte@xxxxxxxxxxxx>
> > > > > > > > Co-Developed-by: Fabio Bettoni <fbettoni@xxxxxxxxx>
> > > > > > > 
> > > > > > > Not sure we want to add new undocumented tags, but you can mention
> > > > > > > that all those people helped you find/debug the issue. They can
> > also
> > > > > > > add their Reviewed-by/Tested-by if they like.
> > > > > 
> > > > > My bad, I just noticed these are valid flags [2], so you can keep
> > them,
> > > > > and according to the doc, you should also keep the SoB.
> > > > 
> > > > I see.
> > > > Yes I had also checked it.
> > > > 
> > > > By the way in near future my company email address will be not able
> > to
> > > use.
> > > > So I will change the mail address to my personal email address [4] after
> > > > that or before.
> > > > 
> > > > Regards,
> > > > Ikegami
> > > > 
> > > > > Regards,
> > > > > 
> > > > > Boris
> > > > > 
> > > > > 
>