RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change do_write_oneword() to use chip_good()

"Tokunori Ikegami" <ikegami_to@xxxxxxxxxxx> · Wed, 23 Jan 2019 21:34:02 +0900

Thanks for your quick response.
Noted it so I will try to confirm your advice.

Regards,
Ikegami

> -----Original Message-----
> From: Joakim Tjernlund [mailto:Joakim.Tjernlund@xxxxxxxxxxxx]
> Sent: Wednesday, January 23, 2019 9:13 PM
> To: boris.brezillon@xxxxxxxxxxx; ikegami_to@xxxxxxxxxxx
> Cc: linux-mtd@xxxxxxxxxxxxxxxxxxx; chris.packham@xxxxxxxxxxxxxxxxxxx;
> fbettoni@xxxxxxxxx; nbd@xxxxxxxx; stable@xxxxxxxxxxxxxxx;
> hauke@xxxxxxxxxx; koen.vandeputte@xxxxxxxxxxxx;
> boris.brezillon@xxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> do_write_oneword() to use chip_good()
> 
> On Wed, 2019-01-23 at 20:56 +0900, Tokunori Ikegami wrote:
> >
> >
> > Hi Jocke-san,
> >
> > Thanks for your advice.
> >
> > To make sure let me confirm below.
> >
> > The OpenWrt code includes your patch below.
> >
> >   f69cd2d30a80 2018-05-01 12:58:18 -0700 mtd: cfi: cmdset_0002: Do not
> allow read/write to suspend erase block.  [Joakim Tjernlund]
> >
> > Do you mean that it is possible to be needed an additional change more
> based on this?
> 
> That patch resolves another problem. I have not sent a patch for problem
> I mentioned in this mail.
> 
> > Or is it not related to the patch fixed by you?
> >   Note: Sorry now I am not able to check the patches to try sent by you
> before.
> 
> NP
>    Jocke
> 
> >
> > > I have lost track of all the details regarding this issue. I just want
> to
> > > add:
> > >
> > > There is a max number of suspend/resume cycles one can do during an
> > > erase(possibly also for write)
> > > and once that number is hit you get an error. One way to avoid this
> could
> > > be to
> > > wait after each resume until the erase has started again(look in status
> > > register)
> > > before continuing.
> > >
> > > If this has anything to do with this problem, I do not know.
> > >
> > >  Jocke
> >
> > Regards,
> > Ikegami
> >
> > > -----Original Message-----
> > > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@xxxxxxxxxxxx]
> > > Sent: Wednesday, January 23, 2019 1:58 AM
> > > To: boris.brezillon@xxxxxxxxxxx; ikegami_to@xxxxxxxxxxx
> > > Cc: linux-mtd@xxxxxxxxxxxxxxxxxxx;
> chris.packham@xxxxxxxxxxxxxxxxxxx;
> > > fbettoni@xxxxxxxxx; nbd@xxxxxxxx; stable@xxxxxxxxxxxxxxx;
> > > hauke@xxxxxxxxxx; koen.vandeputte@xxxxxxxxxxxx;
> > > boris.brezillon@xxxxxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > do_write_oneword() to use chip_good()
> > >
> > > On Wed, 2019-01-23 at 00:49 +0900, Tokunori Ikegami wrote:
> > > >
> > > > Hi Boris-san,
> > > >
> > > > Very sorry for too late to update about this.
> > > > But could you please let me consult below about this patch?
> > > >
> > > > I have tried to investigate the issue root cause and confirmed below
> but
> > > > still the root cause seems not clear.
> > > >
> > > >   1. Without the change the write oneword is retried and recovered
> by
> > > the
> > > > current existing chip_good() checking.
> > > >      But after the 1,001 times recovery it was continued to fail
> recovery
> > > > with the 3 times retry.
> > >
> > > I have lost track of all the details regarding this issue. I just want
> to
> > > add:
> > >
> > > There is a max number of suspend/resume cycles one can do during an
> > > erase(possibly also for write)
> > > and once that number is hit you get an error. One way to avoid this
> could
> > > be to
> > > wait after each resume until the erase has started again(look in status
> > > register)
> > > before continuing.
> > >
> > > If this has anything to do with this problem, I do not know.
> > >
> > >  Jocke
> > >
> > > >   2. By the patch change the recovery failure can be avoided and the
> write
> > > > oneword works correctly without any failure.
> > > >      There are different from the original chip_good() checking as
> the
> > > > original code resets the chip before the retry.
> > > >      The patch change wait the chip_good() status until the timeout
> expiry
> > > > without the chip reset.
> > > >        Note: There is a different from the original OpenWrt patch
> and
> > > needed
> > > > to be changed as same and it will be done by the next v4 patch.
> > > >
> > > >   3. To narrow down the cause I have added some delays into the original
> > > > code to check the chip ready and good status.
> > > >      But the failure behavior was not changed so it seems that the
> issue
> > > is
> > > > not depended to the timing. (But not sure)
> > > >
> > > >   4. On the OpenWrt the write buffer is disabled but to narrow down
> the
> > > > issue I have changed to enable the write buffer.
> > > >      Then the flash error was not happened by the write buffer
> operation
> > > so
> > > > it seems that the flash driver works correctly.
> > > >      But another issue was caused and it is similar issue with the
> original
> > > > OpenWrt behavior with the patch change.
> > > >        Note: On the original OpenWrt needs to wait the file system
> > > > completion to build but it was not finished with the write buffer.
> (But
> > > not
> > > > sure about this behavior)
> > > >
> > > > Do you have any comment about this result?
> > > >
> > > > If you can agree about the patch change basically with the current
> > > situation
> > > > I will do send the v4 patch set later with fix for the comments.
> > > >
> > > > But it seems that it is difficult to investigate the root cause more
> at
> > > this
> > > > moment to me.
> > > > Since but the behavior looks depended on the flash chip hardware
> behavior
> > > > and I cannot debug the hardware behavior more I think.
> > > >   Note: Now I can reproduce the flash error issue behavior on the
> OpenWrt
> > > > unit.
> > > >
> > > > > > >     It is depended on the actual flash chip behavior so the
> root
> > > cause
> > > > > > is
> > > > > > >     unknown.
> > > > > >
> > > > > > Yes, and that's what I'd like you to figure out, or at least have
> > > a
> > > > > > good idea why this doesn't work on some chips but works on others.
> > > > >
> > > > > I see.
> > > > > But it is a little bit difficult situation since I do not have the
> failure
> > > > > environment locally at this moment.
> > > > > But if needed I may ask to get the help for this to Fabio-san.
> > > >
> > > > Regards,
> > > > Ikegami
> > > >
> > > > > -----Original Message-----
> > > > > From: IKEGAMI Tokunori [mailto:ikegami@xxxxxxxxxxxxxxxxxxxx]
> > > > > Sent: Tuesday, November 6, 2018 6:47 PM
> > > > > To: Boris Brezillon; ikegami_to@xxxxxxxxxxx
> > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke
> Mehrtens;
> > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris;
> > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni
> > > > > Subject: RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > > > do_write_oneword() to use chip_good()
> > > > >
> > > > > Sorry let me resend the mail below by changing the email address
> of
> > > > > Felix-san.
> > > > >
> > > > > -----Original Message-----
> > > > > From: IKEGAMI Tokunori
> > > > > Sent: Tuesday, November 06, 2018 6:37 PM
> > > > > To: 'Boris Brezillon'; 'ikegami_to@xxxxxxxxxxx'
> > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke
> Mehrtens;
> > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris;
> > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni
> > > > > Subject: RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > > > do_write_oneword() to use chip_good()
> > > > >
> > > > > Hi Boris-san,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: stable-owner@xxxxxxxxxxxxxxx
> > > > > > [mailto:stable-owner@xxxxxxxxxxxxxxx] On Behalf Of Boris Brezillon
> > > > > > Sent: Tuesday, November 06, 2018 5:34 PM
> > > > > > To: IKEGAMI Tokunori
> > > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke
> > > Mehrtens;
> > > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris;
> > > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni
> > > > > > Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change
> > > > > > do_write_oneword() to use chip_good()
> > > > > >
> > > > > > Hi IKEGAMI,
> > > > > >
> > > > > > On Tue, 6 Nov 2018 00:25:43 +0000
> > > > > > IKEGAMI Tokunori <ikegami@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > > > Also the issue can be fixed by using chip_good() instead
> of
> > > > > > chip_ready().
> > > > > > > > > The chip_ready() just checks the value from flash memory
> twice.
> > > > > > > > > And the chip_good() checks the value with the expected value.
> > > > > > > > > Probably the issue can be fixed as checked correctly by
> the
> > > > > chip_good().
> > > > > > > > > So change to use chip_good() instead of chip_ready().
> > > > > > > >
> > > > > > > > Well, that's not really explaining why you think chip_good()
> should
> > > > > > be
> > > > > > > > used instead of chip_ready(). So I went on and looked at the
> > > > > > > > chip_good(), chip_ready() and do_write_oneword()
> implementation,
> > > and
> > > > > > > > also looked at users of do_write_oneword(). It seems this
> function
> > > > > is
> > > > > > > > used to write data to the flash, and apparently the "one bit
> should
> > > > > > > > toggle to reflect a busy state" does not apply when writing
> things
> > > > > to
> > > > > > > > the memory array (it's probably working for other CFI commands,
> > > but
> > > > > > I
> > > > > > > > guess it takes more time to actually change the level of a
> NOR
> > > cell,
> > > > > > > > hence the result of 2 identical reads does not mean that the
> write
> > > > > is
> > > > > > > > done).
> > > > > > > >
> > > > > > > > Also, it seems that cmdset_0001 is not implementing chip_ready()
> > > the
> > > > > > > > same way, and I wonder if cmdset_0002 implementation is correct
> > > to
> > > > > > > > start with. Or maybe I don't get what chip_ready() is for.
> > > > > > > >
> > > > > > > > Anyway, this is the sort of clarification I'd like to have.
> > > > > > >
> > > > > > > I am thinking to update the commit message as below.
> > > > > > >
> > > > > > >     mtd: cfi_cmdset_0002: Use chip_good() to retry in
> > > > > do_write_oneword()
> > > > > > >     As reported by the OpenWRT team, write requests sometimes
> fail
> > > on
> > > > > > some
> > > > > > >     platforms.
> > > > > > >     Currently to check the state chip_ready() is used correctly
> > > as
> > > > > > described by
> > > > > > >     the flash memory S29GL256P11TFI01 datasheet.
> > > > > >
> > > > > > I had a look at the S29GL256P datasheet here [1], and if I'm correct,
> > > > > > it's using cmdset 0001.
> > > > >
> > > > > No actually the cmdset 0002 is used on the flash chip.
> > > > > The manufacturer ID xx01h and Device ID 2201h are used to decide.
> > > > >
> > > > > There is information from Fobis-san below also about this.
> > > > >
> > > > > On forum thread musashino posted picture of flash chip:
> > > > >
> > >
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fforum
> >
> > .openwrt.org%2Ft%2Fimpossible-to-install-update-any-packages-&amp;da
> ta
> > >
> =02%7C01%7Cjoakim.tjernlund%40infinera.com%7C916af968b27a402da8cf08d68
> > >
> 0812fce%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C1%7C63683768968012655
> > >
> 7&amp;sdata=NNGSYgq1VTuofPPMMlyKIm9W1DJHQFw0s94Ernq5cts%3D&amp;reserve
> > > d=0
> > > > > on-wzr-hp-g300nh-18-06-1
> > > > >
> > >
> https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.cy
> > >
> press.com%2Fpart%2Fs29gl256p11tfi010&amp;data=02%7C01%7Cjoakim.tjernlu
> > >
> nd%40infinera.com%7C916af968b27a402da8cf08d680812fce%7C285643de5f5b4b0
> > >
> 3a1530ae2dc8aaf77%7C1%7C1%7C636837689680126557&amp;sdata=Twk1VUEESz14U
> > > pdJjU4ohuhiQ5jN1uHLh0cAhlAznW0%3D&amp;reserved=0
> > > > > [    0.862264] physmap platform flash device: 02000000 at 1e000000
> > > > > [    0.868331] physmap-flash: Found 1 x16 devices at 0x0 in 16-bit
> > > > > bank. Manufacturer ID 0x000001 Chip ID 0x002201
> > > > > [    0.878493] Amd/Fujitsu Extended Query Table at 0x0040
> > > > > [    0.883668]   Amd/Fujitsu Extended Query version 1.3.
> > > > > [    0.888768] number of CFI chips: 1
> > > > > [    0.894557] Searching for RedBoot partition table in
> physmap-flash
> > > > > at offset 0x1fc0000
> > > > > [    0.918009] Searching for RedBoot partition table in
> physmap-flash
> > > > > at offset 0x1fe0000
> > > > > [    0.941464] No RedBoot partition table detected in physmap-flash
> > > > > [    0.947926] Creating 5 MTD partitions on "physmap-flash":
> > > > > [    0.953384] 0x000000000000-0x000000040000 : "u-boot"
> > > > > [    0.960853] 0x000000040000-0x000000060000 : "u-boot-env"
> > > > > [    0.968803] 0x000000060000-0x000001fc0000 : "firmware"
> > > > > [    0.981859] 2 uimage-fw partitions found on MTD device firmware
> > > > > [    0.987900] 0x000000060000-0x0000001b5706 : "kernel"
> > > > > [    0.994916] 0x0000001b5706-0x000001fc0000 : "rootfs"
> > > > > [    1.001986] mtd: device 4 (rootfs) set to be root filesystem
> > > > > [    1.007789] 1 squashfs-split partitions found on MTD device rootfs
> > > > > [    1.014014] 0x0000003c0000-0x000001fc0000 : "rootfs_data"
> > > > > [    1.022093] 0x000001fc0000-0x000001fe0000 : "user_property"
> > > > > [    1.030283] 0x000001fe0000-0x000002000000 : "art"
> > > > >
> > > > > Maybe you could post links to forum thread, and data sheet.
> > > > >
> > > > > > >     Also chip_good() is used to check if the write is succeeded
> > > and
> > > > > it
> > > > > > was
> > > > > > >     implemented by the commit fb4a90bfcd6d8 ("[MTD] CFI-0002
> -
> > > Improve
> > > > > > error
> > > > > > >     checking").
> > > > > > >     But actually the write failure is caused on some platforms
> and
> > > > also
> > > > > > it can
> > > > > > >     be fixed by using chip_good() to check the state and retry
> > > > instead.
> > > > > > Do you know on which NOR chips this happens? Do you have access
> to
> > > the
> > > > > > datasheet?
> > > > >
> > > > > But it looks SST49LF008A [3] from the changes below but I am not
> sure
> > > at
> > > > > this moment and probably it should be confirmed to the authr Eric
> W.
> > > > > Biedermann <ebiederman@xxxxxxxx> to make sure.
> > > > >
> > > > > +#define SST49LF008A            0x005a
> > > > >
> > > > >  static int cfi_amdstd_read (struct mtd_info *, loff_t, size_t,
> size_t
> > > *,
> > > > > u_char *);
> > > > >  static int cfi_amdstd_write_words(struct mtd_info *, loff_t,
> size_t,
> > > > > size_t *, const u_char *);
> > > > > @@ -191,6 +192,7 @@ static struct cfi_fixup cfi_fixup_table[] =
> {
> > > > >  };
> > > > >  static struct cfi_fixup jedec_fixup_table[] = {
> > > > >         { MANUFACTURER_SST, SST49LF004B, fixup_use_fwh_lock,
> NULL, },
> > > > > +       { MANUFACTURER_SST, SST49LF008A, fixup_use_fwh_lock,
> NULL, },
> > > > >
> > > > > > >     It is depended on the actual flash chip behavior so the
> root
> > > cause
> > > > > > is
> > > > > > >     unknown.
> > > > > >
> > > > > > Yes, and that's what I'd like you to figure out, or at least have
> > > a
> > > > > > good idea why this doesn't work on some chips but works on others.
> > > > >
> > > > > I see.
> > > > > But it is a little bit difficult situation since I do not have the
> failure
> > > > > environment locally at this moment.
> > > > > But if needed I may ask to get the help for this to Fabio-san.
> > > > >
> > > > > > > If any comment please let me know.
> > > > > > >
> > > > > > > > > Signed-off-by: Tokunori Ikegami
> <ikegami@xxxxxxxxxxxxxxxxxxxx>
> > > > > > > > > Signed-off-by: Hauke Mehrtens <hauke@xxxxxxxxxx>
> > > > > > > > > Signed-off-by: Koen Vandeputte
> <koen.vandeputte@xxxxxxxxxxxx>
> > > > > > > > > Signed-off-by: Fabio Bettoni <fbettoni@xxxxxxxxx>
> > > > > > > >
> > > > > > > > Has the patch really gone through all those people? SoB is
> used
> > > when
> > > > > > you
> > > > > > > > apply a patch in your tree or when you're the original author.
> > > > > > >
> > > > > > > I have just checked the OpenWRT git log again and it looks that
> > > it was
> > > > > > originally
> > > > > > > implemented by Felix Fietkau <nbd@xxxxxxxxxxx> by the patch
> below
> > > so
> > > > > I
> > > > > > will update the Signed-off-by tag as so.
> > >
> <https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.
> > >
> openwrt.org%2F%3Fp%3Dopenwrt%2Fopenwrt.git%3Ba%3Dcommitdiff%3Bh%3D2530
> > >
> 640&amp;data=02%7C01%7Cjoakim.tjernlund%40infinera.com%7C916af968b27a4
> > >
> 02da8cf08d680812fce%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C1%7C63683
> > >
> 7689680126557&amp;sdata=w13ZTKwD1NiUQzxQfUou92KVDlW80qGUiZVIcjU%2BGPA%
> > > 3D&amp;reserved=0
> > > > > > f07cd2b3b14fe9ec03fa63a586452cc5f>
> > > > > > > > > Co-Developed-by: Hauke Mehrtens <hauke@xxxxxxxxxx>
> > > > > > > > > Co-Developed-by: Koen Vandeputte
> <koen.vandeputte@xxxxxxxxxxxx>
> > > > > > > > > Co-Developed-by: Fabio Bettoni <fbettoni@xxxxxxxxx>
> > > > > > > >
> > > > > > > > Not sure we want to add new undocumented tags, but you can
> mention
> > > > > > > > that all those people helped you find/debug the issue. They
> can
> > > also
> > > > > > > > add their Reviewed-by/Tested-by if they like.
> > > > > >
> > > > > > My bad, I just noticed these are valid flags [2], so you can keep
> > > them,
> > > > > > and according to the doc, you should also keep the SoB.
> > > > >
> > > > > I see.
> > > > > Yes I had also checked it.
> > > > >
> > > > > By the way in near future my company email address will be not able
> > > to
> > > > use.
> > > > > So I will change the mail address to my personal email address [4]
> after
> > > > > that or before.
> > > > >
> > > > > Regards,
> > > > > Ikegami
> > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Boris
> > > > > >
> > > > > >
> >

______________________________________________________
Linux MTD discussion mailing list
http://lists.infradead.org/mailman/listinfo/linux-mtd/