On Wed, 2019-01-23 at 20:56 +0900, Tokunori Ikegami wrote: > > > Hi Jocke-san, > > Thanks for your advice. > > To make sure let me confirm below. > > The OpenWrt code includes your patch below. > > f69cd2d30a80 2018-05-01 12:58:18 -0700 mtd: cfi: cmdset_0002: Do not allow read/write to suspend erase block. [Joakim Tjernlund] > > Do you mean that it is possible to be needed an additional change more based on this? That patch resolves another problem. I have not sent a patch for problem I mentioned in this mail. > Or is it not related to the patch fixed by you? > Note: Sorry now I am not able to check the patches to try sent by you before. NP Jocke > > > I have lost track of all the details regarding this issue. I just want to > > add: > > > > There is a max number of suspend/resume cycles one can do during an > > erase(possibly also for write) > > and once that number is hit you get an error. One way to avoid this could > > be to > > wait after each resume until the erase has started again(look in status > > register) > > before continuing. > > > > If this has anything to do with this problem, I do not know. > > > > Jocke > > Regards, > Ikegami > > > -----Original Message----- > > From: Joakim Tjernlund [mailto:Joakim.Tjernlund@xxxxxxxxxxxx] > > Sent: Wednesday, January 23, 2019 1:58 AM > > To: boris.brezillon@xxxxxxxxxxx; ikegami_to@xxxxxxxxxxx > > Cc: linux-mtd@xxxxxxxxxxxxxxxxxxx; chris.packham@xxxxxxxxxxxxxxxxxxx; > > fbettoni@xxxxxxxxx; nbd@xxxxxxxx; stable@xxxxxxxxxxxxxxx; > > hauke@xxxxxxxxxx; koen.vandeputte@xxxxxxxxxxxx; > > boris.brezillon@xxxxxxxxxxxxxxxxxx > > Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change > > do_write_oneword() to use chip_good() > > > > On Wed, 2019-01-23 at 00:49 +0900, Tokunori Ikegami wrote: > > > > > > Hi Boris-san, > > > > > > Very sorry for too late to update about this. > > > But could you please let me consult below about this patch? > > > > > > I have tried to investigate the issue root cause and confirmed below but > > > still the root cause seems not clear. > > > > > > 1. Without the change the write oneword is retried and recovered by > > the > > > current existing chip_good() checking. > > > But after the 1,001 times recovery it was continued to fail recovery > > > with the 3 times retry. > > > > I have lost track of all the details regarding this issue. I just want to > > add: > > > > There is a max number of suspend/resume cycles one can do during an > > erase(possibly also for write) > > and once that number is hit you get an error. One way to avoid this could > > be to > > wait after each resume until the erase has started again(look in status > > register) > > before continuing. > > > > If this has anything to do with this problem, I do not know. > > > > Jocke > > > > > 2. By the patch change the recovery failure can be avoided and the write > > > oneword works correctly without any failure. > > > There are different from the original chip_good() checking as the > > > original code resets the chip before the retry. > > > The patch change wait the chip_good() status until the timeout expiry > > > without the chip reset. > > > Note: There is a different from the original OpenWrt patch and > > needed > > > to be changed as same and it will be done by the next v4 patch. > > > > > > 3. To narrow down the cause I have added some delays into the original > > > code to check the chip ready and good status. > > > But the failure behavior was not changed so it seems that the issue > > is > > > not depended to the timing. (But not sure) > > > > > > 4. On the OpenWrt the write buffer is disabled but to narrow down the > > > issue I have changed to enable the write buffer. > > > Then the flash error was not happened by the write buffer operation > > so > > > it seems that the flash driver works correctly. > > > But another issue was caused and it is similar issue with the original > > > OpenWrt behavior with the patch change. > > > Note: On the original OpenWrt needs to wait the file system > > > completion to build but it was not finished with the write buffer. (But > > not > > > sure about this behavior) > > > > > > Do you have any comment about this result? > > > > > > If you can agree about the patch change basically with the current > > situation > > > I will do send the v4 patch set later with fix for the comments. > > > > > > But it seems that it is difficult to investigate the root cause more at > > this > > > moment to me. > > > Since but the behavior looks depended on the flash chip hardware behavior > > > and I cannot debug the hardware behavior more I think. > > > Note: Now I can reproduce the flash error issue behavior on the OpenWrt > > > unit. > > > > > > > > > It is depended on the actual flash chip behavior so the root > > cause > > > > > is > > > > > > unknown. > > > > > > > > > > Yes, and that's what I'd like you to figure out, or at least have > > a > > > > > good idea why this doesn't work on some chips but works on others. > > > > > > > > I see. > > > > But it is a little bit difficult situation since I do not have the failure > > > > environment locally at this moment. > > > > But if needed I may ask to get the help for this to Fabio-san. > > > > > > Regards, > > > Ikegami > > > > > > > -----Original Message----- > > > > From: IKEGAMI Tokunori [mailto:ikegami@xxxxxxxxxxxxxxxxxxxx] > > > > Sent: Tuesday, November 6, 2018 6:47 PM > > > > To: Boris Brezillon; ikegami_to@xxxxxxxxxxx > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke Mehrtens; > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris; > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni > > > > Subject: RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change > > > > do_write_oneword() to use chip_good() > > > > > > > > Sorry let me resend the mail below by changing the email address of > > > > Felix-san. > > > > > > > > -----Original Message----- > > > > From: IKEGAMI Tokunori > > > > Sent: Tuesday, November 06, 2018 6:37 PM > > > > To: 'Boris Brezillon'; 'ikegami_to@xxxxxxxxxxx' > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke Mehrtens; > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris; > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni > > > > Subject: RE: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change > > > > do_write_oneword() to use chip_good() > > > > > > > > Hi Boris-san, > > > > > > > > > -----Original Message----- > > > > > From: stable-owner@xxxxxxxxxxxxxxx > > > > > [mailto:stable-owner@xxxxxxxxxxxxxxx] On Behalf Of Boris Brezillon > > > > > Sent: Tuesday, November 06, 2018 5:34 PM > > > > > To: IKEGAMI Tokunori > > > > > Cc: boris.brezillon@xxxxxxxxxxxxxxxxxx; Felix Fietkau; Hauke > > Mehrtens; > > > > > stable@xxxxxxxxxxxxxxx; Joakim Tjernlund; PACKHAM Chris; > > > > > linux-mtd@xxxxxxxxxxxxxxxxxxx; Koen Vandeputte; Fabio Bettoni > > > > > Subject: Re: [PATCH v3 01/11] mtd: cfi_cmdset_0002: Change > > > > > do_write_oneword() to use chip_good() > > > > > > > > > > Hi IKEGAMI, > > > > > > > > > > On Tue, 6 Nov 2018 00:25:43 +0000 > > > > > IKEGAMI Tokunori <ikegami@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > > > > > Also the issue can be fixed by using chip_good() instead of > > > > > chip_ready(). > > > > > > > > The chip_ready() just checks the value from flash memory twice. > > > > > > > > And the chip_good() checks the value with the expected value. > > > > > > > > Probably the issue can be fixed as checked correctly by the > > > > chip_good(). > > > > > > > > So change to use chip_good() instead of chip_ready(). > > > > > > > > > > > > > > Well, that's not really explaining why you think chip_good() should > > > > > be > > > > > > > used instead of chip_ready(). So I went on and looked at the > > > > > > > chip_good(), chip_ready() and do_write_oneword() implementation, > > and > > > > > > > also looked at users of do_write_oneword(). It seems this function > > > > is > > > > > > > used to write data to the flash, and apparently the "one bit should > > > > > > > toggle to reflect a busy state" does not apply when writing things > > > > to > > > > > > > the memory array (it's probably working for other CFI commands, > > but > > > > > I > > > > > > > guess it takes more time to actually change the level of a NOR > > cell, > > > > > > > hence the result of 2 identical reads does not mean that the write > > > > is > > > > > > > done). > > > > > > > > > > > > > > Also, it seems that cmdset_0001 is not implementing chip_ready() > > the > > > > > > > same way, and I wonder if cmdset_0002 implementation is correct > > to > > > > > > > start with. Or maybe I don't get what chip_ready() is for. > > > > > > > > > > > > > > Anyway, this is the sort of clarification I'd like to have. > > > > > > > > > > > > I am thinking to update the commit message as below. > > > > > > > > > > > > mtd: cfi_cmdset_0002: Use chip_good() to retry in > > > > do_write_oneword() > > > > > > As reported by the OpenWRT team, write requests sometimes fail > > on > > > > > some > > > > > > platforms. > > > > > > Currently to check the state chip_ready() is used correctly > > as > > > > > described by > > > > > > the flash memory S29GL256P11TFI01 datasheet. > > > > > > > > > > I had a look at the S29GL256P datasheet here [1], and if I'm correct, > > > > > it's using cmdset 0001. > > > > > > > > No actually the cmdset 0002 is used on the flash chip. > > > > The manufacturer ID xx01h and Device ID 2201h are used to decide. > > > > > > > > There is information from Fobis-san below also about this. > > > > > > > > On forum thread musashino posted picture of flash chip: > > > > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fforum > > .openwrt.org%2Ft%2Fimpossible-to-install-update-any-packages-&data > > =02%7C01%7Cjoakim.tjernlund%40infinera.com%7C916af968b27a402da8cf08d68 > > 0812fce%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C1%7C63683768968012655 > > 7&sdata=NNGSYgq1VTuofPPMMlyKIm9W1DJHQFw0s94Ernq5cts%3D&reserve > > d=0 > > > > on-wzr-hp-g300nh-18-06-1 > > > > > > https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.cy > > press.com%2Fpart%2Fs29gl256p11tfi010&data=02%7C01%7Cjoakim.tjernlu > > nd%40infinera.com%7C916af968b27a402da8cf08d680812fce%7C285643de5f5b4b0 > > 3a1530ae2dc8aaf77%7C1%7C1%7C636837689680126557&sdata=Twk1VUEESz14U > > pdJjU4ohuhiQ5jN1uHLh0cAhlAznW0%3D&reserved=0 > > > > [ 0.862264] physmap platform flash device: 02000000 at 1e000000 > > > > [ 0.868331] physmap-flash: Found 1 x16 devices at 0x0 in 16-bit > > > > bank. Manufacturer ID 0x000001 Chip ID 0x002201 > > > > [ 0.878493] Amd/Fujitsu Extended Query Table at 0x0040 > > > > [ 0.883668] Amd/Fujitsu Extended Query version 1.3. > > > > [ 0.888768] number of CFI chips: 1 > > > > [ 0.894557] Searching for RedBoot partition table in physmap-flash > > > > at offset 0x1fc0000 > > > > [ 0.918009] Searching for RedBoot partition table in physmap-flash > > > > at offset 0x1fe0000 > > > > [ 0.941464] No RedBoot partition table detected in physmap-flash > > > > [ 0.947926] Creating 5 MTD partitions on "physmap-flash": > > > > [ 0.953384] 0x000000000000-0x000000040000 : "u-boot" > > > > [ 0.960853] 0x000000040000-0x000000060000 : "u-boot-env" > > > > [ 0.968803] 0x000000060000-0x000001fc0000 : "firmware" > > > > [ 0.981859] 2 uimage-fw partitions found on MTD device firmware > > > > [ 0.987900] 0x000000060000-0x0000001b5706 : "kernel" > > > > [ 0.994916] 0x0000001b5706-0x000001fc0000 : "rootfs" > > > > [ 1.001986] mtd: device 4 (rootfs) set to be root filesystem > > > > [ 1.007789] 1 squashfs-split partitions found on MTD device rootfs > > > > [ 1.014014] 0x0000003c0000-0x000001fc0000 : "rootfs_data" > > > > [ 1.022093] 0x000001fc0000-0x000001fe0000 : "user_property" > > > > [ 1.030283] 0x000001fe0000-0x000002000000 : "art" > > > > > > > > Maybe you could post links to forum thread, and data sheet. > > > > > > > > > > Also chip_good() is used to check if the write is succeeded > > and > > > > it > > > > > was > > > > > > implemented by the commit fb4a90bfcd6d8 ("[MTD] CFI-0002 - > > Improve > > > > > error > > > > > > checking"). > > > > > > But actually the write failure is caused on some platforms and > > > also > > > > > it can > > > > > > be fixed by using chip_good() to check the state and retry > > > instead. > > > > > Do you know on which NOR chips this happens? Do you have access to > > the > > > > > datasheet? > > > > > > > > But it looks SST49LF008A [3] from the changes below but I am not sure > > at > > > > this moment and probably it should be confirmed to the authr Eric W. > > > > Biedermann <ebiederman@xxxxxxxx> to make sure. > > > > > > > > +#define SST49LF008A 0x005a > > > > > > > > static int cfi_amdstd_read (struct mtd_info *, loff_t, size_t, size_t > > *, > > > > u_char *); > > > > static int cfi_amdstd_write_words(struct mtd_info *, loff_t, size_t, > > > > size_t *, const u_char *); > > > > @@ -191,6 +192,7 @@ static struct cfi_fixup cfi_fixup_table[] = { > > > > }; > > > > static struct cfi_fixup jedec_fixup_table[] = { > > > > { MANUFACTURER_SST, SST49LF004B, fixup_use_fwh_lock, NULL, }, > > > > + { MANUFACTURER_SST, SST49LF008A, fixup_use_fwh_lock, NULL, }, > > > > > > > > > > It is depended on the actual flash chip behavior so the root > > cause > > > > > is > > > > > > unknown. > > > > > > > > > > Yes, and that's what I'd like you to figure out, or at least have > > a > > > > > good idea why this doesn't work on some chips but works on others. > > > > > > > > I see. > > > > But it is a little bit difficult situation since I do not have the failure > > > > environment locally at this moment. > > > > But if needed I may ask to get the help for this to Fabio-san. > > > > > > > > > > If any comment please let me know. > > > > > > > > > > > > > > Signed-off-by: Tokunori Ikegami <ikegami@xxxxxxxxxxxxxxxxxxxx> > > > > > > > > Signed-off-by: Hauke Mehrtens <hauke@xxxxxxxxxx> > > > > > > > > Signed-off-by: Koen Vandeputte <koen.vandeputte@xxxxxxxxxxxx> > > > > > > > > Signed-off-by: Fabio Bettoni <fbettoni@xxxxxxxxx> > > > > > > > > > > > > > > Has the patch really gone through all those people? SoB is used > > when > > > > > you > > > > > > > apply a patch in your tree or when you're the original author. > > > > > > > > > > > > I have just checked the OpenWRT git log again and it looks that > > it was > > > > > originally > > > > > > implemented by Felix Fietkau <nbd@xxxxxxxxxxx> by the patch below > > so > > > > I > > > > > will update the Signed-off-by tag as so. > > <https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit. > > openwrt.org%2F%3Fp%3Dopenwrt%2Fopenwrt.git%3Ba%3Dcommitdiff%3Bh%3D2530 > > 640&data=02%7C01%7Cjoakim.tjernlund%40infinera.com%7C916af968b27a4 > > 02da8cf08d680812fce%7C285643de5f5b4b03a1530ae2dc8aaf77%7C1%7C1%7C63683 > > 7689680126557&sdata=w13ZTKwD1NiUQzxQfUou92KVDlW80qGUiZVIcjU%2BGPA% > > 3D&reserved=0 > > > > > f07cd2b3b14fe9ec03fa63a586452cc5f> > > > > > > > > Co-Developed-by: Hauke Mehrtens <hauke@xxxxxxxxxx> > > > > > > > > Co-Developed-by: Koen Vandeputte <koen.vandeputte@xxxxxxxxxxxx> > > > > > > > > Co-Developed-by: Fabio Bettoni <fbettoni@xxxxxxxxx> > > > > > > > > > > > > > > Not sure we want to add new undocumented tags, but you can mention > > > > > > > that all those people helped you find/debug the issue. They can > > also > > > > > > > add their Reviewed-by/Tested-by if they like. > > > > > > > > > > My bad, I just noticed these are valid flags [2], so you can keep > > them, > > > > > and according to the doc, you should also keep the SoB. > > > > > > > > I see. > > > > Yes I had also checked it. > > > > > > > > By the way in near future my company email address will be not able > > to > > > use. > > > > So I will change the mail address to my personal email address [4] after > > > > that or before. > > > > > > > > Regards, > > > > Ikegami > > > > > > > > > Regards, > > > > > > > > > > Boris > > > > > > > > > > > ______________________________________________________ Linux MTD discussion mailing list http://lists.infradead.org/mailman/listinfo/linux-mtd/