Hi Adam,

Adam Ford <aford173@xxxxxxxxx> wrote on Tue, 12 Jan 2021 11:20:24 -0600:

> On Tue, Jan 12, 2021 at 10:01 AM Adam Ford <aford173@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 12, 2021 at 8:35 AM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
> > >
> > > Hi Adam,
> > >
> > > Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote on Mon, 11 Jan 2021
> > > 11:20:27 +0100:
> > >
> > > > Hi Adam,
> > > >
> > > > Adam Ford <aford173@xxxxxxxxx> wrote on Sat, 9 Jan 2021 08:46:44 -0600:
> > > >
> > > > > On Tue, Sep 29, 2020 at 6:09 PM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote:
> > > > > >
> > > > > > The NAND BCH control structure has nothing to do outside of this
> > > > > > driver, all users of the nand_bch_init/free() functions just save it
> > > > > > to chip->ecc.priv so do it in this driver directly and return a
> > > > > > regular error code instead.
> > > > > >
> > > > > > Signed-off-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
> > > > > > ---
> > > > >
> > > > > Starting with this commit: 3c0fe36abebe, the kernel either doesn't
> > > > > build or returns errors on some omap2plus devices with the following
> > > > > error:
> > > > >
> > > > > nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> > > > > nand: Micron MT29F4G16ABBDA3W
> > > > > nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB size: 64
> > > > > nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> > > > > Invalid ECC layout
> > > > > omap2-nand 30000000.nand: unable to use BCH library
> > > > > omap2-nand: probe of 30000000.nand failed with error -22
> > > > > 8<--- cut here ---
> > > > >
> > > > > There are few commits using git bisect that have build errors, so it
> > > > > wasn't possible for me to determine the exact commit that broke the
> > > > > ECC. If the build failed, I marked it as 'bad' to git bisect.
> > > >
> > > > I am sorry to hear that, I regularly rebase with a make run between each
> > > > pick and push my branches to a 0-day repository to have robots check
> > > > for such errors, but sometimes I fail.
> > > >
> > > > > Newer commits have remedied the build issue, but the Invalid ECC
> > > > > layout error still exists as of 5.11-RC2.
> > > >
> > > > Ok so let's focus on these.
> > > >
> > > > > Do you have any suggestions on what I can do to remedy this? I am
> > > > > willing to try and test.
> > > >
> > > > Glad to hear that.
> > > >
> > > > Can you share the NAND controller DT node you are using?
> > > >
> > > > Also, can you please add a few printk's like below and give me the
> > > > output?
> > >
> > > Will you have the time to check these soon? I am ready to help and
> > > would like to fix it asap.
> >
> > Sorry for the delay, I have to split my time with 3 different
> > projects. I am hoping to get you data later today.
>
> Miquel,
>
> Here is the dump from my boot sequence:
>
> [ 2.629089] omap2-nand 30000000.nand: GPIO lookup for consumer rb
> [ 2.635253] omap2-nand 30000000.nand: using device tree for GPIO lookup
> [ 2.642150] of_get_named_gpiod_flags: parsed 'rb-gpios' property of node '/o)
> [ 2.653900] gpio gpiochip6: Persistence not supported for GPIO 0
> [ 2.660339] nand: device found, Manufacturer ID: 0x2c, Chip ID: 0xbc
> [ 2.666900] nand: Micron MT29F4G16ABBDA3W
> [ 2.670959] nand: 512 MiB, SLC, erase size: 128 KiB, page size: 2048, OOB si4
> [ 2.678710] nand: using OMAP_ECC_BCH8_CODE_HW_DETECTION_SW
> [ 2.684234] writesize 2048, step_size 512, nsteps 4
> [ 2.689300] strength 8, step size 512, code_size 13

Until here, everything looks fine.

> [ 2.696807] count eccbytes 0

This is the cause of the error: the MTD OOB layout reports no ECC bytes.
Can you please check that we actually call the large page helpers (in
particular nand_ooblayout_ecc_lp())?
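To make the counting logic easier to follow, here is a minimal userspace model of how the "count eccbytes" value is produced. It is a simplified sketch, not the kernel code: struct oob_region, struct fake_nand, ooblayout_ecc_lp() and count_eccbytes() are hypothetical stand-ins modelled on nand_ooblayout_ecc_lp() and mtd_ooblayout_count_eccbytes(), assuming the large page helper exposes a single ECC region at the end of the OOB area and returns -ERANGE for any other section or when the ECC total is 0. The counter walks sections until the callback returns -ERANGE, so an -ERANGE on the very first call yields 0.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for struct mtd_oob_region. */
struct oob_region {
	unsigned int offset;
	unsigned int length;
};

/* Hypothetical stand-in keeping only the fields the helpers read. */
struct fake_nand {
	unsigned int oobsize;   /* models mtd->oobsize */
	unsigned int ecc_total; /* models nand->ecc.ctx.total */
};

/*
 * Modelled on nand_ooblayout_ecc_lp(): one ECC region placed at the end
 * of the OOB area. Any section other than 0, or an ECC total of 0,
 * returns -ERANGE.
 */
int ooblayout_ecc_lp(const struct fake_nand *nand, int section,
		     struct oob_region *region)
{
	if (section || !nand->ecc_total)
		return -ERANGE;

	region->length = nand->ecc_total;
	/* The ECC bytes must fit inside the OOB area. */
	assert(region->length <= nand->oobsize);
	region->offset = nand->oobsize - region->length;
	return 0;
}

/*
 * Modelled on mtd_ooblayout_count_eccbytes(): sum the region lengths
 * until the callback returns -ERANGE; any other error is propagated.
 */
int count_eccbytes(const struct fake_nand *nand)
{
	struct oob_region region;
	int section = 0, nbytes = 0, ret;

	for (;;) {
		ret = ooblayout_ecc_lp(nand, section++, &region);
		if (ret)
			return ret == -ERANGE ? nbytes : ret;
		nbytes += (int)region.length;
	}
}
```

With oobsize = 64 and ecc_total = 4 * 13, count_eccbytes() returns 52; with ecc_total = 0 it returns 0, which is exactly the "count eccbytes 0" line in the log.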
I bet nand_ooblayout_ecc_lp() returns -ERANGE on its first call, which
leaves the eccbytes count above at zero. What is strange is that the
only reason this would happen (to my eyes) is nand->ecc.ctx.total being
0. Can you please check its actual value? I do not see an obvious
reason for that, because nand->ecc.ctx.total is set to nsteps (4) *
code_size (13) = 52 right before mtd_ooblayout_count_eccbytes() is
called.

Can you please verify my reasoning and perhaps tackle the root cause of
this issue? Please do not hesitate to ask questions; I'll do my best to
help, because this is a critical code path and, unfortunately, OMAP
boards are not the only ones it breaks.

Thanks,
Miquèl