Re: [PATCH 2/2] ARM: i.MX: xload: consider ECC strength when reading page

Andrej Picej <andrej.picej@xxxxxxxxx> · Tue, 8 Jun 2021 09:23:39 +0200

Hi Trent,

First of all, thanks for your input. Please find my comments below.

On 7. 06. 21 22:03, Trent Piepho wrote:
On Mon, Jun 7, 2021 at 2:32 AM Andrej Picej <andrej.picej@xxxxxxxxx> wrote:
Some NAND update tools/flashers do not take full advantage of the NAND's
entire page area for ECC purposes. For example, they might only use 2112
of the 2176 available bytes. In this case, the ECC parameters have to be
read from the FCB table and taken into account in the GPMI NAND xloader to
properly calculate the page data length so that the DMA chain can be
executed correctly.

Tested on PHYTEC phyCARD i.MX6Q board with following NANDs:
- Samsung K9K8G08U0E (pagesize: 0x800, oobsize: 0x40)
- Winbond W29N08GVSIAA (pagesize: 0x800, oobsize: 0x40) and
- Spansion S34ML08G201FI00 (pagesize: 0x800, oobsize: 0x80).

All NANDs had the ECC strength set to 4 (13 bytes), even though the
Spansion NAND chip supports an ECC strength of 9 (29 bytes).
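
As a rough illustration of the length calculation described in the commit
message, the sketch below computes how many raw page bytes a BCH layout
occupies. It is not the code from the patch: the function name, the 10
metadata bytes, the 512-byte chunk size, the 13 parity bits per correctable
bit, and the reading of the FCB strength value as counting in steps of two
correctable bits are all assumptions, chosen because they reproduce the
13-byte and 29-byte figures above.

```c
#include <stdint.h>

/* Assumed BCH layout constants; not taken from the patch itself. */
#define BCH_METADATA_BYTES 10u   /* metadata at the start of the page */
#define BCH_CHUNK_BYTES    512u  /* data bytes covered by one ECC chunk */
#define BCH_GF_BITS        13u   /* parity bits per correctable bit */

/*
 * Raw page bytes (data + OOB) occupied by the BCH layout.
 * fcb_ecc_level is a hypothetical name for the FCB strength field and is
 * assumed to count in steps of two correctable bits (4 -> 8-bit ECC).
 */
static uint32_t bch_page_bytes(uint32_t page_data_bytes,
			       uint32_t fcb_ecc_level)
{
	uint32_t nchunks = page_data_bytes / BCH_CHUNK_BYTES;
	uint32_t ecc_bits = fcb_ecc_level * 2u;
	uint32_t total_bits = BCH_METADATA_BYTES * 8u +
		nchunks * (BCH_CHUNK_BYTES * 8u + ecc_bits * BCH_GF_BITS);

	/* Round up to whole bytes. */
	return (total_bits + 7u) / 8u;
}
```

Under these assumptions, bch_page_bytes(0x800, 4) gives 2110 bytes, which
fits in a 2112-byte page, and bch_page_bytes(0x800, 9) gives 2175 bytes,
which fits in a 2176-byte page.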

There is a bug in NXP's latest imx kernel, lf-5.10.y-1.0.0, that
results in the kernel driver incorrectly using the minimum ECC
specified in the ONFI nand specs instead of calculating a maximal ecc
value and using that, which is what prior kernels and the upstream
kernel use.  It was caused by incorrectly resolving a conflict when
they rebased one of their old patches to 5.10.

The common pagesize 0x800, oobsize 0x40 should use 8-bit ECC.  That's
what the uboot, barebox, and linux drivers would do since the first
mxs nand support years ago.  It's only the recent kernel bug in nxp's
kernel that will choose 4.
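
The 8-bit figure can be checked with a short calculation of how much parity
fits in the spare area. This is only a sketch and not the code of any of the
drivers mentioned: the function name is made up, and the 10 metadata bytes,
512-byte chunks, 13 parity bits per correctable bit, and rounding down to an
even strength are assumptions. No hardware upper limit is applied.

```c
#include <stdint.h>

/*
 * Largest even BCH strength (in correctable bits per chunk) whose parity
 * fits in the OOB area. Returns 0 if the geometry leaves no room.
 * All constants below are assumptions, see the text above.
 */
static uint32_t bch_max_ecc_bits(uint32_t page_data_bytes, uint32_t oob_bytes)
{
	const uint32_t metadata_bytes = 10u;
	const uint32_t chunk_bytes = 512u;
	const uint32_t gf_bits = 13u;
	uint32_t nchunks = page_data_bytes / chunk_bytes;
	uint32_t avail_bits;

	if (nchunks == 0u || oob_bytes <= metadata_bytes)
		return 0u;

	/* OOB bits left for parity once the metadata is accounted for. */
	avail_bits = (oob_bytes - metadata_bytes) * 8u;

	/* Parity bits are spread over all chunks; round down to even. */
	return (avail_bits / (gf_bits * nchunks)) & ~1u;
}
```

With these assumptions, bch_max_ecc_bits(0x800, 0x40) evaluates to 8 and
bch_max_ecc_bits(0x800, 0x80) evaluates to 18.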

OK, I wasn't aware of this kernel bug, but that is not what we are
trying to fix here. Our use case is a migration from eboot (some
old WinCE version) to barebox using a proprietary flasher tool. This
tool uses the NAND settings from eboot, which are hardcoded to a fixed
pagesize of 0x800 bytes and oobsize of 0x40 bytes (8 ECC bits). If a
NAND with a different geometry is used (e.g. pagesize of 0x800 bytes
with oobsize of 0x80 bytes), the BCH ECC page organization will still
use only 0x840 bytes.

So rather than switch to 4-bit, it would be better to fix these boards
to use 8-bit like they should.  More reliable ECC, and it will work
correctly on barebox, u-boot, old imx kernels, current upstream
kernels, and hopefully future imx kernels.

I agree that it would be better to use all of the available space, but
if the flasher used the wrong settings to write the barebox binary to
NAND, those settings (although not optimal) have to be used for booting
to be possible at all.

Using the FCB data here might not be such a good idea.  While it seems
like the right thing, there are some issues:
The barebox main gpmi nand driver doesn't use the FCB
U-boot doesn't use the FCB
No Linux kernel uses the FCB

The main reason why I think we should use the FCB here is that the
i.MX6 ROM already uses these values to boot into the pre-bootloader.
That's why we try to make the xloader act like the ROM does (reading the
NAND parameters from the FCB). Nevertheless, flasher tools should be
responsible for making the BCH ECC page layout match what is written
into the FCB. If that is not the case, we can only presume that the
flasher used the optimal size for ECC.

If you try to read/write nand from any of those places, it won't work.
The only way to make it work, is to have the FCB match what those
drivers do.

In our case the described proprietary flasher tool only flashes barebox,
so only the NAND pages holding the barebox binary use non-optimal ECC
settings. If, for example, the kernel, devicetree and rootfs are flashed
from barebox, those NAND pages use the correct ECC size, and booting
into Linux and updating those pages from Linux works. Updating
barebox from barebox itself (using barebox_update) means that the
barebox binary is rewritten in NAND with optimal ECC settings and the
FCB is updated accordingly.

I think it would have been better if the original design had been for
the bootloader to read the FCB, use that to load the kernel, and then
fixup the ECC config into the device tree for the kernel to use too.
One source, the FCB, which is propagated to all users.  Everyone will
agree on the ECC and there are no independent settings to keep in
sync.

But they didn't do that.  Each driver figures it out on its own and
hopefully they use matching algorithms that arrive at the same answer.
But of course this fails, like with nxp's lf-5.10.y-1.0.0 kernel.
This isn't the first time; this same type of bug appeared back in 2013
in 2febcdf84b and was fixed in 031e2777e.

So while your commit will allow these boards using poorly chosen FCB
values to work with the xloader, they will be corrupted if nand is
written to from barebox non-xload or from linux.

We only use these ECC values to read the barebox binary from NAND and
copy it to RAM. If other NAND pages use different ECC values, that
doesn't break anything, I think. The only problem I can see is barebox
or Linux reading the NAND pages occupied by the barebox binary; this
will most likely fail, but I don't see why that would be necessary anyway.

I don't think we are breaking anything here; we are just fixing booting
barebox from NAND with non-optimal ECC settings.

Please correct me if I'm wrong or if I'm missing something here.

BR,

Andrej
