Re: [PATCH 2/3] dt-bindings: mtd: Add Documentation for Airoha fixed-partitions

Andreas Gnau <andreas.gnau@xxxxxxxxx> · Wed, 16 Oct 2024 19:33:10 +0200

Hi Christian,

On 2024-10-16 09:33, Christian Marangi wrote:
On Wed, Oct 02, 2024 at 10:00:06AM +0200, Miquel Raynal wrote:
Hi Christian,

Ok, probably the description isn't clear enough. The missing info that
requires this parser is the flash end.

Following the example we know the size of rootfs_data and start offset
AND we know the size of the ART partition.

There might be unused space in the middle between the rootfs_data
partition and the art partition. What is derived is the starting offset
of the art partition, which is flash end - art partition size
(where flash end changes and is not always the same due to how the
reserved space for the special bad block management table is handled).
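
As a rough illustration of the derivation above (a sketch only: the
function name and the example sizes are assumptions, not taken from the
actual parser), the ART start offset is the flash end as seen by MTD minus
the ART partition size:

```python
def art_start_offset(visible_flash_size: int, art_size: int) -> int:
    """Return the start offset of a partition pinned to the end of the flash.

    visible_flash_size: size MTD reports once the BMT has reserved its space
                        (hypothetical value, differs per device).
    art_size:           fixed, known size of the ART partition.
    """
    if art_size > visible_flash_size:
        raise ValueError("ART partition does not fit in the visible flash")
    return visible_flash_size - art_size


# Made-up numbers: 128 MiB NAND, 8 MiB assumed to be reserved for the BMT,
# and an assumed ART length of 0x300000 bytes.
physical_size = 128 * 1024 * 1024
assumed_bmt_reserved = 8 * 1024 * 1024
visible = physical_size - assumed_bmt_reserved
print(hex(art_start_offset(visible, 0x300000)))  # 0x7500000
```

With a different reserved size the same ART partition lands at a different
offset, which is why the offset cannot be hardcoded.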

This is why 0xffffffff is used as a dummy offset: to signal that it will
be parsed at runtime. On second thought though, maybe using this dummy
offset is wrong and I should just have something like

length = <0x300000>;

Is it clear now? Sorry for any confusion.

I'm sorry but not really. You know the end of the physical device and
the size of the ART partition, so you must know its start as well?

Before the system boot we know:
- size of the ART partition
- real size of the physical device (512mb... 1G... 64mb...)

When the physical device is probed (nand), a special driver is loaded
(before the mtd parsing logic) that changes the physical size of the
device (mtd->size), as some space at the end of the nand is reserved for
bad block management and other metadata info.

Here you are explaining what you intend Linux to do, right? I would
like to understand what you are trying to solve. I don't understand why
you need the size change, I don't understand why you don't know the
start of the ART partition, I don't understand what the data you are
hiding contains and who uses it :-) I'm sorry, this is too unclear yet.

Totally not a problem and thanks a lot for continuing to ask... More
than happy to clear things up. I'm trying to solve a problem present on
Airoha SoCs and upstream a correct parser for it.

What I'm trying to solve:

Correct access to this partition at the end of the flash in an automated
way.

The content of this partition is the usual ART partition found on lots of
embedded devices: MAC address, wifi calibration data, serial. It is used
through NVMEM cells and from userspace with the dd command to extract data.

Airoha uses something also used by some MediaTek SoCs. They call it BMT
and it's currently used downstream in OpenWrt and in their firmware. It is
also used in the bootloader.

The usage of BMT is a custom way to handle bad blocks entirely in
software. At the end of the flash some space is reserved where info
about all the blocks of the flash is put. I'm not 100% sure about the
functionality of this, but it can relocate blocks and do magic things to
handle bad blocks. For the scope of this change, the important info is
that after the BMT is probed, the operation of "reserving space" is done
by reducing the MTD flash size. So the MTD subsystem sees a smaller
flash than it actually is.

The reserved space changes across SoCs or even devices, but the BMT is a
must where it's used, as the bootloader makes use of it and writing to it
might confuse the bootloader and corrupt data (one block might be
flagged as bad and its data moved; the BMT driver validates its table and
performs operations).

Ok, I think that's way clearer now.

Hi, sorry for the delay, very happy this is clearer now.

So the BMT driver does not exist in mainline Linux, but you would like
to skip this part of the MTD device to avoid smashing it. And it is in
use by the vendor Bootloader I guess?

Yes, correct, the idea is to permit easier access to the partition. I hope
(and assume) this driver will come upstream.

May I ask for a better understanding what the "complete goal" is? Is the 
goal only compatibility with the Airoha ATF as it is now? Or is the goal 
to read flashes that have been using the Airoha SDK with BL and Linux? 
Airoha bootloader is just software on the flash, it can be changed to 
not write BMT, which we have done.

I am asking, because I consider this BMT to be actually detrimental when 
used together with UBI. Wear-levelling on the UBI side is no longer 
correct when blocks can get re-located by some other entity below due to 
bad-blocks.
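
To make the concern concrete, here is a toy model (purely illustrative: the
remap table and counters are assumptions and do not reflect the real BMT or
UBI data structures). It counts erases per logical block, as an upper layer
would, and per physical block, as the flash actually experiences them when
a lower layer silently remaps a block:

```python
from collections import Counter


def simulate(erase_sequence, remap_events):
    """Count erases as seen above and below a hidden remapping layer.

    erase_sequence: logical block numbers, in the order they are erased.
    remap_events:   {step_index: (logical_block, new_physical_block)},
                    applied just before the erase at that step.
    Returns (logical_counts, physical_counts) as plain dicts.
    """
    mapping = {}  # logical -> physical; unlisted blocks map to themselves
    logical_counts = Counter()
    physical_counts = Counter()
    for step, block in enumerate(erase_sequence):
        if step in remap_events:
            logical, new_physical = remap_events[step]
            mapping[logical] = new_physical
        logical_counts[block] += 1
        physical_counts[mapping.get(block, block)] += 1
    return dict(logical_counts), dict(physical_counts)


# Hypothetical run: block 1 is remapped to spare block 9 before step 2.
seen_above, seen_below = simulate([0, 1, 0, 1], {2: (1, 9)})
print(seen_above)  # {0: 2, 1: 2}
print(seen_below)  # {0: 2, 1: 1, 9: 1}
```

The upper layer believes block 1 has been erased twice, while the block now
backing it has been erased only once, so the erase counters no longer
describe the physical wear.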

We have patched the Airoha bootloader components to not write BMT (and 
of course our U-Boot fork and Linux flash drivers do not use it either).

Before we had patched the bootloader, we had initially marked the BMT as 
bad block, but if I remember correctly the BMT location might also be 
somewhere else in case the BMT block itself is a bad block, which is 
also the reason why we went the safe way and patched ATF.

Just putting my thoughts/experiences out there, mostly because I am a 
bit concerned about the possible double bad block management (UBI + 
Airoha). You might have more insight into things than we have about how 
exactly things work. So, maybe it is not as big of an issue.

I just think that this BMT can create a lot of other issues as well. And 
maybe, I am a bit of a burnt child dreading the fire because of learning 
about this BMT and how it works step by step the hard way. For example, 
we had an issue where Linux and U-Boot had support for a NAND chip while 
ATF would mis-detect that 256 MB NAND as 128 MB and thus destroy data by 
writing BMT in the middle of the flash. Fun times...
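
The failure mode can be sketched with simple arithmetic, assuming (as
described earlier in this thread) that the BMT is placed at the end of
whatever flash size the software detects. The reserved size below is a
made-up number and the helper name is hypothetical:

```python
MIB = 1024 * 1024


def bmt_region(detected_size, reserved_size):
    """Return (start, end) byte offsets of the BMT for a detected flash size."""
    return detected_size - reserved_size, detected_size


real_size = 256 * MIB
assumed_reserved = 8 * MIB  # hypothetical; the real value differs per device

correct = bmt_region(real_size, assumed_reserved)
misdetected = bmt_region(128 * MIB, assumed_reserved)

print([x // MIB for x in correct])      # [248, 256]
print([x // MIB for x in misdetected])  # [120, 128]
```

With the misdetected size, the BMT lands at 120..128 MiB, right in the
middle of the real 256 MiB chip, on top of whatever data lives there.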

As a side-note: We have also migrated some customer devices that had 
been deployed in the field with Airoha SDK BMT and drivers to our UBI 
flash layout without BMT. Strategy was to load relevant data into RAM 
from Airoha U-Boot and then chainload our U-Boot 2023 that would flash 
everything back.

Is it some kind of table that is written by the chip itself in order to
maintain a list of auto-replacement blocks for bad blocks? Can the size
of this table move with the use of the device? (if yes, it's
problematic, we don't want to resize MTD partitions without noticing,
it would break eg. UBI).

No, chip hw bad block handling is disabled with this implementation, and
the table size doesn't move/change, so MTD partitions will stay at the
same offset after the first parse on boot.

If the block that holds the BMT goes bad when BMT is being updated, 
wouldn't the BMT location itself change? At least that is my faint 
understanding (which could be wrong, of course and it has also been some 
time...).

I believe this BMT block is going against the bad block handling in
Linux, so I really wonder how one can use both mechanisms in a system.
If the BMT layer takes "one random block" to map a corrupted one on it,
it totally defeats the current bad block model we have in MTD/UBI
and simply cannot be supported at all. Just skipping the
currently-used-for-BMT blocks sounds like a very bad idea that will
break your system, later.

Well, we disable it and, since it's reserved, from the system side you can
do all kinds of magic, since the space used for the driver is not
available to the system, but I will try to gather more info about this in
the next few days.

Best Regards,

Andreas Gnau