On Wed, Dec 14, 2022 at 08:43:29AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 13, 2022 at 12:45:27PM +0000, Daniel Golle wrote:
> > > Yes, but a completely non-standard format that nests inside an
> > > partition.
> >
> > The reason for this current discussion (see subject line) is exactly
> > that you didn't like the newly introduced partition type GUID which
> > then calls the newly introduced partition parser taking care of the
> > uImage.FIT content of a partition.
>
> Which is the exact nesting I'm complaining about.  Why do you need
> to use your format inside a GPT partition table?

The GPT partition table is typically written only once to an eMMC-based
device, in the factory. Firmware images (typically uImage.FIT) are
stored in partitions because there are sometimes two of them (for A/B
dual-boot, or recovery/production dual-boot).

A working device firmware consists of kernel, dtb and rootfs. All three
have to be written and used together, and they typically also come
together in one file for firmware upgrade (i.e. rootfs appended to
kernel, tarballs, or a uImage.FIT containing all three of them).

As the size of kernel and rootfs cannot be determined accurately at the
time the device is made, having individual GPT partitions for kernel and
rootfs either limits future growth of the kernel image or wastes space
by overestimating the kernel size.

Changing the GPT partitioning when updating the device to match the
exact sizes is also not an option, as damage to the GPT would then be a
single point of failure (the backup GPT also wouldn't help much here).
So for dual-boot to actually be meaningful, we shouldn't ever write to
any parts of the disk/flash which affect more than one of the dual-boot
options.

> What you're doing is bascially nesting a partition table format
> inside another one, which doesn't make any sense at all.

See the last paragraph of this message for good reasons why one would
want to do exactly that.
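To put numbers on the sizing argument above, here is a minimal sketch.
All sizes are made up for illustration and do not describe any
particular device; alignment, padding and FIT header overhead are
ignored.

```python
MiB = 1024 * 1024

def fits_split_layout(kernel_size, rootfs_size, kernel_part, rootfs_part):
    """Kernel and rootfs each live in their own fixed-size GPT partition."""
    return kernel_size <= kernel_part and rootfs_size <= rootfs_part

def fits_single_slot(kernel_size, rootfs_size, slot_size):
    """Kernel and rootfs share one partition (e.g. inside one uImage.FIT)."""
    return kernel_size + rootfs_size <= slot_size

# Hypothetical factory layout: the kernel partition was sized at 8 MiB.
KERNEL_PART = 8 * MiB
ROOTFS_PART = 56 * MiB
SLOT = KERNEL_PART + ROOTFS_PART  # same total space, used as one slot

# Hypothetical later release: the kernel grew, the rootfs is small.
kernel, rootfs = 10 * MiB, 40 * MiB

print(fits_split_layout(kernel, rootfs, KERNEL_PART, ROOTFS_PART))  # False
print(fits_single_slot(kernel, rootfs, SLOT))                       # True
```

With the same 64 MiB of flash, the split layout rejects an image that
the single slot accepts, simply because the boundary between kernel and
rootfs was fixed at factory time.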
>
> > This block driver (if built-into the kernel and relied upon to expose
> > the block device used as root filesystem) will need to identify the
> > lower device it should work on. And for that the helper functions such
> > as devt_from_devname() need to be available for that driver.
>
> And devt_from_devname must not be used by more non-init code.  It is
> bad it got exposed at all, but new users are not acceptable.

I assume that implementing anything similar using blk_lookup_devt in the
driver itself is then also not acceptable, right?

Yet another option would be to implement a way to acquire this
information from device tree, i.e. have a reference to the disk device
as well as an unsigned integer in the 'chosen' node which the bootloader
can use to communicate this to the kernel.

Example:

	chosen {
		bootdev = <&mmc0 3>;
	};

It's a bit more tricky for ubiblock or mtdblock devices because they
don't have *any* representation in device tree at all at this point.
In the case of an MTD partition (for mtdblockX) we would just reference
the MTD partition in the same way.

To do this cleanly with NAND/UBI, I'd start by adding device-tree-based
attaching of an MTD partition to UBI, using a device-tree attribute
'compatible = "linux,ubi"' on the MTD partition. We could then have
sub-nodes referencing specific UBI volumes, both to select them for use
with ubiblock and to make those nodes valid references for the
to-be-introduced 'bootdev' attribute in 'chosen'.

Does that sound acceptable from your perspective?

>
> > A block representation is the common denominator of all the
> > above. Sure, I could implement splitting MTD devices according to
> > uImage.FIT and then add MTD support to squashfs. Then implement
> > splitting of UBI volumes and add UBI support to squashfs.
>
> Implementing MTD and/or UBI support would allow you to build a
> kernel without CONFIG_BLOCK, which will save you a lot more than
> the 64k you were whining about above.

Even devices with NOR flash may still want support for removable block
devices like USB pendrives or SD cards... Many home routers have only
8 MiB of NOR flash and yet come with USB 2.0 ports intended for a
pendrive which is then shared via Samba.

Also, heavily customized per-device kernel builds would never scale up
to support thousands of devices -- hence OpenWrt uses the exact same
kernel build for many devices, which makes both the build process and
debugging kernel bugs much easier (or even doable at all).

>
> > > None of this explains the silly nesting inside the GPT partition.
> > > It is not needed for the any use cases and the root probem here.
> >
> > So where would you store the uImage (which will have to exist
> > even to just load kernel and DTB in U-Boot, even without containing
> > the root filesystem) on devices with eMMC then?
>
> Straight on the block device, where else?

As the first few blocks are typically used for bootloader code and
bootloader environment, we would then need to hard-code the offset(s) of
the uImage.FIT on the block device. Imho this becomes messy, and just
using partitions seemed like a straightforward solution.

And what about dual-boot systems where you have more than one firmware
image? Hard-code more offsets? For each device?

In a way, I was considering this by using blkdevparts= cmdline option
instead of GPT, but

>
> > Are you suggesting to come up with an entirely new type of partition
> > table only for that purpose? Which will require its own tools and
> > implementation in both, U-Boot and Linux? What would be the benefit
> > over just using GPT partitioning?
>
> Why do you need another layer of partitioning instead of storing
> all your information either in the uImage, or in some other
> partition format of your choice?

The reason is the different life-cycle of the device's main partition
table, bootloader, bootloader environment, ... on the one hand, and of
each firmware image on a dual-boot system on the other hand.

Hence there is more than just one uImage: typically there are
bootloader, bootloader environment, firmware A (uImage.FIT) and
firmware B. Replace "A" and "B" with "recovery" and "production",
depending on the dual-boot style implemented.

Therefore re-writing the whole disk during firmware upgrades is not an
option because it is risky: e.g. in case of a sudden power failure we
could end up with a hard-bricked system.

So to me it makes sense that for a firmware upgrade, we write only to
one partition and don't touch the GPT or anything else on the device.
If something goes wrong, the device will still boot: the bootloader will
notice that the uImage.FIT in one partition is broken (uImage.FIT also
comes with hashes to ensure image integrity) and will load something
else (from another partition) instead.
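
As a rough illustration of the fallback behaviour described above, here
is a minimal sketch. It is not U-Boot code: the function names and the
slot dictionary are hypothetical, and a single SHA-256 digest per slot
stands in for the per-image hashes a real uImage.FIT carries inside the
image itself.

```python
import hashlib

def image_is_intact(image: bytes, expected_sha256: str) -> bool:
    """Return True if the stored image still matches its recorded digest."""
    return hashlib.sha256(image).hexdigest() == expected_sha256

def pick_boot_slot(slots, preferred_order):
    """Return the first slot whose image verifies, or None if none does.

    slots maps a (hypothetical) slot name to (image bytes, expected digest).
    """
    for name in preferred_order:
        image, digest = slots[name]
        if image_is_intact(image, digest):
            return name
    return None

# Slot A holds a complete image. The upgrade of slot B was interrupted by
# a power failure, so only part of the new image reached the flash.
old_fw = b"firmware release 1"
new_fw = b"firmware release 2"
slots = {
    "A": (old_fw, hashlib.sha256(old_fw).hexdigest()),
    "B": (new_fw[:7], hashlib.sha256(new_fw).hexdigest()),
}

# The bootloader prefers the freshly upgraded slot B, but falls back to A.
print(pick_boot_slot(slots, ["B", "A"]))  # A
```

Because the upgrade only ever wrote to slot B, slot A and the partition
table are untouched and the device still boots.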