Re: Installation image layout

Chris Murphy <lists@xxxxxxxxxxxxxxxxx> · Fri, 12 Oct 2018 15:44:38 -0600

On Fri, Oct 12, 2018 at 4:30 AM, Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Oct 11, 2018 at 09:24:08PM -0600, Chris Murphy wrote:

>> I'm pretty sure the original reason was that the default live install used
>> dd to block copy the root file system into the fedora-root LV, and
>> then resized the LV and ext4 file system.
>
> How is it done now?

On Live media installs, anaconda does:

rsync -pogAXtlHrDx --exclude /dev/ --exclude /proc/ --exclude /sys/
--exclude /run/ --exclude /boot/*rescue* --exclude /etc/machine-id
/mnt/install/source/ /mnt/sysimage
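
For anyone decoding that flag soup, here is a minimal Python sketch that rebuilds the same argv and annotates each short flag. The flag meanings are from the rsync man page; the function and variable names are made up for illustration, and this is not how anaconda itself constructs the command.

```python
# Sketch only: rebuilds the rsync command quoted above as an argv list.
# The dict keys are in the same order as the flags in the original command.
RSYNC_FLAGS = {
    "p": "preserve permissions",
    "o": "preserve owner",
    "g": "preserve group",
    "A": "preserve ACLs",
    "X": "preserve extended attributes",
    "t": "preserve modification times",
    "l": "copy symlinks as symlinks",
    "H": "preserve hard links",
    "r": "recurse into directories",
    "D": "preserve device and special files",
    "x": "do not cross file system boundaries",
}

# Paths left out of the copy, exactly as listed in the command above.
EXCLUDES = ["/dev/", "/proc/", "/sys/", "/run/", "/boot/*rescue*", "/etc/machine-id"]


def build_rsync_argv(source="/mnt/install/source/", target="/mnt/sysimage"):
    """Return the rsync command as a list suitable for subprocess.run()."""
    argv = ["rsync", "-" + "".join(RSYNC_FLAGS)]
    for pattern in EXCLUDES:
        argv += ["--exclude", pattern]
    return argv + [source, target]


if __name__ == "__main__":
    print(" ".join(build_rsync_argv()))
```

Passing the list to subprocess.run() without a shell means the /boot/*rescue* pattern reaches rsync unexpanded, so rsync does the matching itself.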

On DVD and netinstalls, I'm guessing based on packaging.log that it's
a dnf+rpm installation even though I never see a dnf or rpm process in
either top or ps. In any case, the rpm packages are directly on the
iso9660 file system, not baked into the

>> Why does efiboot.img have a 32MiB limit?
>
> Because "32MB should be enough for everybody"...
> Long story short, "El Torito" boot catalog structure has a 16-bit field
> for image size (expressed in 512-byte sectors). For details see here:
> https://wiki.osdev.org/El-Torito
> https://web.archive.org/web/20180112220141/https://download.intel.com/support/motherboards/desktop/sb/specscdrom.pdf
> (page 10)

OK. On Fedora 28 media, efiboot.img is ~9.2 MiB and does not contain
either the kernel or initramfs. The kernel and initramfs are found on
the iso9660 file system at images/pxeboot/ and also at isolinux/;
GRUB UEFI uses the former, and isolinux BIOS uses the latter. Both
initrds are 65M, so they're already too big to go into efiboot.img -
and they kinda need to be, because this particular initramfs is built
by dracut with the --no-hostonly flag so that hopefully we can boot
anything. (Curiously, the initramfs is 65M on DVD/netinstall and 50M
on LiveOS - I don't have an explanation for that. I'm looking at
Fedora 28 release images.)
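
For reference, the 32 MiB ceiling quoted above falls straight out of the field width: a 16-bit count of 512-byte sectors. Here is a small Python sketch of the arithmetic. The function names are hypothetical, "65M" is treated as 65 MiB, and the masking only illustrates the "higher bits are dropped" case; it does not model mkisofs spilling bits into the adjacent field.

```python
SECTOR_SIZE = 512      # El Torito counts the image in 512-byte sectors
MAX_SECTORS = 0xFFFF   # largest value a 16-bit field can hold


def el_torito_max_bytes():
    """Largest image size the 16-bit sector count can describe."""
    return MAX_SECTORS * SECTOR_SIZE


def sectors_needed(image_bytes):
    """Number of 512-byte sectors needed, rounding up."""
    return -(-image_bytes // SECTOR_SIZE)


def fits_in_sector_count_field(image_bytes):
    """True if the image size can be recorded without losing bits."""
    return sectors_needed(image_bytes) <= MAX_SECTORS


def truncated_sector_count(image_bytes):
    """What is left of the sector count if only the low 16 bits survive."""
    return sectors_needed(image_bytes) & MAX_SECTORS


if __name__ == "__main__":
    MIB = 1024 * 1024
    print(el_torito_max_bytes())                 # one sector short of 32 MiB
    print(fits_in_sector_count_field(9 * MIB))   # roughly today's efiboot.img
    print(fits_in_sector_count_field(65 * MIB))  # an image holding a 65M initrd
    print(truncated_sector_count(65 * MIB))      # sectors recorded after truncation
```

So an image big enough to hold one of those initrds would be recorded as a much smaller one, which is consistent with the "weird results" described in the quoted text.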

From my understanding, efiboot.img would only need to contain shim,
grubia32, grubx64 and supporting bootloader-only files.

BTW, trivia: Fedora's installer creates EFI System partitions that are
always FAT16. So far as I know, no computer has complained, only
humans. FAT12/16 is OK for removable media but the spec pretty clearly
expects FAT32 for ESPs on permanent installs. The installer team
doesn't want to use mkfs flags, they expect the defaults to work
unless they don't work, and they do work, so FAT16 it is.

> Full story:
> https://github.com/QubesOS/qubes-issues/issues/794#issuecomment-135988806
>
> I've spent a lot of time debugging this, because mkisofs doesn't
> complain about it, it just silently overflows the higher bits into the
> adjacent field, which gives weird results, depending on where you boot
> it. Adding isohybrid to the picture doesn't make it easier (there, the
> higher bits are truncated, or actually not copied to the MBR partition
> table, as they weren't part of the original field).

I think we're stuck with isohybrid for a while. Having UEFI and BIOS
bootloaders, along with isohybrid supporting both as well as Macs, all
on one media image, that can be burned to optical media and written to
a USB stick - is hugely beneficial.

The compose process takes about 12 hours. That's every ISO for all the
editions, and the spins, and the VM images, for all archs. Having
separate UEFI and BIOS images, or splitting out Macs with their own
image, would increase compose times and complexity across the board.
I'm not sure which happens first: the end to optical media booting
support; or dropping support for BIOS and/or old Apple EFI Macs (only
this year did they start using UEFI, rather than their own variant of
Intel EFI pre-UEFI, so it'll take some time to see how that shakes out
which also involves whether and how Secure Boot can ever be supported
on Macs).

This talks a bit about isohybrid and all the very clever hacks
involved to make Fedora boot practically anything with a single ISO
9660 image. (I'm being x86_64 arch specific when I say that.)

https://mjg59.dreamwidth.org/11285.html

>>
>> I did give all of these things some thought a long time ago when I ran
>> into a lorax hack by Will Woods who used Btrfs as the root.img file
>> system, I'm not sure why it was used. But it gave me the idea of using
>> a few features built into Btrfs specifically for this use case:
>>
>> - seed/sprout feature can be used with zram block device for volatile
>> overlay; and used with a blank partition on the stick for persistent
>> overlay. Discovery is part of the btrfs kernel code.
>>
>> - Since metadata and data is always checksummed on every read, we
>> wouldn't have to depend on the slow and transient ISO checksum
>> (rd.live.check which uses checkisomd5) which likewise breaks when
>> creating a stick with livecd-iso-to-disk.
>>
>> - Btrfs supports zstd compression. I did some testing and squashfs is
>> still a bit more efficient because it compresses fs metadata, whereas
>> Btrfs only compresses data extents.
>>
>> The gotcha here is the resulting image isn't going to be bit for bit
>> reproducible: UUIDs and time stamps are strewn throughout the file
>> system (similar to ext4 and XFS), but any sufficiently complex file
>> system is going to have this problem.
>
> I wouldn't worry about _files_ timestamps that much - in most cases this is
> solvable problem by elaborate enough find+touch[4]. But that's not all
> obviously, there are various timestamps in superblock, and other
> metadata. The most problematic part in "normal" filesystems, using
> kernel driver is inode allocation, block allocation etc. This greatly
> depends on timing, ordering, specific kernel version etc.
> See [5] for details.
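
The find+touch idea mentioned in the quote can be sketched in a few lines of Python, assuming the goal is simply to pin every file and directory timestamp in a staging tree to one fixed value before the image is built. The function name and the epoch argument are hypothetical, and as the quote says, this does nothing for superblock timestamps or allocation order.

```python
import os


def normalize_mtimes(root, epoch):
    """Set atime and mtime of everything under root (and root itself) to epoch.

    Symlinks are touched themselves rather than followed. Returns the number
    of entries touched.
    """
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.utime(path, (epoch, epoch), follow_symlinks=False)
            touched += 1
    os.utime(root, (epoch, epoch))
    return touched + 1
```

A build script would call this on the staging directory right before running mkfs or mksquashfs, with whatever fixed timestamp the build agrees on.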

mkfs.btrfs has --rootdir and --shrink features to pre-allocate a
volume with files at mkfs time; I have no idea to what degree it
depends on kernel code. The main benefit with this is it's really easy
to implement full checksum matching for metadata and data on every
read, and user space ends up with EIO instead of corrupt data, and
super clear kernel complaints. And such corruption, whether on optical
media or USB sticks, is common. Even in the rarer case of a stick that
passes the md5 checksum, it can later have transient and silent
corruption that ends up showing up in weird ways.
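
To make the "EIO instead of corrupt data" behavior concrete, here is a toy Python model of verify-on-read. It is only a sketch of the idea: the block size and function names are made up, and zlib.crc32 stands in for the crc32c that Btrfs uses by default, since crc32c is not in the Python standard library.

```python
import errno
import zlib

BLOCK_SIZE = 4096  # arbitrary block size for the illustration


def checksum_blocks(data):
    """Compute one checksum per block, as would be done at image build time."""
    return [
        zlib.crc32(data[offset:offset + BLOCK_SIZE])
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


def verified_read(data, checksums, block_index):
    """Return one block, or raise EIO if it no longer matches its checksum."""
    start = block_index * BLOCK_SIZE
    block = data[start:start + BLOCK_SIZE]
    if zlib.crc32(block) != checksums[block_index]:
        raise OSError(errno.EIO, f"checksum mismatch in block {block_index}")
    return block
```

The point is that a flipped bit anywhere in a block turns into a hard read error for that block only, while reads of intact blocks keep working, and the check runs on every read rather than once at boot.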

It's plausible squashfs could implement this; I think by default it
already checksums every file to look for duplicates, but it doesn't
retain the per-file hash for integrity checking later on. It's also
possible with dm-verity or dm-integrity, but then that adds back the dm
complexity.

-- 
Chris Murphy