On Thu, Oct 11, 2018 at 6:37 PM, Marek Marczykowski-Górecki
<marmarek@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi all!
>
> I'm new on this list. I work on Qubes OS, where Fedora is used as a base
> distribution.
>
> While trying to build the installation image in reproducible manner[1],
> I found the current installation image have unusual layout. Quoting
> dracut.cmdline manual page:
>
>     squashfs.img          |  Squashfs from LiveCD .iso downloaded via network
>        !(mount)
>        /LiveOS
>            |- rootfs.img  |  Filesystem image to mount read-only
>                 !(mount)
>                 /bin      |  Live filesystem
>                 /boot     |
>                 /dev      |
>                 ...       |
>
> This rootfs.img layer makes the image build very much unreproducible.
> Why is it even there? Bare squashfs.img layer should be enough. Then,
> mount overlayfs over it (I see there is even some partial support for it
> in dmsquash-live). Most other Live systems I've seen use just squashfs +
> overlayfs (or aufs if kernel is older), so it's commonly tested
> configuration. I *guess* it's there for historical reason, from before
> aufs/overlayfs being available. Is there any other reason for that?

I'm pretty sure the original reason was that the default live install
used dd to block copy the root file system into the fedora-root LV, and
then resized the LV and the ext4 file system. There have also been a
number of squashfs improvements since that decision, so there might have
been limitations in squashfs that ext4 didn't have (I'm thinking xattrs
were supported in ext4 long before squashfs, and maybe capabilities?)

> If there is no other reason, I propose to drop this and have
> installer/live filesystem directly in squashfs.img.
> This have multiple benefits:
>  - it's much easier to make the image build process reproducible (see
>    below)
>  - less complexity, both in the build and in the boot (the whole
>    dmsquash-live dracut module can be replaced with <20 line
>    function[2]
>  - smaller initramfs (which is extremely important if needed to be
>    included in efiboot.img, which can't be larger than 32MB)
>  - slightly faster boot time (device-mapper is slow)
>
> What do you think?

Whatever we do should take into account the persistent root and
persistent home use cases, specifically:

https://github.com/livecd-tools/livecd-tools/blob/master/tools/livecd-iso-to-disk.sh
  --overlay-size-mb
  --home-size-mb

A particular criticism of the device-mapper solution currently used in
that script: it blows up. The overlay is literally WORM. Deleting files
simply dereferences them and doesn't free up pool space, so it is
inevitable that the pool will fill up. When it does, it blows up the
file system, and it can't be repaired. All you can do is reset the
overlay, which means deleting all changes and starting over.

At least one of our spins, SOAS, depends on livecd-iso-to-disk for
creating its final installation, because it's predicated on running
Fedora SOAS from a stick.

Why does efiboot.img have a 32MiB limit?

> As for the reproducibility, I've made changes to lorax (including
> dropping rootfs.img layer), anaconda, pungi and createrepo and this all
> allows to build bit-by-bit identical image, given the same input (rpm
> packages, pungi configuration, $SOURCE_DATE_EPOCH variable[3]). Well,
> almost - there is an issue with efiboot.img, but I already have a
> solution, just not pushed it yet.
>
> You can find all the pull requests collected here:
> https://github.com/QubesOS/qubes-installer-qubes-os/pull/26
>
> I'll work further to make the changes merged upstream.
>
> [1] https://reproducible-builds.org/
> [2] https://github.com/QubesOS/qubes-installer-qubes-os/pull/26/commits/332be8e1e3e1006013772528078914f491d14c1f
> [3] https://reproducible-builds.org/specs/source-date-epoch/

Cool! Well, you've already done most of the work, and if this has
support elsewhere already then I'm in favor of continuing in that
direction.

I did give all of these things some thought a long time ago when I ran
into a lorax hack by Will Woods, who used Btrfs as the root.img file
system; I'm not sure why it was used. But it gave me the idea of using a
few features built into Btrfs specifically for this use case:

- The seed/sprout feature can be used with a zram block device for a
  volatile overlay, and with a blank partition on the stick for a
  persistent overlay. Discovery is part of the btrfs kernel code.
- Since metadata and data are always checksummed on every read, we
  wouldn't have to depend on the slow and transient ISO checksum
  (rd.live.check, which uses checkisomd5), which likewise breaks when
  creating a stick with livecd-iso-to-disk.
- Btrfs supports zstd compression. I did some testing and squashfs is
  still a bit more efficient because it compresses fs metadata, whereas
  Btrfs only compresses data extents.

The gotcha here is that the resulting image isn't going to be bit for
bit reproducible: UUIDs and time stamps are strewn throughout the file
system (similar to ext4 and XFS), but any sufficiently complex file
system is going to have this problem. Offhand I'm not sure how squashfs
would get around it, since it's going to draw from an ext4 source (not
sure if the ephemeral root could be tmpfs and use it as the source for
mksquashfs?)
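To make the time stamp point a bit more concrete, here is a minimal
Python sketch of normalizing a source tree before packing it, so the
output bytes no longer depend on when or where the files were created.
This is not what lorax or mksquashfs does: tar stands in for squashfs
purely for illustration, the names deterministic_archive, build_tree and
digest are hypothetical, and clamping mtimes to $SOURCE_DATE_EPOCH is my
reading of the spec in [3].

```python
import hashlib
import io
import os
import tarfile
import tempfile


def deterministic_archive(root, epoch):
    """Pack `root` into a tar whose bytes depend only on names, modes and
    contents. Illustrative stand-in for a squashfs build step."""
    rel_paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel_paths.append(os.path.relpath(os.path.join(dirpath, name), root))

    buf = io.BytesIO()
    # Pin the archive format; the library default has changed over time.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        for rel in sorted(rel_paths):  # stable order, independent of readdir()
            path = os.path.join(root, rel)
            info = tar.gettarinfo(path, arcname=rel)
            if info is None:  # unsupported file type (e.g. a socket)
                continue
            info.mtime = int(min(info.mtime, epoch))  # clamp newer time stamps
            info.uid = info.gid = 0  # drop build-host ownership
            info.uname = info.gname = "root"
            if info.isreg():
                with open(path, "rb") as fh:
                    tar.addfile(info, fh)
            else:
                tar.addfile(info)
    return buf.getvalue()


def build_tree(mtime):
    """Hypothetical helper: create a tiny tree whose file carries `mtime`."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "LiveOS"))
    path = os.path.join(root, "LiveOS", "hello.txt")
    with open(path, "w") as fh:
        fh.write("hello\n")
    os.utime(path, (mtime, mtime))
    return root


def digest(root, epoch):
    return hashlib.sha256(deterministic_archive(root, epoch)).hexdigest()


if __name__ == "__main__":
    # Arbitrary fallback value if SOURCE_DATE_EPOCH is not set.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1539277020"))
    first = build_tree(epoch + 100)
    second = build_tree(epoch + 5000)
    print(digest(first, epoch) == digest(second, epoch))
```

If the normalization is complete, the two trees hash the same even
though they were created separately with different file time stamps.
File system level UUIDs are a separate problem that this sketch doesn't
touch, since a tar archive doesn't have any.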
--
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx