On Mon, Jun 25, 2018 at 4:40 AM, Lennart Poettering <mzerqung@xxxxxxxxxxx> wrote:
> I am not sure why you are making this about systemd-boot. Let's just
> focus on why (or why not) vfat is the best option for $BOOT.

I'm making it about systemd-boot because it is literally the only bootloader that can't read any file system that the firmware can't already read; it's a UEFI only bootloader and it definitely sounds like the spec is written with the bootloader in mind rather than the other way around.

VFAT: No links, labels, or xattr. Like I said earlier, UDF does support them, and has almost as much cross-platform kernel and bootloader support as FAT, and is as simple a volume format as FAT. If there is a way to get atomic updates on vfat without symlinks, maybe vfat could be used for $BOOT. See below.

>> > Which file system do you have in mind even for this?
>>
>> Unspecified for now. i.e. no change. It would remain ext4 by default I
>> expect, but ultimately whatever anaconda allows.
>
> So think about this one bit ahead. Right now it's clear that even with
> Grub's relatively large contributor base it's hard to impossible to
> support modern Linux file systems properly - even just for
> read-only. See the XFS debacle as one example, and that the kernel
> folks made clear they only consider their own in-kernel implementation
> to be supportable. Now, I'd assume that sooner or later features such
> as boot counting are something we want for Fedora too (i.e. that we
> can update the kernel, try to boot the new one a couple of times, and
> when it fails all the time revert back to an older version, fully
> automatically; in fact the fedora desktop have very recently started
> work on that, though they have a weaker model of simply showing the
> boot menu after failed boots, instead of reverting back). Now, in that
> model you need to count the attempted boots somewhere.
> Thus you need
> write access somewhere (and no, EFI variables don't work for this,
> they are not suitable for stuff changed on every boot, as their memory
> is generally not ready to be written to that often). Which hence
> means you need write access to some file system in some form from the
> boot loader. And how do you think that's going to work out if already
> read access to modern file systems is, well, a disaster?

I agree the counter use case has merit. Is it a bootloader specific feature, or is it supposed to work across bootloaders?

GRUB folks absolutely proscribe any file system writes. There is one state file, the grubenv. It is created through the file system with 'grub-editenv create', and GRUB reads the file system to determine the LBAs for that file. Any writes by GRUB to grubenv happen outside the file system driver, directly to an LBA, so they're atomic (or at least as atomic as a drive can support). This matters because creating a new file has no guarantee that any of the multiple writes required when going through the file system will actually happen, which could leave the file system dirty and needing repair - in addition to failing to meet the requirement for your use case, which is a reliable way of counting boots *in the face of unknown boot failure*.

So are you proposing a BLS variant of the grubenv that all bootloaders can share? It does matter, because not all file systems support grubenv. And it also matters because this same state file could be used for atomically switching the default boot entry. e.g. rpm-ostree depends on a symlink to switch default boot, so perhaps this could be done by modifying a flag in this static file, outside of the file system.

> Again, FAT is the one thing everyone can agree on. Boot loaders can
> read it *and write it*, UEFI and raspberry pi firmwares have support
> for it, and all OSes and their initrds generally too.

Bootloaders absolutely do not write to any file system including FAT.
And GRUB's grubenv permits storing limited state information on file systems other than vfat, so vfat still isn't required.

Let me tell you how totally non-trivial VFAT is for sharing when the driver is in firmware. Digital camera vendors have had vfat drivers in both consumer and professional cameras for over a decade. The one sure way you can corrupt your CF/SD card file system is transferring it between cameras *even of the same make and model with different firmware versions* and doing basic file operations like create and delete. Boom! Fuck all your files! Hahaha! (Yes the camera maniacally laughs in your face as it corrupts the file system.) The manufacturer recommendation, even on professional gear? Format the card in-camera before each use. Shoot. Do not ever delete files. When you're ready, suck the images off the card, back them up, put the card back in the camera, reformat. If you switch cards to different cameras, reformat the card. You can't do that? Expect that data loss is possible.

> From the Linux side we can provide relatively safe read and write
> support for FAT. For example, if Fedora would use the systemd
> automount logic for mounting $BOOT then the file system will generally
> be unmounted, except in a small time window around actual
> accesses. This means the chance that the file system remains in a
> clean state is maximized.
>
> $BOOT is a place to place very few files, with very simple access
> patterns. Basically, during update cycles we just add a few files
> there and remove some others, and they are written in one linear write
> operation. For doing that we need no fancy file system features. The
> simplest, most common file system storing files is good enough for
> that.

I've lost count of how many times I personally have experienced such data loss, with all sorts of consumer and professional gear, let alone the number of stories I've heard from professional photographers and from camera and SD/CF card engineers.
There is no possible way you can convince me it's reliable for either firmware or bootloader to do even simple file operations on vfat. I've had way too much experience to the contrary. It's such a basic and expected thing that it's a fundamental process taught to photographers: you can find it in any technical book and class on the subject. Sure it's fucked up. It's also true.

Writing some limited hints that can fit in a single 512 byte sector is perhaps plausible, where we're writing that directly to a single LBA outside of the file system driver.

>> This problem has many little saboteurs acting together to betray the
>> user. It isn't really any one single thing, they all have to happen to
>> capsize the ship.
>
> So what are you proposing? Are you going to work on the XFS driver in
> grub to make it match the kernel's current version? And for ext4 too?
> I mean, good luck with that...

For that specific problem, I've already said plymouth is doing the wrong damned thing, totally contrary to rather clear systemd documentation. And that's why systemd fails to remount-ro, which is why the journal isn't flushed, which is why the bootloader doesn't see the changes (or alternatively sometimes it sees part of the changes in the form of a zero length grub.cfg). *shrug* Look, if people do whatever they want contrary to documentation, and then take years to still not fix their shit? That is not my idea.

My proposal has been to revert the commit allowing plymouth to be kill exempt by systemd. But it's been a year, over a year maybe, since then and it's still not done? So now my proposal is something like this: systemd folks, too many people are pissing in the swimming pool, you can no longer provide any service a means of being kill exempt. Delay the kill if you want, but eventually it must be killed, so you can remount-ro, so that the journal is flushed, so that the bootloader doesn't fucking faceplant.
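
For reference, one way to force that flush without waiting for a remount-ro at shutdown is the FIFREEZE/FITHAW ioctl pair, issued on the mounted file system right after the config is written. What follows is only a rough Python sketch under stated assumptions: the helper name write_and_commit and its parameters are made up for illustration, the ioctl numbers are the common x86/ARM encoding of the definitions in <linux/fs.h>, the real ioctls need root, and none of this is taken from GRUB's own tooling.

```python
import fcntl
import os


def _iowr(type_char, nr, size):
    # Generic Linux _IOWR() encoding (x86/ARM layout; some arches differ).
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def write_and_commit(path, data, mountpoint, ioctl=fcntl.ioctl):
    """Hypothetical helper: replace `path` with `data`, then freeze and
    thaw the file system mounted at `mountpoint` so pending journal
    contents are written out before any reboot."""
    tmp = path + ".new"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # file contents reach the disk
    os.replace(tmp, path)             # swap the new file into place

    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)               # persist the directory entry change
    finally:
        os.close(dirfd)

    mfd = os.open(mountpoint, os.O_RDONLY)
    try:
        ioctl(mfd, FIFREEZE, 0)       # blocks until the fs is quiesced
        ioctl(mfd, FITHAW, 0)         # immediately allow writes again
    finally:
        os.close(mfd)
```

The ioctl parameter exists only so the sequence can be exercised without root, by passing a stand-in function. The freeze and thaw are two separate steps, so this is not atomic: writes that land after the thaw are not covered.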
Really though, the bootloaders each need to do this stupid FIFREEZE/FITHAW if they're going to support file systems that refuse to sync() correctly. And that's because ultimately it is not systemd's responsibility to remount-ro in order to make sure the bootloader's changes are properly committed to disk. The GRUB folks want to support XFS? Fine. That means either supporting FIFREEZE/FITHAW at grub.cfg create time, or they have to teach the bootloader how to read a journal. The former is a metric butt-ton easier than the latter, even if it's not an atomic operation.

>
>> > Why not just stick to VFAT? As mentioned, it's really the only thing
>> > generally understood by everything that has a stake in boot
>> > loading. Grub speaks it. The EFI firmware speaks it (and that also
>> > means the EFI shell, which is immensely useful). Linux speaks it in the
>> > initrd and after boot. Windows speaks it. MacOS speaks it. It's the
>> > lowest common denominator and should be entirely sufficient to store a
>> > few kernels and their initrds. I mean, we build our kernels as EFI
>> > binaries on Fedora, IIRC. Wouldn't it be a pity if EFI can't actually
>> > access them, because they are stored on an fs only Linux speaks?
>>
>> Wouldn't it be a pity if we didn't teach UEFI to read every goddamn
>> file system ever invented just because we can?!
>>
>> http://efi.akeo.ie/downloads/efifs-latest/x64/
>> https://github.com/pbatard/efifs
>
> Oh, right. this approach already failed with Grub, with its
> relatively large commercial support, and now you want to pile on?

Oh please stop. It's not a failed approach. Windows and macOS have been doing this reliably for 20 years because, holy crap! they actually commit their bootloader changes before rebooting! Now if you want to point fingers at why the bootloader changes aren't committed properly before reboot, have at it, I'm totally on board.
But you don't get to point to one tiny fail and say the whole concept is a failure, when there are billions of implementations that do the same damn thing correctly on NTFS and JHFS+, and neither of them is doing journal replays in its bootloader either. They expect things to be properly committed before reboot is even called. And they can do this because they aren't doing weird shit like persistently mounting their ESPs or boot volumes and expecting it's someone else's problem to cleanly unmount them.

>
>> I mean honestly, we can teach EFI whatever the hell we want. File
>> system support does not need to be baked into the bootloader on UEFI.
>> Drop these guys onto your ESP and now the firmware with any bootloader
>> can read any of those file systems directly. Pick one.
>>
>> I have to defer to others on the value of symlinks for atomic
>> configuration swapout, but if you want the most widely supported file
>> system that also has symlink support, it's UDF. For the time being
>> though, the concept of a widely sharable $BOOT really doesn't have
>> enough momentum or backing.
>
> UDF? When's the last time you actually used that? I mean, I don't even
> have a DVD drive anymore, where I could find an UDF file system on...

I use it regularly on large flash media because there's no file size limit like vfat, it supports Unix permissions, symlinks, hardlinks, and also isn't proprietary like vfat or exfat. Since ancient times it's been intended for random read/write support on flash media and even hard drives; it's not a DVD only thing. It has way better cross-platform support than exFAT, NTFS, or HFS+ - and without the vfat limitations.

> Also, it's read-only afair, hence stuff like boot counting is not
> going to work, it's a dead end.

It is not a read only affair. You're confusing UDF with ISO 9660. They aren't the same thing. UDF has offered random read write support for hard drives since its inception, before flash drives were even a thing.
https://en.wikipedia.org/wiki/Universal_Disk_Format

-- 
Chris Murphy
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx/message/2DRUKAMERLSLYNUTJPZ4WTL2BLOTTZUO/