On Fri, 20.04.18 12:20, Chris Murphy (lists@xxxxxxxxxxxxxxxxx) wrote:

> I'm honestly mystified why the plymouth commit hasn't been reverted in
> the interim. But I'm also mystified why the bootloader folks don't
> give a shit to commit their configuration files to disk when they know
> they can't do journal replay and have known that for 20 years. But
> then I'm also mystified why systemd developers won't fallback to
> freeze/thaw if rootfs remount-ro fails three times, instead of just
> giving up and forcing a reboot, they did discuss doing this a year ago
> and then poof, no action.

Quite frankly, if you want to put the blame somewhere, I'd probably
place it with the xfs folks? I mean, there's a well-defined API on
Linux for syncing a file system to disk so that it is in a clean
state, it's called sync(). Turns out that doesn't work though, it
doesn't actually do that. systemd calls that API during shutdown if it
is unable to unmount some file system because the kernel refuses
it. sysvinit did it like that. Upstart did it that way. *Everybody*
else does it that way, too.

There's FIFREEZE/FITHAW in xfs. It has a very different purpose from
syncing. It does considerably more: it also stops all further I/O to
the file system. That is seriously awful if the process invoking it
actually runs from that very file system. But sure, we can pretend
this wasn't an issue and call mlockall() first (which we do), and
immediately FITHAW after FIFREEZE. Besides obviously being brittle and
ugly, will that be sufficient? I am not sure.

Let's not forget the actual trigger of the issue: plymouth is running,
but its binary was updated while it was already running. The old file
is now deleted but remains pinned as long as plymouth is
running. Basically, as long as plymouth is running the file system
will have operations pending, hence it is likely to get dirtied again
pretty soon after that FIFREEZE/FITHAW dance...
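To make the contrast between the two interfaces concrete, here is a rough Python sketch. It is purely illustrative and not how systemd (which is written in C) does it. The helper names flush_with_sync() and flush_with_freeze_thaw() are made up for this example. The ioctl numbers are derived from the _IOWR('X', 119, int) and _IOWR('X', 120, int) definitions in <linux/fs.h>, assuming the common ioctl encoding used on x86 and arm (some architectures encode requests differently). The freeze path needs CAP_SYS_ADMIN and, unlike sync(), an open fd on the file system.

```python
import fcntl
import os

# Linux ioctl request encoding as used on most architectures (x86, arm, ...).
# Assumption: some architectures (e.g. powerpc, mips) use a different layout.
_IOC_NRSHIFT, _IOC_TYPESHIFT, _IOC_SIZESHIFT, _IOC_DIRSHIFT = 0, 8, 16, 30
_IOC_WRITE, _IOC_READ = 1, 2


def _iowr(type_char: str, nr: int, size: int) -> int:
    """Python equivalent of the kernel's _IOWR(type, nr, <type of that size>)."""
    return (
        ((_IOC_READ | _IOC_WRITE) << _IOC_DIRSHIFT)
        | (size << _IOC_SIZESHIFT)
        | (ord(type_char) << _IOC_TYPESHIFT)
        | (nr << _IOC_NRSHIFT)
    )


# <linux/fs.h>: FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = _iowr("X", 119, 4)
FITHAW = _iowr("X", 120, 4)


def flush_with_sync() -> None:
    """Generic path (hypothetical helper): no fd on any file system needed."""
    os.sync()


def flush_with_freeze_thaw(mount_point: str) -> None:
    """Freeze/thaw dance (hypothetical helper): requires open() on the fs."""
    # This open() is itself an access to the file system in question.
    fd = os.open(mount_point, os.O_RDONLY | os.O_DIRECTORY | os.O_CLOEXEC)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)  # flush, then stop further I/O to the fs
        fcntl.ioctl(fd, FITHAW, 0)    # resume I/O right away
    finally:
        os.close(fd)
```

Note that the second helper cannot even start without touching the file system, while the first one takes no path at all.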
But then there's the other thing: in order to call FIFREEZE/FITHAW we
need to open() an fd on the file system. That means actually doing
disk accesses (including possibly enqueuing write accesses due to
atime), and that's something we currently try hard to avoid, because
doing that on file systems that aren't healthy anymore means
deadlocks. In particular for network-backed file systems this actually
matters: for them we want to issue an umount() or remount syscall, and
only that. We never want to actually access the file system, because
the network is very likely already down or otherwise unavailable. Now
you might say "but xfs is not a network file system!". That's not true
unfortunately: iscsi, nbd and all that other stacked jumble mean
everything is a network file system these days.

The sync() syscall doesn't suffer from this issue. Yes, it will of
course trigger disk I/O too, that's its whole point after
all. However, it will only do so on file systems that are known to be
dirty, and won't generate new I/O on its own.

There was a thread about this a while back on systemd-devel:

https://lists.freedesktop.org/archives/systemd-devel/2017-April/038615.html

(and around there). Back then the open() issue wasn't clear to me;
this only came up recently when we worked on some NFS-related umount
work (specifically: current systemd will now implement in userspace a
time-out around mount() and sync() because of the general flakiness of
that interface).

Hence I am pretty sure FIFREEZE is really not useful. It has a
different purpose, and by using it we might make things nicer for some
but much worse for many others. In that thread I indicated I'd merge a
patch that adds it. At this point I am convinced that that would just
be a game of whack-a-mole, and we'd just make things worse on
networking fs...

Hence, I am pretty sure that xfs should fix their implementation of
sync().
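As an illustration of what a userspace time-out around sync() could look like, here is a minimal Python sketch. This is an assumption about the general shape of such a mechanism, not systemd's actual implementation. The names run_with_timeout() and sync_with_timeout() and the default time-out values are invented for the example. The idea is to do the potentially blocking call in a forked child, so that the parent can give up waiting even if the child hangs in the kernel.

```python
import os
import signal
import time


def run_with_timeout(operation, timeout_s: float, poll_s: float = 0.05) -> bool:
    """Run operation() in a forked child; True if it succeeded in time.

    Hypothetical helper: the parent never blocks on the operation itself,
    it only polls for the child's exit until the deadline passes.
    """
    pid = os.fork()
    if pid == 0:
        # Child: do the potentially blocking work, then exit immediately.
        try:
            operation()
            os._exit(0)
        except BaseException:
            os._exit(1)

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        done_pid, status = os.waitpid(pid, os.WNOHANG)
        if done_pid == pid:
            return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        time.sleep(poll_s)

    # Timed out: request termination but do not wait for it. A child stuck
    # in uninterruptible I/O may not go away until that I/O completes.
    os.kill(pid, signal.SIGKILL)
    return False


def sync_with_timeout(timeout_s: float = 10.0) -> bool:
    """Call sync() with an upper bound on how long the caller waits."""
    return run_with_timeout(os.sync, timeout_s)
```

In a shutdown path the caller would simply continue (or give up) when this returns False, instead of hanging forever on an unresponsive backing device.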
I'd also be fine with calling some other API for this if they really
don't want to fix sync(), as long as it is generic and not some
xfs-specific hack. The key here really is that the API should actually
do what is needed, i.e. no pausing of I/O or so. And it should not
require us to open an fd on the file system in question.

That all said: I figure plymouth should be changed to start in the
initrd and then stick around for good, and never be updated/replaced
by any binary from the host system. That way it can't and won't keep
any files pinned from the host fs if it's updated, as it will be
purely backed by the initrd file system. We generally require this
from storage tools, and plymouth should do the same.

Lennart