Hi!

On Wednesday 23 October 2019 02:10:50 Chris Murphy wrote:
> a. write bootloader file to a temp location
> b. fsync
> c. mv temp final
> d. fsync
>
> if the crash happens anywhere from before a. to just after c. the old
> configuration file is still present and old kernel+initramfs are used.
> No problem. If the crash happens well after c. probably the new one is
> in place, for sure after d. it's in place, and the new kernel+
> initramfs are used.

I do not think that the kernel guarantees, for any filesystem, that a
rename operation is atomic on the underlying disk storage.

But somebody else should confirm it.

So if the kernel crashes in the middle of c. or between c. and d., you
need to repair the filesystem externally prior to trying to boot from
such a disk.

>
> > But looking at vfat source code (file namei_vfat.c), both rename and
> > lookup operation are locked by mutex, so during rename operation there
> > should not be access to read directory and therefore race condition
> > should not be there (which would cause reading inconsistent directory
> > during rename operation).
> >
> > If you want atomic rename of two files independently of filesystem, you
> > can use RENAME_EXCHANGE flag. It exchanges that two specified files
> > atomically, so there would not be that race condition like in rename()
> > that in some period of time both "foo" and "bar" would point to same
> > inode.
>
> I'm not sure how to test the following: write kernel and initramfs to
> final locations. And bootloader configuration is written to a temp
> path. Then at the decision moment, rename it so that it goes from temp
> path to final path doing at most 1 sector change. 1 512 byte sector
> is a reasonable number to assume can be completely atomic for a
> system.
> I have no idea if FAT can do such a 'mv' event with only one
> sector change

Theoretically it could be possible to implement it for FAT (with more
restrictions), but I doubt that a general purpose implementation of any
filesystem in the kernel can do such a thing. So practically, no.

> >
> > But... if you are asking for consistency and atomicity at filesystem
> > level (e.g. you turn off disk / power supply during rename operation)
> > then this is not atomic and probably it cannot be implemented. When FAT
> > filesystem is mounted (either by Windows or Linux kernel) it is marked
> > by "dirty" flag and later when doing unmount, "dirty" flag is cleared.
>
> Right. And at least on UEFI and arm boards, it's not the linux kernel
> that needs to read it right after a crash. It's the firmware's FAT
> driver. I have no idea how they react to the dirty flag.

Those bootloader firmwares which just load & run a bootloader
practically do not write anything to that FAT filesystem. In most cases
their implementation of FAT is read-only and very stupid. I doubt that
there is a check for the dirty flag.

I have seen a lot of commercial devices of different kinds which can
read & write (backup) data to a (FAT) SD card. And a lot of the time
they were not able to read a FAT filesystem formatted by another tool,
only one formatted by their own tool (or in the device itself). So such
firmwares can be full of bugs and it really is not a good idea to try
booting a bootloader from an inconsistent FAT filesystem.

> Most distros
> set /etc/fstab FS_PASSNO to 2, maybe it should be a 1, but in any case
> if we boot something far enough along to get to user space fsck, the
> dirty flag is cleaned up.

fs_passno set to 2 should be fine. You need to set it to 1 only for the
root device, on which the Linux system is running. All other disks which
are not needed for running the Linux system can have fs_passno set to 2.

> > This is there to ensure that operations like rename were finished and
> > were not stopped/killed in between.
> > So future when you read from FAT
> > filesystem you would know if it is in consistent state or not.
>
> GRUB has an option to blindly overwrite the 1024 byte contents of
> grubenv (no file system modification), that's pretty close to atomic.
> Most devices have physical sector bigger than 512 bytes. This write is
> done in the pre-boot environment for saving state like boot counts.

This depends on GRUB's FAT implementation. As said, I would be very
careful about such "atomic" writes. There are also some caches,
including hardware on-disk ones, etc...

> And add to the mix that I guess some UEFI firmware allow writing to
> FAT in the pre-boot environment?

Yes, the UEFI API allows you to write to disk devices. And a UEFI
filesystem implementation can also support writing to a FAT fs.

> I don't know if that's universally true. How do firmware handle a
> dirty bit being set?

A bad implementation would ignore it. This is something which you
should expect.

> It's bad if the
> firmware writes to such a file system anyway. But also bad if it can't
> save state, now it's not possible to save boot attempts for fallback
> purposes.

The best is to always keep a fragile filesystem in a consistent state.
And if it gets broken, repair it on an external system prior to trying
to write to it with some untrusted/broken/bad filesystem driver. This
would prevent data damage.

-- 
Pali Rohár
pali.rohar@xxxxxxxxx
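
For reference, the a.-d. sequence quoted at the top of this mail could
be sketched roughly as below. This is only a minimal sketch under some
assumptions: the function name replace_config is hypothetical, the temp
file is placed in the same directory as the final file, and step "d." is
read as an fsync of the containing directory (the quoted text does not
say what exactly is synced). It runs on Linux; it does not make the
rename atomic on disk for FAT, it only orders the writes as described.

```python
import os
import tempfile


def replace_config(final_path: str, data: bytes) -> None:
    """Write data to a temp file, fsync it, rename it over final_path,
    then fsync the containing directory (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(final_path))

    # a. write the new content to a temp file in the same directory,
    #    so that the later rename stays within one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # b. fsync the file data before it becomes visible
            os.fsync(tmp.fileno())
        # c. "mv temp final"
        os.replace(tmp_path, final_path)
    except BaseException:
        # On failure, remove the temp file if it is still there
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

    # d. fsync the directory so the rename itself is flushed
    #    (assumption: this is what "d. fsync" refers to)
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as replace_config("/boot/efi/loader.conf", new_bytes) (path
made up for illustration) would leave either the old or the new file
visible to a running kernel, but after a crash between c. and d. the
on-disk FAT state still has to be checked as described above.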