Re: Is rename(2) atomic on FAT?

On Wed, Oct 23, 2019 at 7:16 PM Pali Rohár <pali.rohar@xxxxxxxxx> wrote:
> On Wednesday 23 October 2019 16:21:19 Chris Murphy wrote:

> > I don't know either or how to confirm it.
>
> Somebody who is watching linux-fsdevel and has deep knowledge in this
> area... could provide more information.

Maybe dm-log-writes can do this? Just log all the writes, and
hopefully it's straightforward to match the 'mv' rename command with
the resulting writes.


> > Nice in theory, but in practice the user simply reboots, and screams
> > WTF out loud if the system face plants. And people wonder why things
> > are still broken 20 years later with all the same kinds of problems
> > and prescriptions to boot off some rescue media instead of it being
> > fail safe by design. It's definitely not fail safe to have a kernel
> > update that could possibly result in an unbootable system. I can't
> > think of any ordinary server, cloud, desktop, mobile user who wants to
> > have to boot from rescue media to do a simple repair. Of course they
> > all just want to reboot and have the right thing always happen no
> > matter what, otherwise they get so nervous about doing updates that
> > they postpone them longer than they should.
>
> Still, any time you improperly unmount a filesystem you should
> check for errors, if you do not want to lose your data.

Perhaps, but it's archaic. The user usually has no idea what went
wrong, and all kinds of factors strongly disincentivize doing an
offline fsck, and incentivize just rebooting and seeing what happens.
If they get past the bootloader, systemd/init is going to run an fsck
on all volumes that need it, or kernel code does log replay to bring
them up to date.

> And critical area should have some "recovery" mechanism to repair broken
> bootloader / kernel image.
>
> Anyway, the chance that the kernel crashes at the step of replacing the
> old kernel disk image with the new one is low. So needing to do external
> recovery should not be such a big issue.

Running 'strace -D -ff -o' on grub2-mkconfig generates over 1800
per-PID output files. Filtering for lines containing grub.cfg...

# grep grub.cfg *
grub.12167:execve("/usr/sbin/grub2-mkconfig", ["grub2-mkconfig", "-o",
"/boot/efi/EFI/fedora/grub.cfg"], 0x7ffc68054470 /* 24 vars */) = 0
grub.12167:read(3, "/boot/efi/EFI/fedora/grub.cfg\n", 128) = 30
grub.12167:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
grub.12167:read(255, "\nif test \"x${grub_cfg}\" != \"x\" ;"..., 8192) = 567
grub.12174:write(1, "/boot/efi/EFI/fedora/grub.cfg\n", 30) = 30
grub.12349:execve("/usr/bin/rm", ["rm", "-f",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.12349:newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new",
0x556be17d9758, AT_SYMLINK_NOFOLLOW) = -1 ENOENT (No such file or
directory)
grub.12349:unlinkat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", 0)
= -1 ENOENT (No such file or directory)
grub.14064:execve("/usr/bin/grub2-script-check",
["/usr/bin/grub2-script-check",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.14064:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", O_RDONLY) = 3
grub.14065:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg",
O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
grub.14065:execve("/usr/bin/cat", ["cat",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.14065:openat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", O_RDONLY) = 3
grub.14066:execve("/usr/bin/rm", ["rm", "-f",
"/boot/efi/EFI/fedora/grub.cfg.ne"...], 0x55c599fde980 /* 48 vars */)
= 0
grub.14066:newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new",
{st_mode=S_IFREG|0700, st_size=6080, ...}, AT_SYMLINK_NOFOLLOW) = 0
grub.14066:unlinkat(AT_FDCWD, "/boot/efi/EFI/fedora/grub.cfg.new", 0) = 0

I can't fully parse this. My best guess is that it writes out an
all-new file, grub.cfg.new, and then doesn't rename it. Instead it uses
cat to copy the contents of the new file over the old one (PID 14065
opens grub.cfg with O_TRUNC), and then removes grub.cfg.new. Yeah, the
inode stays the same, as does access time. Is this fragile?
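
For comparison, here is a minimal Python sketch of the two patterns:
overwriting in place (what the trace above appears to show) versus
write to a temp file, fsync, then rename. The function names are made
up for illustration and this is not grub2-mkconfig's code. It also
assumes rename is atomic on the underlying file system, which is
exactly what's in question for FAT.

```python
import os


def overwrite_in_place(final_path: str, data: bytes) -> None:
    """Truncate and rewrite the existing file (same inode).

    A crash between the truncate and the last write can leave an
    empty or partial file at final_path.
    """
    with open(final_path, "wb") as f:  # "wb" truncates, like O_TRUNC
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def replace_via_rename(final_path: str, data: bytes) -> None:
    """Write a sibling temp file, flush it, then rename it over the target.

    Assumption: rename() within one directory is atomic on this
    file system. That is not established for FAT.
    """
    tmp_path = final_path + ".new"  # hypothetical temp name
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before the rename

    os.replace(tmp_path, final_path)  # rename(2) underneath

    # Flush the directory entry change too (works on Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(final_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Where rename really is atomic, a crash leaves either the old or the new
grub.cfg. With the in-place version there's a window where the file is
truncated or only partly written.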

Android, ChromeOS, and some others have A and B kernel partitions
which are just blobs. They use some other form of hint to indicate
which partition is actually used at any one time, meaning they can
reliably do a failsafe update of the other partition, and sanity
test it, before committing the switch. Crude but effective.
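
A rough sketch of the A/B idea in Python, as I understand it. The field
names and the try counter are my own assumptions for illustration, not
how Android or ChromeOS actually store their hint.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class BootState:
    """Hypothetical 'hint' record kept outside the A/B partitions."""
    active: str = "A"      # slot the bootloader will try next
    known_good: str = "A"  # last slot that booted and was confirmed
    tries_left: int = 0    # boot attempts left for an unconfirmed slot


def other(slot: str) -> str:
    return "B" if slot == "A" else "A"


def stage_update(state: BootState,
                 write_image: Callable[[str], None],
                 verify_image: Callable[[str], bool],
                 tries: int = 3) -> BootState:
    """Write the inactive slot, sanity test it, and only then flip the hint."""
    target = other(state.known_good)
    write_image(target)           # never touches the known-good slot
    if not verify_image(target):
        return state              # hint unchanged, old slot still boots
    return replace(state, active=target, tries_left=tries)


def pick_slot(state: BootState) -> BootState:
    """What the bootloader would do on each boot."""
    if state.active == state.known_good:
        return state
    if state.tries_left == 0:
        # The new slot never got confirmed: fall back.
        return replace(state, active=state.known_good)
    return replace(state, tries_left=state.tries_left - 1)


def confirm_boot(state: BootState) -> BootState:
    """Called once the new system is up and healthy."""
    return replace(state, known_good=state.active, tries_left=0)
```

The only thing that has to change atomically is the small hint record,
not the kernel image itself.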

Apple goes so far as to give all of their product firmware the ability
to natively read APFS, which contains the kernel and early boot files.

I have no idea how Windows does kernel or bootloader updates, except
that they don't keep the EFI system partition persistently mounted all
day long, the way virtually all Linux distributions do today at
/boot/efi. That seems guaranteed to result in many dirty-flag FAT file
system cleanups. I know I've seen such fix ups in my journal files.




> > > > I'm not sure how to test the following: write kernel and initramfs to
> > > > final locations. And bootloader configuration is written to a temp
> > > > path. Then at the decision moment, rename it so that it goes from temp
> > > > path to final path doing at most 1 sector change. One 512-byte
> > > > sector is a reasonable unit to assume can be written completely
> > > > atomically by a system. I have no idea if FAT can do such a 'mv'
> > > > event with only one sector change.
> > >
> > > Theoretically it could be possible to implement it for FAT (with more
> > > restrictions), but I doubt that general purpose implementation of any
> > > filesystem in the kernel can do such a thing. So practically, no.
> >
> > Now I'm wondering what the UEFI spec says about this, and whether this
> > problem was anticipated, and how surprised I should be if it wasn't
> > anticipated.
>
> I know that the UEFI spec references the MS specification for FAT
> filesystems (fatgen103.doc). I do not know if it says anything about
> filesystem details, but I guess it specifies requirements, that
> implementations must be compatible with FAT12, FAT16 and FAT32 according
> to the specification.

My understanding of the UEFI spec is that the file system is called the
'EFI file system' and was intended to be predicated on FAT12, FAT16,
FAT32 as they were at a specific moment in time, bugs and warts and
all. That moment is by now probably around 20 years ago, and it was
never meant to change after that. In practice it seems there is no such
separate thing as the EFI file system. There is no separate mkfs flag,
or mount option, to make sure this is *the* canonical EFI file system,
rather than just today's latest bug fixed and feature enhanced FAT file
system as supported by Linux.

So god only knows what bugs might arise from that discrepancy one day.

> Also UEFI allows you to write your own UEFI filesystem drivers which
> other UEFI programs and bootloaders can use.

I'm not finding it this second, but someone basically did this work
already, by wrapping existing GRUB file system modules into EFI file
system drivers.

OK so plausibly on UEFI, the firmware could be handed a better FAT
driver very soon after POST to avoid firmware FAT bugs. Or for that
matter, create "A" and "B" EFI system partitions, containing identical
static boot data, that merely point to a purpose-built $BOOT volume
that can host early boot files and supports atomic updates. That'd be
clever, but also not generic. It's UEFI specific.

It'd be neat to have a superset implementation that can work anywhere,
and then allow for optimizations. But the problem with the generic
solution? Who will follow it? The Bootloaderspec pretty much fell on
deaf ears. The GRUB folks don't care to upstream it, nor syslinux, nor
U-Boot, near as I can tell. Simple 1 page spec. Fedora's GRUB carries
patches for it, and now uses them by default. So hilariously Fedora
is maybe the first distribution to actively support three
substantially different bootloader update mechanisms: grub-mkconfig,
grubby, and bootloaderspec.

-- 
Chris Murphy



