Hi Chris!

The first question is what you mean by "atomic". Do you mean atomic at
process level, so that any process accessing the filesystem sees
consistent data at any time? Or do you mean consistency of the
filesystem on the underlying block device itself? Or atomicity at disk
storage level?

On Monday 21 October 2019 23:44:25 Richard Weinberger wrote:
> Chris,
>
> [CC'ing fsdevel and Pali]
>
> On Mon, Oct 21, 2019 at 9:59 PM Chris Murphy <lists@xxxxxxxxxxxxxxxxx> wrote:
> >
> > http://man7.org/linux/man-pages/man2/rename.2.html
> >
> > Use case is atomically updating bootloader configuration on EFI System
> > partitions. Some bootloader implementations have configuration files
> > bigger than 512 bytes, which could possibly be torn on write. But I'm
> > also not sure what write order FAT uses.
> >
> > 1.
> > FAT32 file system is mounted at /boot/efi
> >
> > 2.
> > # echo "hello" > /boot/efi/tmp/test.txt
> > # mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/
> >
> > 3.
> > When I strace the above mv command I get these lines:
> > ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
> > stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = 0
> > lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> > close(0)
> >
> > I can't tell from documentation if renameat2() with flag
> > RENAME_NOREPLACE is atomic, assuming the file doesn't exist at
> > destination.

RENAME_NOREPLACE is atomic at VFS level, independently of the filesystem
used. There is no race condition when multiple processes access that
directory at the same time.

> > 4.
> > Do it again exactly as before, small change
> > # echo "hello" > /boot/efi/tmp/test.txt
> > # mv /boot/efi/tmp/test.txt /boot/efi/EFI/fedora/
> >
> > 5.
> > The strace shows fallback to rename()
> >
> > ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/", RENAME_NOREPLACE) = -1 EEXIST (File exists)
> > stat("/boot/efi/EFI/fedora/", {st_mode=S_IFDIR|0700, st_size=1024, ...}) = 0
> > renameat2(AT_FDCWD, "/boot/efi/tmp/test.txt", AT_FDCWD,
> > "/boot/efi/EFI/fedora/test.txt", RENAME_NOREPLACE) = -1 EEXIST (File
> > exists)
> > lstat("/boot/efi/tmp/test.txt", {st_mode=S_IFREG|0700, st_size=7, ...}) = 0
> > newfstatat(AT_FDCWD, "/boot/efi/EFI/fedora/test.txt",
> > {st_mode=S_IFREG|0700, st_size=6, ...}, AT_SYMLINK_NOFOLLOW) = 0
> > geteuid() = 0
> > rename("/boot/efi/tmp/test.txt", "/boot/efi/EFI/fedora/test.txt") = 0
> > lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
> > close(0) = 0
> >
> >
> > Per documentation that should be atomic. So the questions are, are
> > both atomic, or neither atomic, and if not what should be used to
> > ensure bootloader updates are atomic.

At VFS level both are atomic, independently of the filesystem.

> According to my understanding of FAT rename() is not atomic at all.
> It can downgrade to a hardlink. i.e. rename("foo", "bar") can result in having
> both "foo" and "bar."
> ...or worse.

In general, rename() may indeed leave a window during which both "foo"
and "bar" point to the same inode. (But is this really a problem for
your scenario?)

Looking at the vfat source code (file namei_vfat.c), however, both the
rename and lookup operations are serialized by a mutex. So while a
rename is in progress nothing should be able to read the directory, and
therefore there should be no race in which a reader sees an
inconsistent directory.

If you want atomic rename of two files independently of filesystem, you
can use the RENAME_EXCHANGE flag.
It exchanges the two specified files atomically, so there is no window,
as there is with rename(), during which both "foo" and "bar" point to
the same inode.

But... if you are asking about consistency and atomicity at filesystem
level (e.g. you turn off the disk / power supply during a rename
operation), then this is not atomic and it probably cannot be
implemented. When a FAT filesystem is mounted (either by Windows or by
the Linux kernel) it is marked with a "dirty" flag, and later, on
unmount, the "dirty" flag is cleared. This is there to ensure that
operations like rename were finished and were not stopped/killed in
between. So the next time you read from the FAT filesystem, you know
whether it is in a consistent state or not.

> Pali has probably more input to share. :-)
>
> > There are plausibly three kinds:
> >
> > A. write a new file with file name that doesn't previously exist
> > B. write a new file with a new file name, then do a rename stomping on
> > the old one
> > C. overwrite an existing file
> >
> > It seems C is risky. It probably isn't atomic and can't be made to be
> > atomic on FAT.

Option C is really risky. Overwriting a file means the following
operations:

1. truncate file to zero size
2. write first N blocks
3. write second N blocks
...
4. write last M blocks

Option B is common practice. IIRC config files in KDE are also updated
this way.

> >
> > --
> > Chris Murphy
>

--
Pali Rohár
pali.rohar@xxxxxxxxx
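
For illustration, option B from the list above (write the new file under
a new name, then rename it over the old one) could be sketched roughly
as follows. This is only a sketch under assumptions, not something taken
from the thread: the helper name replace_file_via_rename() and the
".new" suffix are hypothetical, and the fsync() calls are an added
assumption about pushing data out before and after the rename. It
addresses what running processes see; as said above, it does not make
the update safe against power loss on FAT.

```python
import os


def replace_file_via_rename(dst_path: str, data: bytes) -> None:
    """Option B: write a new file under a temporary name, then rename it
    over the old one. The ".new" suffix is a hypothetical naming choice."""
    tmp_path = dst_path + ".new"
    dir_path = os.path.dirname(dst_path) or "."

    # Write the complete new contents to the temporary file first.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Assumption: flush file data before the rename makes it visible.
        os.fsync(fd)
    finally:
        os.close(fd)

    # rename(2): other processes see either the old or the new file.
    os.rename(tmp_path, dst_path)

    # Assumption: also flush the directory so the new entry is written out.
    dir_fd = os.open(dir_path, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With the paths from the example above, calling
replace_file_via_rename("/boot/efi/EFI/fedora/test.txt", b"hello\n")
would write test.txt.new next to the target and then rename it over
test.txt. The old file is never truncated in place, which is what
separates this from option C.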