Re: [PATCHSET v29.4 03/13] xfs: atomic file content exchanges

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, Feb 29, 2024, at 3:18 PM, Darrick J. Wong wrote:
>
> Correct, there's no built-in dedupe.  For small files you'll probably
> end up with a single allocation anyway, which is ideal in terms of
> ondisk metadata overhead.

Makes sense.

> though I would bet that extending linkat (or rename, or
> whatever) is going to be the only workable solution for old / simple
> filesystems (e.g. fat32).

Ah, right; that too.

> How /does/ dconf handle those changes?  Does it rename the file and
> signal all the other dconf threads to reopen the file?  And then those
> threads get the new file contents?

I briefly skimmed the code and couldn't find it, but yes I believe it's basically that clients have an inotify watch that gets handled from the mainloop and clients close and reopen and re-mmap - it's probably nonexistent to have non-mainloop threads reading things from the mmap, so there's no races with any other threads.

>
> Huurrrh hurrrh.  That's right, I don't see how exchange can mesh well
> with mmap without actual flock()ing. :(
>
> fsnotify will send a message out to userspace after the exchange
> finishes, which means that userspace could watch for the notifications
> via fanotify.  However, that's still a bit racy... :/

Right.  However...it's not just about mmap.  Sorry this is a minor rant but...near my top ten list of changes to make with a time machine for Unix would be the concept of a contents-immutable file, like all the seals that work on memfd with F_ADD_SEALS (and outside of fsverity, which is good but can be a bit of a heavier hammer).

A few times I've been working on shell script in my editor on my desktop, and these shell scripts are tests because shell script is so tempting.  I'm sure this familiar, given (x)fstests.

And if you just run the tests (directly from source in git), and then notice a bug, and start typing in your editor, save the changes, and then and your editor happens to do a generic "open(O_TRUNC), save" instead of an atomic rename.  This happens to be what `nano` and VSCode do, although at least the `vi` I have here does an atomic rename.  (One could say all editors that don't are broken...but...)

And now because the way bash works (and I assume other historical Unix shells) is that they interpret the file *as they're reading it* in this scenario you can get completely undefined behavior.  It could do *anything*.

At least one of those times, I got an error from an `rm -rf` invocation that happened to live in one of those test scripts...that could have in theory just gone off and removed anything.

Basically the contents-immutable is really what you *always* want for executables and really anything that can be parsed without locking (like, almost all config files in /etc too).  With ELF files there's EXTBUSY if it *happens* to be in use, but that's just a hack.  Also in that other thread about racing writes to suid executables...well, there'd be no possibility for races if we just denied writing because again - it makes no sense to just make random writes in-place to an executable.  (OK I did see the zig folks are trying an incremental linker, but still I would just assume reflinks are available for that)

Now this is relevant here because, I don't think anything like dpkg/rpm and all those things could ever use this ioctl for this reason.

So, it seems to me like it should really be more explicitly targeted at
- Things that are using open()+write() today and it's safe for that use case
- The database cases

And not talk about replacing the general open(O_TMPFILE) + rename() path.




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux