Re: [PATCHSET v29.4 03/13] xfs: atomic file content exchanges

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 29, 2024 at 05:43:21PM -0500, Colin Walters wrote:
> 
> 
> On Thu, Feb 29, 2024, at 3:18 PM, Darrick J. Wong wrote:
> >
> > Correct, there's no built-in dedupe.  For small files you'll probably
> > end up with a single allocation anyway, which is ideal in terms of
> > ondisk metadata overhead.
> 
> Makes sense.
> 
> > though I would bet that extending linkat (or rename, or
> > whatever) is going to be the only workable solution for old / simple
> > filesystems (e.g. fat32).
> 
> Ah, right; that too.
> 
> > How /does/ dconf handle those changes?  Does it rename the file and
> > signal all the other dconf threads to reopen the file?  And then those
> > threads get the new file contents?
> 
> I briefly skimmed the code and couldn't find it, but yes I believe
> it's basically that clients have an inotify watch that gets handled
> from the mainloop and clients close and reopen and re-mmap - it's
> probably nonexistent to have non-mainloop threads reading things from
> the mmap, so there's no races with any other threads.

Hrmm.  IIRC inotify and fanotify both use the same fsnotify backend.
fsnotify events are emitted after i_rwsem drops, which (if I read
read_write.c correctly) means this is technically racy.

That said if they're mostly waiting around in the inotify loop then it
probably doesn't matter.

> > Huurrrh hurrrh.  That's right, I don't see how exchange can mesh well
> > with mmap without actual flock()ing. :(
> >
> > fsnotify will send a message out to userspace after the exchange
> > finishes, which means that userspace could watch for the notifications
> > via fanotify.  However, that's still a bit racy... :/
> 
> Right.  However...it's not just about mmap.  Sorry this is a minor
> rant but...near my top ten list of changes to make with a time machine
> for Unix would be the concept of a contents-immutable file, like all
> the seals that work on memfd with F_ADD_SEALS (and outside of
> fsverity, which is good but can be a bit of a heavier hammer).

You and me both. :)

Also I want a persistent file contents write counter; and a
file-anything write counter.

Oh, and a conditional read where you pass in the file contents write
counter and returns an error if the file has been changed since sampling
time.  The changecookie thing mentioned elsewhere gets us towards that,
if onlty the issues w/ XFS get resolved.

> A few times I've been working on shell script in my editor on my
> desktop, and these shell scripts are tests because shell script is so
> tempting.  I'm sure this familiar, given (x)fstests.
> 
> And if you just run the tests (directly from source in git), and then
> notice a bug, and start typing in your editor, save the changes, and
> then and your editor happens to do a generic "open(O_TRUNC), save"
> instead of an atomic rename.  This happens to be what `nano` and
> VSCode do, although at least the `vi` I have here does an atomic
> rename.  (One could say all editors that don't are broken...but...)

I think they do O_TRUNC because it saves them from having to copy the
file attrs and xattrs.  Too bad it severely screws up a program running
in another terminal that just happens to hit the zero-byte file.

> And now because the way bash works (and I assume other historical Unix
> shells) is that they interpret the file *as they're reading it* in
> this scenario you can get completely undefined behavior.  It could do
> *anything*.
> 
> At least one of those times, I got an error from an `rm -rf`
> invocation that happened to live in one of those test scripts...that
> could have in theory just gone off and removed anything.
> 
> Basically the contents-immutable is really what you *always* want for
> executables and really anything that can be parsed without locking
> (like, almost all config files in /etc too).  With ELF files there's

Yes.

> EXTBUSY if it *happens* to be in use, but that's just a hack.  Also in

A hack that doesn't work for scripts.  Either interpreters have to read
the entire script into memory before execution, or I guess they can do
the insane thing that the DOS batch interpreter did, where before each
statement it would save the file pos, close it, execute the command,
reopen the batch file, and seek back to that line.

> that other thread about racing writes to suid executables...well,
> there'd be no possibility for races if we just denied writing because
> again - it makes no sense to just make random writes in-place to an
> executable.  (OK I did see the zig folks are trying an incremental
> linker, but still I would just assume reflinks are available for that)
> 
> Now this is relevant here because, I don't think anything like
> dpkg/rpm and all those things could ever use this ioctl for this
> reason.

Right.  dpkg executable file replacement really doesn't make much sense
for exchange range.  That's also wasn't the usecase I was targetting
though admittedly I'm only using this ioctl to test functionality that
online fsck requires.

> So, it seems to me like it should really be more explicitly targeted at
> - Things that are using open()+write() today and it's safe for that use case
> - The database cases
> 
> And not talk about replacing the general open(O_TMPFILE) + rename() path.

I think I'll change the cover letter to talk about what it does, what
problems it solves, and what problems it introduces.  Figuring out how
to take advantage of it is an exercise for application writers.

--D




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux