On Thu, Jan 16, 2025 at 07:52:25AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 14, 2025 at 03:57:26PM -0800, Darrick J. Wong wrote:
> > Ok, let's do that then. Just to be clear -- for any RWF_ATOMIC direct
> > write that's correctly aligned and targets a single mapping in the
> > correct state, we can build the untorn bio and submit it. For
> > everything else, prealloc some post EOF blocks, write them there, and
> > exchange-range them.
> >
> > Tricky questions: How do we avoid collisions between overlapping writes?
> > I guess we find a free file range at the top of the file that is long
> > enough to stage the write, and put it there? And purge it later?
> >
> > Also, does this imply that the maximum file size is less than the usual
> > 8EB?
>
> I think literally using the exchrange code for anything but an
> initial prototype is a bad idea for the above reasons. If we go
> beyond proving this is possible you'd want a version of exchrange
> where the exchange partners is not a file mapping, but a cow staging
> record.

The trouble is that the br_startoff attribute of cow staging mappings
aren't persisted on disk anywhere, which is why exchange-range can't
handle the cow fork. You could open an O_TMPFILE and swap between the
two files, though that gets expensive per-io unless you're willing to
stash that temp file somewhere.

At this point I think we should slap the usual EXPERIMENTAL warning on
atomic writes through xfs and let John land the simplest multi-fsblock
untorn write support, which only handles the corner case where all the
stars are <cough> aligned; and then make an exchange-range prototype
and/or all the other forcealign stuff.

(Lifting in smaller pieces sounds a lot better than having John carry
around an increasingly large patchset...)

--D
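
For readers following along, the fast-path/fallback split quoted above (one aligned write into a single mapping in the correct state gets an untorn bio, everything else is staged and exchanged) can be sketched roughly as below. This is only an illustration in userspace C, not XFS code: struct mapping, choose_atomic_path() and the enum names are hypothetical, "correct state" is reduced to a caller-supplied flag, and the alignment rule (power-of-two length within the atomic write unit limits, naturally aligned both in the file and on disk) is an assumption about what "correctly aligned" means here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a file mapping (not an XFS type). */
struct mapping {
    uint64_t file_off;  /* byte offset of the mapping in the file */
    uint64_t disk_off;  /* byte offset of the mapping on the device */
    uint64_t len;       /* length of the mapping in bytes */
    bool usable;        /* caller's notion of "in the correct state" */
};

enum atomic_path {
    PATH_UNTORN_BIO,          /* build one untorn bio and submit it */
    PATH_STAGE_AND_EXCHANGE   /* stage the data elsewhere, then exchange */
};

static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Decide which path a write of [pos, pos + len) would take.
 * unit_min/unit_max model the advertised atomic write unit limits.
 */
enum atomic_path choose_atomic_path(const struct mapping *maps, size_t n_maps,
                                    uint64_t pos, uint64_t len,
                                    uint64_t unit_min, uint64_t unit_max)
{
    assert(maps != NULL || n_maps == 0);

    /* Assumed alignment rule: power-of-two length, naturally aligned. */
    if (!is_pow2(len) || len < unit_min || len > unit_max || pos % len != 0)
        return PATH_STAGE_AND_EXCHANGE;

    for (size_t i = 0; i < n_maps; i++) {
        const struct mapping *m = &maps[i];

        /* Skip mappings that do not contain the start of the write. */
        if (pos < m->file_off || pos - m->file_off >= m->len)
            continue;

        /* The whole write must fit inside this single mapping. */
        if (len > m->len - (pos - m->file_off))
            return PATH_STAGE_AND_EXCHANGE;

        /* The mapping must be in a state the fast path can write to. */
        if (!m->usable)
            return PATH_STAGE_AND_EXCHANGE;

        /* The on-disk address must be naturally aligned as well. */
        if ((m->disk_off + (pos - m->file_off)) % len != 0)
            return PATH_STAGE_AND_EXCHANGE;

        return PATH_UNTORN_BIO;
    }

    /* No mapping covers pos (e.g. a hole): take the fallback. */
    return PATH_STAGE_AND_EXCHANGE;
}
```

Note that the sketch says nothing about how the fallback stages its data or avoids collisions between overlapping writes; those are exactly the open questions in the thread.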