On Tue, Feb 27, 2024 at 02:46:11PM -0800, Linus Torvalds wrote:
> On Tue, 27 Feb 2024 at 14:21, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> >
> > ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> > just like everyone else.
>
> Not for dio, it doesn't.
>
> > > The real question is how much of userspace will that break, because
> > > of implicit assumptions that the kernel has always serialised
> > > buffered writes?
> >
> > What would break?
>
> Well, at least in theory you could have concurrent overlapping writes
> of folio crossing records, and currently you do get the guarantee that
> one or the other record is written, but relying just on page locking
> would mean that you might get a mix of them at page boundaries.

I think we can keep that guarantee.

The tricky case was -EFAULT from copy_from_user_nofault(), where we have
to bail out, drop locks, re-fault in the user buffer - and redo the rest
of the write, this time holding the inode lock.

We can't guarantee that partial writes don't happen, but what we can do
is restart the write from the beginning, so the partial write gets
overwritten with a full atomic write.

This way after writes complete we'll never have weird torn writes left
around.
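
To make the restart argument concrete, here is a rough userspace sketch,
not kernel code. Everything named toy_* is hypothetical: toy_copy_nofault()
only stands in for copy_from_user_nofault(), "resident" models how much of
the user buffer can be copied without faulting, and "rival" models another
writer that gets in while the locks are dropped. The only thing it is meant
to show is the difference between resuming where the fault happened and
restarting from offset 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_FILE_SIZE 64
#define TOY_PAGE_SIZE 16   /* toy page size, chosen only for illustration */

struct toy_file {
	char data[TOY_FILE_SIZE];
};

/* Hypothetical user buffer: bytes [0, resident) copy without faulting. */
struct toy_ubuf {
	const char *src;
	size_t len;
	size_t resident;
};

/* Stand-in for a no-fault copy: copies what is resident, returns the count. */
static size_t toy_copy_nofault(char *dst, const struct toy_ubuf *u,
			       size_t off, size_t n)
{
	size_t avail = off < u->resident ? u->resident - off : 0;
	size_t copied = n < avail ? n : avail;

	memcpy(dst, u->src + off, copied);
	return copied;
}

/*
 * Copy bytes [from, u->len) of the record one page at a time and stop at
 * the first short copy. Returns the record offset that was reached.
 */
static size_t toy_write_pass(struct toy_file *f, size_t pos,
			     const struct toy_ubuf *u, size_t from)
{
	size_t done = from;

	while (done < u->len) {
		size_t in_page = TOY_PAGE_SIZE - (pos + done) % TOY_PAGE_SIZE;
		size_t want = u->len - done < in_page ? u->len - done : in_page;
		size_t copied = toy_copy_nofault(f->data + pos + done, u,
						 done, want);

		done += copied;
		if (copied < want)
			break;	/* the real code would see -EFAULT here */
	}
	return done;
}

/*
 * Optimistic write under page locks only. On a fault, locks are dropped
 * (so a rival writer may land), the buffer is faulted back in, and the
 * write is either resumed (restart == false) or redone from offset 0
 * (restart == true). The slow path is where the inode lock would be held.
 */
static void toy_buffered_write(struct toy_file *f, size_t pos,
			       struct toy_ubuf *u,
			       const struct toy_ubuf *rival, bool restart)
{
	size_t done;

	assert(pos + u->len <= TOY_FILE_SIZE);
	assert(!rival || pos + rival->len <= TOY_FILE_SIZE);

	done = toy_write_pass(f, pos, u, 0);
	if (done == u->len)
		return;				/* fast path, no fault */

	if (rival)
		toy_write_pass(f, pos, rival, 0); /* lands while unlocked */

	u->resident = u->len;			/* fault the buffer back in */
	done = toy_write_pass(f, pos, u, restart ? 0 : done);
	assert(done == u->len);
}
```

With restart == false and an overlapping rival, the file ends up with the
rival's head and our tail, which is exactly the torn record we want to
avoid. With restart == true the whole range holds our record once the write
completes.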