On Wed, 28 Feb 2024 at 10:18, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> I think we can keep that guarantee.
>
> The tricky case was -EFAULT from copy_from_user_nofault(), where we have
> to bail out, drop locks, re-fault in the user buffer - and redo the rest
> of the write, this time holding the inode lock.
>
> We can't guarantee that partial writes don't happen, but what we can do
> is restart the write from the beginning, so the partial write gets
> overwritten with a full atomic write.

I think that's a solution that is actually much worse than the thing
it is trying to solve.

Now a concurrent reader can actually see the data change twice or
more. Either because there's another writer that came in in between,
or because of threaded modifications to the source buffer in the first
writer.

So your solution actually makes for noticeably *worse* atomicity
guarantees, not better.

Not the solution. Not at all.

I do think the solution is to just take the inode lock exclusive (when
we have to modify the inode size or the suid/sgid) or shared (to
prevent concurrent i_size modifications), and leave it at that.

And we should probably do a mount flag (for defaults) and an open-time
flag (for specific uses) to let people opt in to this behavior.

                Linus