Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Wed, 28 Feb 2024 11:09:53 -0800

On Wed, 28 Feb 2024 at 10:18, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> I think we can keep that guarantee.
>
> The tricky case was -EFAULT from copy_from_user_nofault(), where we have
> to bail out, drop locks, re-fault in the user buffer - and redo the rest
> of the write, this time holding the inode lock.
>
> We can't guarantee that partial writes don't happen, but what we can do
> is restart the write from the beginning, so the partial write gets
> overwritten with a full atomic write.

I think that's a solution that is actually much worse than the thing
it is trying to solve.

Now a concurrent reader can actually see the data change twice or
more. Either because there's another writer that came in in between,
or because of threaded modifications to the source buffer in the first
writer.

So your solution actually makes for noticeably *worse* atomicity
guarantees, not better.

Not the solution. Not at all.

I do think the solution is to just take the inode lock exclusive (when
we have to modify the inode size or the suid/sgid bits) or shared (to
prevent concurrent i_size modifications), and leave it at that.

And we should probably do a mount flag (for defaults) and an
open-time flag (for specific uses) to let people opt in to this
behavior.

            Linus