Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

On Tue, Feb 27, 2024 at 02:46:11PM -0800, Linus Torvalds wrote:
> On Tue, 27 Feb 2024 at 14:21, Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
> >
> > ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> > just like everyone else.
> 
> Not for dio, it doesn't.
> 
> > > The real question is how much of userspace will that break, because
> > > of implicit assumptions that the kernel has always serialised
> > > buffered writes?
> >
> > What would break?
> 
> Well, at least in theory you could have concurrent overlapping writes
> of folio crossing records, and currently you do get the guarantee that
> one or the other record is written, but relying just on page locking
> would mean that you might get a mix of them at page boundaries.

I think we can keep that guarantee.

The tricky case was -EFAULT from copy_from_user_nofault(), where we have
to bail out, drop locks, fault the user buffer back in, and redo the rest
of the write, this time holding the inode lock.

We can't guarantee that partial writes don't happen, but what we can do
is restart the write from the beginning, so the partial write gets
overwritten with a full atomic write.

This way after writes complete we'll never have weird torn writes left
around.
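
To make the restart idea concrete, here is a rough userspace sketch of
the control flow, not kernel code. Everything in it is made up for
illustration: sim_file, sim_ubuf, copy_nofault(), fault_in() and
sim_write() are hypothetical names, the inode lock is modelled as a plain
flag, page locking is only indicated in comments, and it is
single-threaded. It only shows that a fault on the fast path leads to a
restart from offset 0 of the write under the inode lock, so the partial
copy gets overwritten.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 8   /* toy page size, so a short write crosses a page */
#define FILE_SZ 32  /* toy file size */

/* Hypothetical stand-in for an inode plus its page cache. */
struct sim_file {
	bool inode_locked;      /* models the exclusive inode lock */
	char data[FILE_SZ];
};

/* Hypothetical user buffer: bytes at or past fault_at are not resident. */
struct sim_ubuf {
	const char *src;
	size_t len;
	size_t fault_at;        /* == len means fully resident */
};

/* Copy up to n bytes without faulting; a short return models -EFAULT. */
static size_t copy_nofault(char *dst, const struct sim_ubuf *u,
			   size_t off, size_t n)
{
	size_t avail = u->fault_at > off ? u->fault_at - off : 0;

	if (n > avail)
		n = avail;
	memcpy(dst, u->src + off, n);
	return n;
}

/* Models faulting in the whole user buffer with no locks held. */
static void fault_in(struct sim_ubuf *u)
{
	u->fault_at = u->len;
}

/* Returns bytes written, or -1 if the range does not fit in the file. */
static long sim_write(struct sim_file *f, size_t pos, struct sim_ubuf *u,
		      int *restarts)
{
	size_t done;

	if (pos > FILE_SZ || u->len > FILE_SZ - pos)
		return -1;
restart:
	done = 0;       /* always start over from the beginning of the write */
	while (done < u->len) {
		/* a real fast path would hold the page/folio lock here */
		size_t in_page = PAGE_SZ - (pos + done) % PAGE_SZ;
		size_t want = u->len - done < in_page ? u->len - done : in_page;
		size_t got = copy_nofault(f->data + pos + done, u, done, want);

		done += got;
		if (got < want) {
			/* fault: drop page locks, fault in, take inode lock */
			assert(!f->inode_locked);
			fault_in(u);
			f->inode_locked = true;
			(*restarts)++;
			goto restart;
		}
	}
	f->inode_locked = false;        /* no-op if the fast path succeeded */
	return (long)done;
}
```

In the sketch a partial copy is briefly visible in f->data, but by the
time sim_write() returns the whole range has been rewritten in one pass
with the inode lock held.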



