Re: [LSF/MM/BPF TOPIC] Measuring limits and enhancing buffered IO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On Tue, Feb 27, 2024 at 05:21:20PM -0500, Kent Overstreet wrote:
> On Wed, Feb 28, 2024 at 09:13:05AM +1100, Dave Chinner wrote:
> > On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote:
> > > AFAIK every filesystem allows concurrent direct writes, not just xfs,
> > > it's _buffered_ writes that we care about here.
> > 
> > We could do concurrent buffered writes in XFS - we would just use
> > the same locking strategy as direct IO and fall back on folio locks
> > for copy-in exclusion like ext4 does.
> ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> just like everyone else.

Uhuh. ext4 does allow concurrent DIO writes. It's just much more
constrained than XFS. See ext4_dio_write_checks().

> > The real question is how much of userspace will that break, because
> > of implicit assumptions that the kernel has always serialised
> > buffered writes?
> What would break?

Good question. If you don't know the answer, then you've got the
same problem as I have. i.e. we don't know if concurrent
applications that use buffered IO extensively (eg. postgres) assume
data coherency because of the implicit serialisation occurring
during buffered IO writes?

> > > If we do a short write because of a page fault (despite previously
> > > faulting in the userspace buffer), there is no way to completely prevent
> > > torn writes an atomicity breakage; we could at least try a trylock on
> > > the inode lock, I didn't do that here.
> > 
> > As soon as we go for concurrent writes, we give up on any concept of
> > atomicity of buffered writes (esp. w.r.t reads), so this really
> > doesn't matter at all.
> We've already given up buffered write vs. read atomicity, have for a
> long time - buffered read path takes no locks.

We still have explicit buffered read() vs buffered write() atomicity
in XFS via buffered reads taking the inode lock shared (see
xfs_file_buffered_read()) because that's what POSIX says we should

Essentially, we need to explicitly give POSIX the big finger and
state that there are no atomicity guarantees given for write() calls
of any size, nor are there any guarantees for data coherency for
any overlapping concurrent buffered IO operations.

Those are things we haven't completely given up yet w.r.t. buffered
IO, and enabling concurrent buffered writes will expose to users.
So we need to have explicit policies for this and document them
clearly in all the places that application developers might look
for behavioural hints.

Dave Chinner

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux