On Tue, Feb 27, 2024 at 05:21:20PM -0500, Kent Overstreet wrote:
> On Wed, Feb 28, 2024 at 09:13:05AM +1100, Dave Chinner wrote:
> > On Tue, Feb 27, 2024 at 05:07:30AM -0500, Kent Overstreet wrote:
> > > AFAIK every filesystem allows concurrent direct writes, not just xfs,
> > > it's _buffered_ writes that we care about here.
> >
> > We could do concurrent buffered writes in XFS - we would just use
> > the same locking strategy as direct IO and fall back on folio locks
> > for copy-in exclusion like ext4 does.
>
> ext4 code doesn't do that. it takes the inode lock in exclusive mode,
> just like everyone else.

Uhuh. ext4 does allow concurrent DIO writes. It's just much more
constrained than XFS. See ext4_dio_write_checks().

> > The real question is how much of userspace will that break, because
> > of implicit assumptions that the kernel has always serialised
> > buffered writes?
>
> What would break?

Good question. If you don't know the answer, then you've got the
same problem as I have. i.e. we don't know if concurrent
applications that use buffered IO extensively (eg. postgres) assume
data coherency because of the implicit serialisation occurring
during buffered IO writes?

> > > If we do a short write because of a page fault (despite previously
> > > faulting in the userspace buffer), there is no way to completely prevent
> > > torn writes an atomicity breakage; we could at least try a trylock on
> > > the inode lock, I didn't do that here.
> >
> > As soon as we go for concurrent writes, we give up on any concept of
> > atomicity of buffered writes (esp. w.r.t reads), so this really
> > doesn't matter at all.
>
> We've already given up buffered write vs. read atomicity, have for a
> long time - buffered read path takes no locks.

We still have explicit buffered read() vs buffered write() atomicity
in XFS via buffered reads taking the inode lock shared (see
xfs_file_buffered_read()) because that's what POSIX says we should
have.
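
To make the "what would break?" question concrete, here is a minimal
sketch (Python, purely illustrative, not taken from any real
application) of what an application would have to do for itself if it
could no longer assume the kernel serialises overlapping buffered
writes. The lock, the helper names and the block size are all
assumptions made up for this example.

```python
import os
import tempfile
import threading

BLOCK = 1 << 20  # 1 MiB per write, so each write spans many pages

# Hypothetical application-level lock. This stands in for the
# serialisation the application may have been getting implicitly
# from the kernel's exclusive inode lock on buffered writes.
write_lock = threading.Lock()


def locked_pwrite(fd, data, offset):
    """Write all of data at offset while holding the application lock."""
    with write_lock:
        view = memoryview(data)
        while view:
            n = os.pwrite(fd, view, offset)  # may be a short write
            view = view[n:]
            offset += n


def locked_pread(fd, length, offset):
    """Read length bytes at offset under the same lock as the writers."""
    with write_lock:
        chunks = []
        while length:
            chunk = os.pread(fd, length, offset)
            if not chunk:  # EOF
                break
            chunks.append(chunk)
            offset += len(chunk)
            length -= len(chunk)
        return b"".join(chunks)


def is_torn(buf):
    """True if buf mixes bytes from more than one writer's pattern."""
    return len(set(buf)) != 1


def run_overlapping_writers(rounds=20):
    """Two threads overwrite the same range; return what a reader saw."""
    fd, path = tempfile.mkstemp()
    try:
        patterns = [b"A" * BLOCK, b"B" * BLOCK]
        locked_pwrite(fd, patterns[0], 0)  # give the file its full size

        def writer(pattern):
            for _ in range(rounds):
                locked_pwrite(fd, pattern, 0)

        threads = [threading.Thread(target=writer, args=(p,)) for p in patterns]
        for t in threads:
            t.start()
        snapshots = [locked_pread(fd, BLOCK, 0) for _ in range(rounds)]
        for t in threads:
            t.join()
        return snapshots
    finally:
        os.close(fd)
        os.unlink(path)
```

With the lock held around every read and write, no snapshot can mix
"A" and "B" bytes. Drop the lock and the result depends entirely on
whatever serialisation the filesystem happens to provide, which is
exactly the assumption we can't audit across userspace.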

Essentially, we need to explicitly give POSIX the big finger and
state that there are no atomicity guarantees given for write() calls
of any size, nor are there any guarantees for data coherency for
any overlapping concurrent buffered IO operations.

Those are things we haven't completely given up yet w.r.t. buffered
IO, and enabling concurrent buffered writes will expose that to
users. So we need to have explicit policies for this and document
them clearly in all the places that application developers might
look for behavioural hints.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx