> > One more thing I should mention.
> > You do not need to wait for the atomic large writes patches to land.
> > There is nothing stopping you from implementing the suggested
> > solution based on the xfs code already in master (v6.13-rc1),
> > which has support for the RWF_ATOMIC flag for writes.

However, I missed the fact that there is not yet a plan to support
atomic buffered writes :-/

> >
> > It just means that the API will not be usable for applications that
> > want to do IO larger than block size, but concurrent read/write
>                                  ^
> To be precise, this is the page size, not the block size, right?
>

fs block size:

	if (iocb->ki_flags & IOCB_ATOMIC) {
		/*
		 * Currently only atomic writing of a single FS block is
		 * supported. It would be possible to atomic write smaller than
		 * a FS block, but there is no requirement to support this.
		 * Note that iomap also does not support this yet.
		 */
		if (ocount != ip->i_mount->m_sb.sb_blocksize)
			return -EINVAL;

		ret = generic_atomic_write_valid(iocb, from);
		if (ret)
			return ret;
	}

> > performance of 4K IO could be improved already.
>
> Great, which means that IO operations aligned within a single page
> can be executed concurrently, because the folio lock already
> provides atomicity guarantees.
>
> If the write does not exceed the boundary of a page, we can
> downgrade the iolock to XFS_IOLOCK_SHARED. It seems to be safe
> and will not change the current behavior.
>
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -454,6 +454,11 @@ xfs_file_write_checks(
>  	if (error)
>  		return error;
>
> +	if (iocb->ki_pos >> PAGE_SHIFT == (iocb->ki_pos + count) >> PAGE_SHIFT) {
> +		*iolock = XFS_IOLOCK_SHARED;
> +	}
> +
>  	/*
>  	 * For changing security info in file_remove_privs() we need i_rwsem
>  	 * exclusively.

I think that may be possible, but you should do it in the buffered
write code, as in the patch below. xfs_file_write_checks() is called
from code paths like xfs_file_dio_write_unaligned(), where you should
not demote to a shared lock.

> >
> > It's possible that all you need to do is:
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index c488ae26b23d0..2542f15496488 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -777,9 +777,10 @@ xfs_file_buffered_write(
> >  	ssize_t			ret;
> >  	bool			cleared_space = false;
> >  	unsigned int		iolock;
> > +	bool			atomic_write = iocb->ki_flags & IOCB_ATOMIC;
> >
> >  write_retry:
> > -	iolock = XFS_IOLOCK_EXCL;
> > +	iolock = atomic_write ? XFS_IOLOCK_SHARED : XFS_IOLOCK_EXCL;
> >  	ret = xfs_ilock_iocb(iocb, iolock);
> > --
> >
> > xfs_file_write_checks() afterwards already takes care of promoting
> > XFS_IOLOCK_SHARED to XFS_IOLOCK_EXCL for extending writes.
>
> Yeah, for writes that exceed the PAGE boundary, we can also promote
> the lock to XFS_IOLOCK_EXCL. Otherwise, I am concerned that it may
> lead to old data being retained in the file.
>
> For example, two processes writing four pages of data to the same area:
>
>     process A          process B
>     --------------------------------
>     write AA--
>     <sleep>
>                        new write BBBB
>     write --AA
>
> The final data is BBAA.
>

What is the use case for which you are trying to improve performance?
Is it a use case with single block IO?
If not, then it does not help to implement a partial solution for
single block size IO.

Thanks,
Amir.
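
For completeness, here is a minimal userspace sketch of what a
single-FS-block write with RWF_ATOMIC could look like against that
kernel. It is not from this thread: the "testfile" path, the 4096-byte
block size, and the fallback RWF_ATOMIC value are assumptions, and the
file is opened with O_DIRECT because, as noted above, atomic buffered
writes are not supported.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC	0x00000040	/* assumed uapi value; check linux/fs.h */
#endif

int main(void)
{
	const size_t blksz = 4096;	/* assumed fs block size */
	struct iovec iov;
	void *buf;
	int fd;
	ssize_t ret;

	/* Atomic writes currently only work with direct IO, hence O_DIRECT. */
	fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT needs an aligned buffer; fill one fs block with 'A'. */
	if (posix_memalign(&buf, blksz, blksz)) {
		fprintf(stderr, "posix_memalign failed\n");
		close(fd);
		return 1;
	}
	memset(buf, 'A', blksz);

	iov.iov_base = buf;
	iov.iov_len = blksz;

	/* Offset and length are one naturally aligned fs block. */
	ret = pwritev2(fd, &iov, 1, 0, RWF_ATOMIC);
	if (ret < 0)
		perror("pwritev2(RWF_ATOMIC)");
	else
		printf("atomically wrote %zd bytes\n", ret);

	free(buf);
	close(fd);
	return 0;
}

If the reporter's workload really is single-block IO, something along
these lines could be used to test the concurrent read/write behavior
being discussed; for larger IO it does not apply, per the question above.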