> > One more thing I should mention.
> > You do not need to wait for the atomic large writes patches to land.
> > There is nothing stopping you from implementing the suggested
> > solution based on the xfs code already in master (v6.13-rc1),
> > which has support for the RWF_ATOMIC flag for writes.

However, I missed the fact that there is not yet a plan to support
atomic buffered writes :-/

> >
> > It just means that the API will not be usable for applications that
> > want to do IO larger than block size, but concurrent read/write
>                                  ^
> To be precise, this is the page size, not the block size, right?
>

fs block size:

	if (iocb->ki_flags & IOCB_ATOMIC) {
		/*
		 * Currently only atomic writing of a single FS block is
		 * supported. It would be possible to atomic write smaller than
		 * a FS block, but there is no requirement to support this.
		 * Note that iomap also does not support this yet.
		 */
		if (ocount != ip->i_mount->m_sb.sb_blocksize)
			return -EINVAL;

		ret = generic_atomic_write_valid(iocb, from);
		if (ret)
			return ret;
	}

> > performance of 4K IO could be improved already.
>
> Great, which means that IO operations aligned within a single page
> can be executed concurrently, because the folio lock already
> provides atomicity guarantees.
>
> If the write does not exceed the boundary of a page, we can
> downgrade the iolock to XFS_IOLOCK_SHARED. It seems to be safe
> and will not change the current behavior.
>
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -454,6 +454,11 @@ xfs_file_write_checks(
>  	if (error)
>  		return error;
>
> +	if (iocb->ki_pos >> PAGE_SHIFT == (iocb->ki_pos + count) >> PAGE_SHIFT) {
> +		*iolock = XFS_IOLOCK_SHARED;
> +	}
> +
>  	/*
>  	 * For changing security info in file_remove_privs() we need i_rwsem
>  	 * exclusively.

I think that may be possible, but you should do it in the buffered
write code, as in the patch below. xfs_file_write_checks() is called
from code paths like xfs_file_dio_write_unaligned(), where you should
not demote to a shared lock.

> >
> > It's possible that all you need to do is:
> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index c488ae26b23d0..2542f15496488 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -777,9 +777,10 @@ xfs_file_buffered_write(
> >  	ssize_t			ret;
> >  	bool			cleared_space = false;
> >  	unsigned int		iolock;
> > +	bool			atomic_write = iocb->ki_flags & IOCB_ATOMIC;
> >
> >  write_retry:
> > -	iolock = XFS_IOLOCK_EXCL;
> > +	iolock = atomic_write ? XFS_IOLOCK_SHARED : XFS_IOLOCK_EXCL;
> >  	ret = xfs_ilock_iocb(iocb, iolock);
> > --
> >
> > xfs_file_write_checks() afterwards already takes care of promoting
> > XFS_IOLOCK_SHARED to XFS_IOLOCK_EXCL for extending writes.
>
> Yeah, for writes that exceed the PAGE boundary, we can also promote
> the lock to XFS_IOLOCK_EXCL. Otherwise, I am concerned that it may
> lead to old data being retained in the file.
>
> For example, two processes writing four pages of data to the same area:
>
>     process A          process B
>     --------------------------------
>     write AA--
>     <sleep>
>                        new write BBBB
>     write --AA
>
> The final data is BBAA.
>

What is the use case for which you are trying to improve performance?
Is it a use case with single block IO?
If not, then it does not help to implement a partial solution for
single block size IO.

Thanks,
Amir.
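
For completeness, here is a minimal userspace sketch of what a
single-FS-block write with RWF_ATOMIC could look like against that
kernel. It is not from this thread: the "testfile" path, the 4096-byte
block size, and the fallback RWF_ATOMIC value are assumptions, and the
file is opened with O_DIRECT because, as noted above, atomic buffered
writes are not supported.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_ATOMIC
#define RWF_ATOMIC	0x00000040	/* assumed uapi value; check linux/fs.h */
#endif

int main(void)
{
	const size_t blksz = 4096;	/* assumed fs block size */
	struct iovec iov;
	void *buf;
	int fd;
	ssize_t ret;

	/* Atomic writes currently only work with direct IO, hence O_DIRECT. */
	fd = open("testfile", O_RDWR | O_CREAT | O_DIRECT, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT needs an aligned buffer; fill one fs block with 'A'. */
	if (posix_memalign(&buf, blksz, blksz)) {
		fprintf(stderr, "posix_memalign failed\n");
		close(fd);
		return 1;
	}
	memset(buf, 'A', blksz);

	iov.iov_base = buf;
	iov.iov_len = blksz;

	/* Offset and length are one naturally aligned fs block. */
	ret = pwritev2(fd, &iov, 1, 0, RWF_ATOMIC);
	if (ret < 0)
		perror("pwritev2(RWF_ATOMIC)");
	else
		printf("atomically wrote %zd bytes\n", ret);

	free(buf);
	close(fd);
	return 0;
}

If the reporter's workload really is single-block IO, something along
these lines could be used to test the concurrent read/write behavior
being discussed; for larger IO it does not apply, per the question above.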