Re: [PATCH] xfs: fix incorrect log_flushed on fsync

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Aug 31, 2017 at 4:47 PM, Christoph Hellwig <hch@xxxxxx> wrote:
> I think something like the following patch (totally untested,
> just an idea) should fix the issue, right?

I think that is not enough.

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index c4893e226fd8..555fcae9a18f 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -135,7 +135,7 @@ xfs_file_fsync(
>         struct xfs_inode        *ip = XFS_I(inode);
>         struct xfs_mount        *mp = ip->i_mount;
>         int                     error = 0;
> -       int                     log_flushed = 0;
> +       unsigned int            flushseq;
>         xfs_lsn_t               lsn = 0;
>
>         trace_xfs_file_fsync(ip);
> @@ -143,6 +143,7 @@ xfs_file_fsync(
>         error = file_write_and_wait_range(file, start, end);
>         if (error)
>                 return error;
> +       flushseq = READ_ONCE(mp->m_flushseq);

imagine that flush was submitted and completed before
file_write_and_wait_range() but m_flushseq incremented  after.
maybe here READ m_flush_submitted_seq...

>
>         if (XFS_FORCED_SHUTDOWN(mp))
>                 return -EIO;
> @@ -181,7 +182,7 @@ xfs_file_fsync(
>         }
>
>         if (lsn) {
> -               error = _xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, &log_flushed);
> +               error = _xfs_log_force_lsn(mp, lsn, XFS_LOG_SYNC, NULL);
>                 ip->i_itemp->ili_fsync_fields = 0;
>         }
>         xfs_iunlock(ip, XFS_ILOCK_SHARED);
> @@ -193,8 +194,9 @@ xfs_file_fsync(
>          * an already allocated file and thus do not have any metadata to
>          * commit.
>          */
> -       if (!log_flushed && !XFS_IS_REALTIME_INODE(ip) &&
> -           mp->m_logdev_targp == mp->m_ddev_targp)
> +       if (!XFS_IS_REALTIME_INODE(ip) &&
> +           mp->m_logdev_targp == mp->m_ddev_targp &&
> +           flushseq == READ_ONCE(mp->m_flushseq))

... and here READ m_flush_completed_seq
if (m_flush_completed_seq > m_flush_submitted_seq)
it is safe to skip issue flush.
Then probably READ_ONCE() is not enough and need smb_rmb?

>                 xfs_blkdev_issue_flush(mp->m_ddev_targp);
>
>         return error;
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index bcb2f860e508..3c0cbb98581e 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -2922,6 +2922,8 @@ xlog_state_done_syncing(
>                 iclog->ic_state = XLOG_STATE_DONE_SYNC;
>         }
>
> +       log->l_mp->m_flushseq++;

I recon this should use WRITE_ONCE or smp_wmb()
and then also increment m_flush_submitted_seq *before*
issueing flush

If state machine does not allow more than a single flush
to be in flight (?) then the 2 seq counters could be reduced
to single seq counter with (m_flushseq % 2) == 1 for submitted
and  (m_flushseq % 2) == 0 for completed and the test in fsync
would be (flushseq % 2) == (READ_ONCE(mp->m_flushseq) % 2)

... maybe?

Amir.



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]