Re: [PATCH v4] fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 17, 2019 at 08:45:59AM +0300, Amir Goldstein wrote:
> Commit 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
> writeback") claims that sync_file_range(2) syscall was "created for
> userspace to be able to issue background writeout and so waiting for
> in-flight IO is undesirable there" and changes the writeback (back) to
> WB_SYNC_NONE.
> 
> This claim is only partially true. It is true for users that use the flag
> SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was
> the reason for changing to WB_SYNC_NONE writeback.
> 
> However, that claim is not true for users that use that flag combination
> SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|_WAIT_AFTER}.
> Those users explicitly requested to wait for in-flight IO as well as to
> writeback of dirty pages.
> 
> Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT
> and use the helper filemap_write_and_wait_range(), that uses WB_SYNC_ALL
> writeback, to perform the full range sync request.
> 
> Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@xxxxxxxxx
> Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
> Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
> Acked-by: Jan Kara <jack@xxxxxxxx>
> Cc: Dave Chinner <david@xxxxxxxxxxxxx>
> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> ---
> 
> Andrew,
> 
> V2 of this patch is on your mmtotm queue.
> However, I had already sent out V3 with a braino fix and Dave Chinner
> just added more review comments which I had addressed in this version.
> 
> Thanks,
> Amir.
> 
> Changes since v3:
> - Remove unneeded change to VALID_FLAGS (Dave)
> - Call file_fdatawait_range() before writeback (Dave)
> 
> Changes since v2:
> - Return after filemap_write_and_wait_range()
> 
> Changes since v1:
> - Remove non-guaranties of the API from commit message
> - Added ACK by Jan
> 
>  fs/sync.c               | 20 +++++++++++++++-----
>  include/uapi/linux/fs.h |  3 +++
>  2 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/sync.c b/fs/sync.c
> index b54e0541ad89..1836328f1ae8 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
>  }
>  
>  /*
> - * sys_sync_file_range() permits finely controlled syncing over a segment of
> + * ksys_sync_file_range() permits finely controlled syncing over a segment of
>   * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
> - * zero then sys_sync_file_range() will operate from offset out to EOF.
> + * zero then ksys_sync_file_range() will operate from offset out to EOF.
>   *
>   * The flag bits are:
>   *
> @@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
>   * Useful combinations of the flag bits are:
>   *
>   * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
> - * in the range which were dirty on entry to sys_sync_file_range() are placed
> + * in the range which were dirty on entry to ksys_sync_file_range() are placed
>   * under writeout.  This is a start-write-for-data-integrity operation.
>   *
>   * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
> @@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int, fd)
>   * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
>   * for that operation to complete and to return the result.
>   *
> - * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
> + * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
> + * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
>   * a traditional sync() operation.  This is a write-for-data-integrity operation
>   * which will ensure that all pages in the range which were dirty on entry to
> - * sys_sync_file_range() are committed to disk.
> + * ksys_sync_file_range() are written to disk.  It should be noted that disk
> + * caches are not flushed by this call, so there are no guarantees here that the
> + * data will be available on disk after a crash.
>   *
>   *
>   * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
> @@ -344,6 +347,13 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
>  			goto out_put;
>  	}
>  
> +	if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
> +		     SYNC_FILE_RANGE_WRITE_AND_WAIT) {
> +		/* Unlike SYNC_FILE_RANGE_WRITE alone, uses WB_SYNC_ALL */
> +		ret = filemap_write_and_wait_range(mapping, offset, endbyte);
> +		goto out_put;
> +	}

Clunky, now that I look at it in context.

+	int	sync_mode = WB_SYNC_NONE;
+
+	if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+		     SYNC_FILE_RANGE_WRITE_AND_WAIT)
+		sync_mode = WB_SYNC_ALL;

.....

	if (flags & SYNC_FILE_RANGE_WRITE) {
		ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
-						 WB_SYNC_NONE);
+						 sync_mode);

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux