The patch titled
     Subject: fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
has been added to the -mm tree.  Its filename is
     fs-syncc-sync_file_range2-may-use-wb_sync_all-writeback.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/fs-syncc-sync_file_range2-may-use-wb_sync_all-writeback.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/fs-syncc-sync_file_range2-may-use-wb_sync_all-writeback.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Amir Goldstein <amir73il@xxxxxxxxx>
Subject: fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback

23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
writeback") claims that the sync_file_range(2) syscall was "created for
userspace to be able to issue background writeout and so waiting for
in-flight IO is undesirable there" and changes the writeback (back) to
WB_SYNC_NONE.

This claim is only partially true.  It is true for users that use the flag
SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was the
reason for changing to WB_SYNC_NONE writeback.

However, that claim is not true for users that use the flag combination
SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|WAIT_AFTER}.  Those users explicitly
requested to wait for in-flight IO as well as for writeback of dirty pages.

Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT and use
the helper filemap_write_and_wait_range(), which uses WB_SYNC_ALL
writeback, to perform the full range sync request.
Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@xxxxxxxxx
Link: http://lkml.kernel.org/r/20190411112152.32151-1-amir73il@xxxxxxxxx
Fixes: 23d0127096cb ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
Acked-by: Jan Kara <jack@xxxxxxxx>
Cc: Dave Chinner <david@xxxxxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/sync.c               |   25 ++++++++++++++++++-------
 include/uapi/linux/fs.h |    3 +++
 2 files changed, 21 insertions(+), 7 deletions(-)

--- a/fs/sync.c~fs-syncc-sync_file_range2-may-use-wb_sync_all-writeback
+++ a/fs/sync.c
@@ -18,8 +18,8 @@
 #include <linux/backing-dev.h>
 #include "internal.h"
 
-#define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
-			SYNC_FILE_RANGE_WAIT_AFTER)
+#define VALID_FLAGS (SYNC_FILE_RANGE_WRITE | SYNC_FILE_RANGE_WRITE_AND_WAIT | \
+		     SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WAIT_AFTER)
 
 /*
  * Do the filesystem syncing work. For simple filesystems
@@ -235,9 +235,9 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
 }
 
 /*
- * sys_sync_file_range() permits finely controlled syncing over a segment of
+ * ksys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
- * zero then sys_sync_file_range() will operate from offset out to EOF.
+ * zero then ksys_sync_file_range() will operate from offset out to EOF.
  *
  * The flag bits are:
  *
@@ -254,7 +254,7 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
  * Useful combinations of the flag bits are:
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
- * in the range which were dirty on entry to sys_sync_file_range() are placed
+ * in the range which were dirty on entry to ksys_sync_file_range() are placed
  * under writeout.  This is a start-write-for-data-integrity operation.
  *
  * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
@@ -266,10 +266,13 @@ SYSCALL_DEFINE1(fdatasync, unsigned int,
  * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
  * for that operation to complete and to return the result.
  *
- * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
+ * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
+ * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
  * a traditional sync() operation.  This is a write-for-data-integrity operation
  * which will ensure that all pages in the range which were dirty on entry to
- * sys_sync_file_range() are committed to disk.
+ * ksys_sync_file_range() are written to disk.  It should be noted that disk
+ * caches are not flushed by this call, so there are no guarantees here that the
+ * data will be available on disk after a crash.
  *
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
@@ -338,6 +341,14 @@ int ksys_sync_file_range(int fd, loff_t
 	mapping = f.file->f_mapping;
 	ret = 0;
 
+	if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+		     SYNC_FILE_RANGE_WRITE_AND_WAIT) {
+		/* Unlike SYNC_FILE_RANGE_WRITE alone uses WB_SYNC_ALL */
+		ret = filemap_write_and_wait_range(mapping, offset, endbyte);
+		if (ret < 0)
+			goto out_put;
+	}
+
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
 		ret = file_fdatawait_range(f.file, offset, endbyte);
 		if (ret < 0)
--- a/include/uapi/linux/fs.h~fs-syncc-sync_file_range2-may-use-wb_sync_all-writeback
+++ a/include/uapi/linux/fs.h
@@ -320,6 +320,9 @@ struct fscrypt_key {
 #define SYNC_FILE_RANGE_WAIT_BEFORE	1
 #define SYNC_FILE_RANGE_WRITE		2
 #define SYNC_FILE_RANGE_WAIT_AFTER	4
+#define SYNC_FILE_RANGE_WRITE_AND_WAIT	(SYNC_FILE_RANGE_WRITE | \
+					 SYNC_FILE_RANGE_WAIT_BEFORE | \
+					 SYNC_FILE_RANGE_WAIT_AFTER)
 
 /*
  * Flags for preadv2/pwritev2:
_

Patches currently in -mm which might be from amir73il@xxxxxxxxx are

fs-syncc-sync_file_range2-may-use-wb_sync_all-writeback.patch
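
For reference, a rough userspace sketch of the flag combination discussed in
the changelog.  This is not part of the patch.  The helper names
wants_write_and_wait() and flush_range_durably() are made up for illustration,
and the #ifndef fallback assumes libc headers that may predate the new
SYNC_FILE_RANGE_WRITE_AND_WAIT define.  The first helper only mirrors the
mask test that the patch adds to ksys_sync_file_range(); the second follows
the range sync with fdatasync(2), on the assumption that the caller wants the
crash durability which the updated comment says sync_file_range(2) alone does
not give.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for headers without the new name; same bits as in the patch. */
#ifndef SYNC_FILE_RANGE_WRITE_AND_WAIT
#define SYNC_FILE_RANGE_WRITE_AND_WAIT	(SYNC_FILE_RANGE_WRITE | \
					 SYNC_FILE_RANGE_WAIT_BEFORE | \
					 SYNC_FILE_RANGE_WAIT_AFTER)
#endif

/*
 * Hypothetical helper: the same mask test the patch adds in the kernel.
 * Returns 1 only when all three bits are set, i.e. when the patched
 * kernel would take the filemap_write_and_wait_range() path.
 */
int wants_write_and_wait(unsigned int flags)
{
	return (flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
		SYNC_FILE_RANGE_WRITE_AND_WAIT;
}

/*
 * Hypothetical helper: write back and wait on nbytes starting at offset,
 * then call fdatasync() because sync_file_range() by itself does not
 * flush disk caches.  Returns 0 on success or a negative errno value.
 */
int flush_range_durably(int fd, off_t offset, off_t nbytes)
{
	if (sync_file_range(fd, offset, nbytes,
			    SYNC_FILE_RANGE_WRITE_AND_WAIT) < 0)
		return -errno;
	if (fdatasync(fd) < 0)
		return -errno;
	return 0;
}
```

A caller that only wants background writeout, as in the PostgreSQL case the
changelog mentions, would keep passing SYNC_FILE_RANGE_WRITE by itself and
would not go through a helper like flush_range_durably().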