On Wed 22-06-22 11:41:14, Jens Axboe wrote: > Top posting - are people fine with queueing this up at this point? Will > need a bit of massaging for io_uring as certain things moved to another > file, but it's really minor. I'd do a separate topic branch for this. I have no objections to merging this. The parts I felt confident about enough look OK to me (and have my reviewed-by tag). Honza > On 6/16/22 3:22 PM, Stefan Roesch wrote: > > This patch series adds support for async buffered writes when using both > > xfs and io-uring. Currently io-uring only supports buffered writes in the > > slow path, by processing them in the io workers. With this patch series it is > > now possible to support buffered writes in the fast path. To be able to use > > the fast path the required pages must be in the page cache, the required locks > > in xfs can be granted immediately and no additional blocks need to be read > > form disk. > > > > Updating the inode can take time. An optimization has been implemented for > > the time update. Time updates will be processed in the slow path. While there > > is already a time update in process, other write requests for the same file, > > can skip the update of the modification time. > > > > > > Performance results: > > For fio the following results have been obtained with a queue depth of > > 1 and 4k block size (runtime 600 secs): > > > > sequential writes: > > without patch with patch libaio psync > > iops: 77k 209k 195K 233K > > bw: 314MB/s 854MB/s 790MB/s 953MB/s > > clat: 9600ns 120ns 540ns 3000ns > > > > > > For an io depth of 1, the new patch improves throughput by over three times > > (compared to the exiting behavior, where buffered writes are processed by an > > io-worker process) and also the latency is considerably reduced. To achieve the > > same or better performance with the exisiting code an io depth of 4 is required. > > Increasing the iodepth further does not lead to improvements. > > > > In addition the latency of buffered write operations is reduced considerably. > > > > > > > > Support for async buffered writes: > > > > To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In > > addition the check in generic_write_checks is modified to allow for async > > buffered writes that have this flag set. > > > > Changes to the iomap page create function to allow the caller to specify > > the gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set > > and specifies the requested gfp flags. > > > > Adds the iomap async buffered write support to the xfs iomap layer. > > Adds async buffered write support to the xfs iomap layer. > > > > Support for async buffered write support and inode time modification > > > > Splits the functions for checking if the file privileges need to be removed in > > two functions: check function and a function for the removal of file privileges. > > The same split is also done for the function to update the file modification time. > > > > Implement an optimization that while a file modification time is pending other > > requests for the same file don't need to wait for the file modification update. > > This avoids that a considerable number of buffered async write requests get > > punted. > > > > Take the ilock in nowait mode if async buffered writes are enabled and enable > > the async buffered writes optimization in io_uring. > > > > Support for write throttling of async buffered writes: > > > > Add a no_wait parameter to the exisiting balance_dirty_pages() function. The > > function will return -EAGAIN if the parameter is true and write throttling is > > required. > > > > Add a new function called balance_dirty_pages_ratelimited_async() that will be > > invoked from iomap_write_iter() if an async buffered write is requested. > > > > Enable async buffered write support in xfs > > This enables async buffered writes for xfs. > > > > > > Testing: > > This patch has been tested with xfstests, fsx, fio and individual test programs. > > > > > > Changes: > > V9: > > - Added comment for function balance_dirty_pages_ratelimited_flags() > > - checking return code for iop allocation in iomap_page_create() > > > > V8: > > - Reverted back changes to iomap_write_iter and used Mathew Wilcox code review > > recommendation with an additional change to revert the iterator. > > - Removed patch "fs: Optimization for concurrent file time updates" > > - Setting flag value in file_modified_flags() > > - Removed additional spaces in comment in file_update_time() > > - Run fsx with 1 billion ops against the changes (Run passed) > > > > V7: > > - Change definition and if clause in " iomap: Add flags parameter to > > iomap_page_create()" > > - Added patch "iomap: Return error code from iomap_write_iter()" to address > > the problem Dave Chinner brought up: retrying memory allocation a second > > time when we are under memory pressure. > > - Removed patch "xfs: Change function signature of xfs_ilock_iocb()" > > - Merged patch "xfs: Enable async buffered write support" with previous > > patch > > > > V6: > > - Pass in iter->flags to calls in iomap_page_create() > > > > V5: > > - Refreshed to 5.19-rc1 > > - Merged patch 3 and patch 4 > > "mm: Prepare balance_dirty_pages() for async buffered writes" and > > "mm: Add balance_dirty_pages_ratelimited_flags() function" > > - Reformatting long file in iomap_page_create() > > - Replacing gfp parameter with flags parameter in iomap_page_create() > > This makes sure that the gfp setting is done in one location. > > - Moved variable definition outside of loop in iomap_write_iter() > > - Merged patch 7 with patch 6. > > - Introduced __file_remove_privs() that get the iocb_flags passed in > > as an additional parameter > > - Removed file_needs_remove_privs() function > > - Renamed file_needs_update_time() inode_needs_update_time() > > - inode_needs_update_time() no longer passes the file pointer > > - Renamed file_modified_async() to file_modified_flags() > > - Made file_modified_flags() an internal function > > - Removed extern keyword in file_modified_async definition > > - Added kiocb_modified function. > > - Separate patch for changes to xfs_ilock_for_iomap() > > - Separate patch for changes to xfs_ilock_inode() > > - Renamed xfs_ilock_xfs_inode()n back to xfs_ilock_iocb() > > - Renamed flags parameter to iocb_flags in function xfs_ilock_iocb() > > - Used inode_set_flags() to manipulate inode flags in the function > > file_modified_flags() > > > > V4: > > - Reformat new code in generic_write_checks_count(). > > - Removed patch that introduced new function iomap_page_create_gfp(). > > - Add gfp parameter to iomap_page_create() and change all callers > > All users will enforce the number of blocks check > > - Removed confusing statement in iomap async buffer support patch > > - Replace no_wait variable in __iomap_write_begin with check of > > IOMAP_NOWAIT for easier readability. > > - Moved else if clause in __iomap_write_begin into else clause for > > easier readability > > - Removed the balance_dirty_pages_ratelimited_async() function and > > reverted back to the earlier version that used the function > > balance_dirty_pages_ratelimited_flags() > > - Introduced the flag BDP_ASYNC. > > - Renamed variable in iomap_write_iter from i_mapping to mapping. > > - Directly call balance_dirty_pages_ratelimited_flags() in the function > > iomap_write_iter(). > > - Re-ordered the patches. > > > > V3: > > - Reformat new code in generic_write_checks_count() to line lengthof 80. > > - Remove if condition in __iomap_write_begin to maintain current behavior. > > - use GFP_NOWAIT flag in __iomap_write_begin > > - rename need_file_remove_privs() function to file_needs_remove_privs() > > - rename do_file_remove_privs to __file_remove_privs() > > - add kernel documentation to file_remove_privs() function > > - rework else if branch in file_remove_privs() function > > - add kernel documentation to file_modified() function > > - add kernel documentation to file_modified_async() function > > - rename err variable in file_update_time to ret > > - rename function need_file_update_time() to file_needs_update_time() > > - rename function do_file_update_time() to __file_update_time() > > - don't move check for FMODE_NOCMTIME in generic helper > > - reformat __file_update_time for easier reading > > - add kernel documentation to file_update_time() function > > - fix if in file_update_time from < to <= > > - move modification of inode flags from do_file_update_time to file_modified() > > When this function is called, the caller must hold the inode lock. > > - 3 new patches from Jan to add new no_wait flag to balance_dirty_pages(), > > remove patch 12 from previous series > > - Make balance_dirty_pages_ratelimited_flags() a static function > > - Add new balance_dirty_pages_ratelimited_async() function > > > > V2: > > - Remove atomic allocation > > - Use direct write in xfs_buffered_write_iomap_begin() > > - Use xfs_ilock_for_iomap() in xfs_buffered_write_iomap_begin() > > - Remove no_wait check at the end of xfs_buffered_write_iomap_begin() for > > the COW path. > > - Pass xfs_inode pointer to xfs_ilock_iocb and rename function to > > xfs_lock_xfs_inode > > - Replace existing uses of xfs_ilock_iocb with xfs_ilock_xfs_inode > > - Use xfs_ilock_xfs_inode in xfs_file_buffered_write() > > - Callers of xfs_ilock_for_iomap need to initialize lock mode. This is > > required so writes use an exclusive lock > > - Split of _balance_dirty_pages() from balance_dirty_pages() and return > > sleep time > > - Call _balance_dirty_pages() in balance_dirty_pages_ratelimited_flags() > > - Move call to balance_dirty_pages_ratelimited_flags() in iomap_write_iter() > > to the beginning of the loop > > > > > > > > Jan Kara (3): > > mm: Move starting of background writeback into the main balancing loop > > mm: Move updates of dirty_exceeded into one place > > mm: Add balance_dirty_pages_ratelimited_flags() function > > > > Stefan Roesch (11): > > iomap: Add flags parameter to iomap_page_create() > > iomap: Add async buffered write support > > iomap: Return -EAGAIN from iomap_write_iter() > > fs: Add check for async buffered writes to generic_write_checks > > fs: add __remove_file_privs() with flags parameter > > fs: Split off inode_needs_update_time and __file_update_time > > fs: Add async write file modification handling. > > io_uring: Add support for async buffered writes > > io_uring: Add tracepoint for short writes > > xfs: Specify lockmode when calling xfs_ilock_for_iomap() > > xfs: Add async buffered write support > > > > fs/inode.c | 168 +++++++++++++++++++++++--------- > > fs/io_uring.c | 32 +++++- > > fs/iomap/buffered-io.c | 71 +++++++++++--- > > fs/read_write.c | 4 +- > > fs/xfs/xfs_file.c | 11 +-- > > fs/xfs/xfs_iomap.c | 11 ++- > > include/linux/fs.h | 4 + > > include/linux/writeback.h | 7 ++ > > include/trace/events/io_uring.h | 25 +++++ > > mm/page-writeback.c | 89 +++++++++++------ > > 10 files changed, 314 insertions(+), 108 deletions(-) > > > > > > base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 > > > -- > Jens Axboe > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR