Top posting - are people fine with queueing this up at this point? Will need a bit of massaging for io_uring as certain things moved to another file, but it's really minor. I'd do a separate topic branch for this. On 6/16/22 3:22 PM, Stefan Roesch wrote: > This patch series adds support for async buffered writes when using both > xfs and io-uring. Currently io-uring only supports buffered writes in the > slow path, by processing them in the io workers. With this patch series it is > now possible to support buffered writes in the fast path. To be able to use > the fast path the required pages must be in the page cache, the required locks > in xfs can be granted immediately and no additional blocks need to be read > form disk. > > Updating the inode can take time. An optimization has been implemented for > the time update. Time updates will be processed in the slow path. While there > is already a time update in process, other write requests for the same file, > can skip the update of the modification time. > > > Performance results: > For fio the following results have been obtained with a queue depth of > 1 and 4k block size (runtime 600 secs): > > sequential writes: > without patch with patch libaio psync > iops: 77k 209k 195K 233K > bw: 314MB/s 854MB/s 790MB/s 953MB/s > clat: 9600ns 120ns 540ns 3000ns > > > For an io depth of 1, the new patch improves throughput by over three times > (compared to the exiting behavior, where buffered writes are processed by an > io-worker process) and also the latency is considerably reduced. To achieve the > same or better performance with the exisiting code an io depth of 4 is required. > Increasing the iodepth further does not lead to improvements. > > In addition the latency of buffered write operations is reduced considerably. > > > > Support for async buffered writes: > > To support async buffered writes the flag FMODE_BUF_WASYNC is introduced. In > addition the check in generic_write_checks is modified to allow for async > buffered writes that have this flag set. > > Changes to the iomap page create function to allow the caller to specify > the gfp flags. Sets the IOMAP_NOWAIT flag in iomap if IOCB_NOWAIT has been set > and specifies the requested gfp flags. > > Adds the iomap async buffered write support to the xfs iomap layer. > Adds async buffered write support to the xfs iomap layer. > > Support for async buffered write support and inode time modification > > Splits the functions for checking if the file privileges need to be removed in > two functions: check function and a function for the removal of file privileges. > The same split is also done for the function to update the file modification time. > > Implement an optimization that while a file modification time is pending other > requests for the same file don't need to wait for the file modification update. > This avoids that a considerable number of buffered async write requests get > punted. > > Take the ilock in nowait mode if async buffered writes are enabled and enable > the async buffered writes optimization in io_uring. > > Support for write throttling of async buffered writes: > > Add a no_wait parameter to the exisiting balance_dirty_pages() function. The > function will return -EAGAIN if the parameter is true and write throttling is > required. > > Add a new function called balance_dirty_pages_ratelimited_async() that will be > invoked from iomap_write_iter() if an async buffered write is requested. > > Enable async buffered write support in xfs > This enables async buffered writes for xfs. > > > Testing: > This patch has been tested with xfstests, fsx, fio and individual test programs. > > > Changes: > V9: > - Added comment for function balance_dirty_pages_ratelimited_flags() > - checking return code for iop allocation in iomap_page_create() > > V8: > - Reverted back changes to iomap_write_iter and used Mathew Wilcox code review > recommendation with an additional change to revert the iterator. > - Removed patch "fs: Optimization for concurrent file time updates" > - Setting flag value in file_modified_flags() > - Removed additional spaces in comment in file_update_time() > - Run fsx with 1 billion ops against the changes (Run passed) > > V7: > - Change definition and if clause in " iomap: Add flags parameter to > iomap_page_create()" > - Added patch "iomap: Return error code from iomap_write_iter()" to address > the problem Dave Chinner brought up: retrying memory allocation a second > time when we are under memory pressure. > - Removed patch "xfs: Change function signature of xfs_ilock_iocb()" > - Merged patch "xfs: Enable async buffered write support" with previous > patch > > V6: > - Pass in iter->flags to calls in iomap_page_create() > > V5: > - Refreshed to 5.19-rc1 > - Merged patch 3 and patch 4 > "mm: Prepare balance_dirty_pages() for async buffered writes" and > "mm: Add balance_dirty_pages_ratelimited_flags() function" > - Reformatting long file in iomap_page_create() > - Replacing gfp parameter with flags parameter in iomap_page_create() > This makes sure that the gfp setting is done in one location. > - Moved variable definition outside of loop in iomap_write_iter() > - Merged patch 7 with patch 6. > - Introduced __file_remove_privs() that get the iocb_flags passed in > as an additional parameter > - Removed file_needs_remove_privs() function > - Renamed file_needs_update_time() inode_needs_update_time() > - inode_needs_update_time() no longer passes the file pointer > - Renamed file_modified_async() to file_modified_flags() > - Made file_modified_flags() an internal function > - Removed extern keyword in file_modified_async definition > - Added kiocb_modified function. > - Separate patch for changes to xfs_ilock_for_iomap() > - Separate patch for changes to xfs_ilock_inode() > - Renamed xfs_ilock_xfs_inode()n back to xfs_ilock_iocb() > - Renamed flags parameter to iocb_flags in function xfs_ilock_iocb() > - Used inode_set_flags() to manipulate inode flags in the function > file_modified_flags() > > V4: > - Reformat new code in generic_write_checks_count(). > - Removed patch that introduced new function iomap_page_create_gfp(). > - Add gfp parameter to iomap_page_create() and change all callers > All users will enforce the number of blocks check > - Removed confusing statement in iomap async buffer support patch > - Replace no_wait variable in __iomap_write_begin with check of > IOMAP_NOWAIT for easier readability. > - Moved else if clause in __iomap_write_begin into else clause for > easier readability > - Removed the balance_dirty_pages_ratelimited_async() function and > reverted back to the earlier version that used the function > balance_dirty_pages_ratelimited_flags() > - Introduced the flag BDP_ASYNC. > - Renamed variable in iomap_write_iter from i_mapping to mapping. > - Directly call balance_dirty_pages_ratelimited_flags() in the function > iomap_write_iter(). > - Re-ordered the patches. > > V3: > - Reformat new code in generic_write_checks_count() to line lengthof 80. > - Remove if condition in __iomap_write_begin to maintain current behavior. > - use GFP_NOWAIT flag in __iomap_write_begin > - rename need_file_remove_privs() function to file_needs_remove_privs() > - rename do_file_remove_privs to __file_remove_privs() > - add kernel documentation to file_remove_privs() function > - rework else if branch in file_remove_privs() function > - add kernel documentation to file_modified() function > - add kernel documentation to file_modified_async() function > - rename err variable in file_update_time to ret > - rename function need_file_update_time() to file_needs_update_time() > - rename function do_file_update_time() to __file_update_time() > - don't move check for FMODE_NOCMTIME in generic helper > - reformat __file_update_time for easier reading > - add kernel documentation to file_update_time() function > - fix if in file_update_time from < to <= > - move modification of inode flags from do_file_update_time to file_modified() > When this function is called, the caller must hold the inode lock. > - 3 new patches from Jan to add new no_wait flag to balance_dirty_pages(), > remove patch 12 from previous series > - Make balance_dirty_pages_ratelimited_flags() a static function > - Add new balance_dirty_pages_ratelimited_async() function > > V2: > - Remove atomic allocation > - Use direct write in xfs_buffered_write_iomap_begin() > - Use xfs_ilock_for_iomap() in xfs_buffered_write_iomap_begin() > - Remove no_wait check at the end of xfs_buffered_write_iomap_begin() for > the COW path. > - Pass xfs_inode pointer to xfs_ilock_iocb and rename function to > xfs_lock_xfs_inode > - Replace existing uses of xfs_ilock_iocb with xfs_ilock_xfs_inode > - Use xfs_ilock_xfs_inode in xfs_file_buffered_write() > - Callers of xfs_ilock_for_iomap need to initialize lock mode. This is > required so writes use an exclusive lock > - Split of _balance_dirty_pages() from balance_dirty_pages() and return > sleep time > - Call _balance_dirty_pages() in balance_dirty_pages_ratelimited_flags() > - Move call to balance_dirty_pages_ratelimited_flags() in iomap_write_iter() > to the beginning of the loop > > > > Jan Kara (3): > mm: Move starting of background writeback into the main balancing loop > mm: Move updates of dirty_exceeded into one place > mm: Add balance_dirty_pages_ratelimited_flags() function > > Stefan Roesch (11): > iomap: Add flags parameter to iomap_page_create() > iomap: Add async buffered write support > iomap: Return -EAGAIN from iomap_write_iter() > fs: Add check for async buffered writes to generic_write_checks > fs: add __remove_file_privs() with flags parameter > fs: Split off inode_needs_update_time and __file_update_time > fs: Add async write file modification handling. > io_uring: Add support for async buffered writes > io_uring: Add tracepoint for short writes > xfs: Specify lockmode when calling xfs_ilock_for_iomap() > xfs: Add async buffered write support > > fs/inode.c | 168 +++++++++++++++++++++++--------- > fs/io_uring.c | 32 +++++- > fs/iomap/buffered-io.c | 71 +++++++++++--- > fs/read_write.c | 4 +- > fs/xfs/xfs_file.c | 11 +-- > fs/xfs/xfs_iomap.c | 11 ++- > include/linux/fs.h | 4 + > include/linux/writeback.h | 7 ++ > include/trace/events/io_uring.h | 25 +++++ > mm/page-writeback.c | 89 +++++++++++------ > 10 files changed, 314 insertions(+), 108 deletions(-) > > > base-commit: b13baccc3850ca8b8cccbf8ed9912dbaa0fdf7f3 -- Jens Axboe