"Darrick J. Wong" <djwong@xxxxxxxxxx> writes: > On Sat, Jan 27, 2024 at 09:57:59AM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@xxxxxxxxxx> >> >> Hello, >> >> This is the third version of RFC patch series that convert ext4 regular >> file's buffered IO path to iomap and enable large folio. It's rebased on >> 6.7 and Christoph's "map multiple blocks per ->map_blocks in iomap >> writeback" series [1]. I've fixed all issues found in the last about 3 >> weeks of stress tests and fault injection tests in v2. I hope I've >> covered most of the corner cases, and any comments are welcome. :) >> >> Changes since v2: >> - Update patch 1-6 to v3 [2]. >> - iomap_zero and iomap_unshare don't need to update i_size and call >> iomap_write_failed(), introduce a new helper iomap_write_end_simple() >> to avoid doing that. >> - Factor out ext4_[ext|ind]_map_blocks() parts from ext4_map_blocks(), >> introduce a new helper ext4_iomap_map_one_extent() to allocate >> delalloc blocks in writeback, which is always under i_data_sem in >> write mode. This is done to prevent the writing back delalloc >> extents become stale if it raced by truncate. >> - Add a lock detection in mapping_clear_large_folios(). >> Changes since v1: >> - Introduce seq count for iomap buffered write and writeback to protect >> races from extents changes, e.g. truncate, mwrite. >> - Always allocate unwritten extents for new blocks, drop dioread_lock >> mode, and make no distinctions between dioread_lock and >> dioread_nolock. >> - Don't add ditry data range to jinode, drop data=ordered mode, and >> make no distinctions between data=ordered and data=writeback mode. >> - Postpone updating i_disksize to endio. >> - Allow splitting extents and use reserved space in endio. >> - Instead of reimplement a new delayed mapping helper >> ext4_iomap_da_map_blocks() for buffer write, try to reuse >> ext4_da_map_blocks(). >> - Add support for disabling large folio on active inodes. >> - Support online defragmentation, make file fall back to buffer_head >> and disable large folio in ext4_move_extents(). >> - Move ext4_nonda_switch() in advance to prevent deadlock in mwrite. >> - Add dirty_len and pos trace info to trace_iomap_writepage_map(). >> - Update patch 1-6 to v2. >> >> This series only support ext4 with the default features and mount >> options, doesn't support inline_data, bigalloc, dax, fs_verity, fs_crypt >> and data=journal mode, ext4 would fall back to buffer_head path > > Do you plan to add bigalloc or !extents support as a part 2 patchset? > Hi Darrick, > An ext2 port to iomap has been (vaguely) in the works for a while, yes, we have [1][2]. I am in the process of rebasing that work on the latest upstream. It's been a while since my last post since I have been pulled into some other internal work, sorry about that. > though iirc willy never got the performance to match because iomap Ohh, can you help me provide details on what performance benchmark was run? I can try and run them when I rebase. > didn't have a mechanism for the caller to tell it "run the IO now even > though you don't have a complete page, because the indirect block is the > next block after the 11th block". Do you mean this for a large folio? I still didn't get the problem you are referring here. Can you please help me explain why could that be a problem? [1]: https://lore.kernel.org/linux-ext4/9cdd449fc1d63cf2dba17cfa2fa7fb29b8f96a46.1700506526.git.ritesh.list@xxxxxxxxx/ [2]: https://lore.kernel.org/linux-ext4/8734wnj53k.fsf@xxxxxxx/ -ritesh