Re: [RFC PATCH v3 00/26] ext4: use iomap for regular file's buffered IO path and enable large folio

"Darrick J. Wong" <djwong@xxxxxxxxxx> writes:

> On Sat, Jan 27, 2024 at 09:57:59AM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>> 
>> Hello,
>> 
>> This is the third version of the RFC patch series that converts ext4
>> regular file's buffered IO path to iomap and enables large folios. It's
>> rebased on 6.7 and Christoph's "map multiple blocks per ->map_blocks in
>> iomap writeback" series [1]. I've fixed all issues found in v2 during
>> roughly the last 3 weeks of stress tests and fault injection tests. I
>> hope I've covered most of the corner cases, and any comments are
>> welcome. :)
>> 
>> Changes since v2:
>>  - Update patch 1-6 to v3 [2].
>>  - iomap_zero and iomap_unshare don't need to update i_size and call
>>    iomap_write_failed(), introduce a new helper iomap_write_end_simple()
>>    to avoid doing that.
>>  - Factor out ext4_[ext|ind]_map_blocks() parts from ext4_map_blocks(),
>>    introduce a new helper ext4_iomap_map_one_extent() to allocate
>>    delalloc blocks in writeback, which is always under i_data_sem in
>>    write mode. This prevents delalloc extents under writeback from
>>    becoming stale if they race with truncate.
>>  - Add a lock detection in mapping_clear_large_folios().
>> Changes since v1:
>>  - Introduce seq count for iomap buffered write and writeback to protect
>>    against races with extent changes, e.g. truncate, mwrite.
>>  - Always allocate unwritten extents for new blocks, drop dioread_lock
>>    mode, and make no distinctions between dioread_lock and
>>    dioread_nolock.
>>  - Don't add dirty data range to jinode, drop data=ordered mode, and
>>    make no distinctions between data=ordered and data=writeback mode.
>>  - Postpone updating i_disksize to endio.
>>  - Allow splitting extents and use reserved space in endio.
>>  - Instead of reimplementing a new delayed mapping helper
>>    ext4_iomap_da_map_blocks() for buffered write, try to reuse
>>    ext4_da_map_blocks().
>>  - Add support for disabling large folio on active inodes.
>>  - Support online defragmentation, make file fall back to buffer_head
>>    and disable large folio in ext4_move_extents().
>>  - Move ext4_nonda_switch() in advance to prevent deadlock in mwrite.
>>  - Add dirty_len and pos trace info to trace_iomap_writepage_map().
>>  - Update patch 1-6 to v2.
>> 
>> This series only supports ext4 with the default features and mount
>> options. It doesn't support inline_data, bigalloc, dax, fs_verity,
>> fs_crypt or data=journal mode; ext4 would fall back to buffer_head path
>
> Do you plan to add bigalloc or !extents support as a part 2 patchset?
>

Hi Darrick,

> An ext2 port to iomap has been (vaguely) in the works for a while,

Yes, we have [1][2]. I am in the process of rebasing that work onto the
latest upstream. It has been a while since my last post because I was
pulled into some other internal work, sorry about that.

> though iirc willy never got the performance to match because iomap

Oh, could you provide details on which performance benchmark was run?
I can try to run it when I rebase.

> didn't have a mechanism for the caller to tell it "run the IO now even
> though you don't have a complete page, because the indirect block is the
> next block after the 11th block".

Do you mean this for a large folio? I still don't understand the problem
you are referring to here. Could you please explain why that would be a
problem?
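
For context, here is a minimal sketch of the classic ext2 block map that
the quoted remark seems to refer to. It is illustration only, not kernel
code. It assumes 12 direct pointers in the inode (logical blocks 0..11)
and 4-byte block pointers, so that the first block needing an indirect
block is logical block 12. The helper names mapping_depth() and
crosses_indirect_boundary() are hypothetical.

```python
# Illustrative sketch of the classic ext2 block-map layout (not kernel code).
# Assumptions: 12 direct pointers per inode and 4-byte block pointers, so one
# indirect block holds block_size // 4 entries.

NDIR_BLOCKS = 12  # logical blocks 0..11 are mapped directly from the inode


def mapping_depth(lblk, block_size=4096):
    """Return how many indirect blocks must be read to map logical block lblk.

    0 = direct, 1 = single indirect, 2 = double indirect, 3 = triple indirect.
    """
    if lblk < 0:
        raise ValueError("logical block number must be non-negative")
    ptrs = block_size // 4  # pointers that fit in one indirect block
    if lblk < NDIR_BLOCKS:
        return 0
    lblk -= NDIR_BLOCKS
    if lblk < ptrs:
        return 1
    lblk -= ptrs
    if lblk < ptrs * ptrs:
        return 2
    lblk -= ptrs * ptrs
    if lblk < ptrs ** 3:
        return 3
    raise ValueError("logical block beyond triple-indirect range")


def crosses_indirect_boundary(first_lblk, nr_blocks, block_size=4096):
    """True if the first and last block of the range need different depths."""
    if nr_blocks < 1:
        raise ValueError("range must contain at least one block")
    last_lblk = first_lblk + nr_blocks - 1
    return mapping_depth(first_lblk, block_size) != mapping_depth(
        last_lblk, block_size
    )
```

With 1 KiB blocks, for example, crosses_indirect_boundary(8, 8, 1024)
is true: blocks 8..11 are direct, while blocks 12..15 can only be mapped
after the indirect block has been read. A single page or folio spanning
that range would straddle the boundary, which is the case where a caller
might want the IO for the first part issued early.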

[1]: https://lore.kernel.org/linux-ext4/9cdd449fc1d63cf2dba17cfa2fa7fb29b8f96a46.1700506526.git.ritesh.list@xxxxxxxxx/
[2]: https://lore.kernel.org/linux-ext4/8734wnj53k.fsf@xxxxxxx/

-ritesh



