Re: [LSF/MM/BPF TOPIC] breaking the 512 KiB IO boundary on x86_64

Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx> · Fri, 21 Mar 2025 22:51:42 +0530

Keith Busch <kbusch@xxxxxxxxxx> writes:

> On Fri, Mar 21, 2025 at 07:43:09AM +0530, Ritesh Harjani wrote:
>> i.e. w/o large folios in block devices one could do direct-io &
>> buffered-io in parallel even just next to each other (assuming 4k pagesize). 
>> 
>>            |4k-direct-io | 4k-buffered-io | 
>> 
>> 
>> However with large folios now supported in buffered-io path for block
>> devices, the application cannot submit such direct-io + buffered-io
>> pattern in parallel. Since direct-io can end up invalidating the folio
>> spanning over its 4k range, on which buffered-io is in progress.
>
> Why would buffered io span more than the 4k range here? You're talking
> to the raw block device in both cases, so they have the exact same
> logical block size alignment. Why is buffered io allocating beyond
> the logical size granularity?

This can happen in the following two cases:
1. The system's page size is 64k. Even though the logical block size
granularity for buffered-io is set to 4k (blockdev --setbsz 4096
/dev/sdc), the kernel will still instantiate a 64k page in the page
cache.

2. The second is the more recent case where (correct me if I am wrong)
we now have large folio support for block devices. Here again we can
instantiate a large folio in the page cache in the region where
buffered-io is in progress, correct? (Say a previous read triggers
readahead and installs a large folio in that region.) Even
iomap_write_iter() these days first tries to allocate a chunk of size
mapping_max_folio_size().
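
To make the overlap in the two cases above concrete, here is a minimal
userspace sketch (not kernel code) of the arithmetic. The helper names
folio_start() and same_folio() are made up for illustration. It assumes
the folio covering an offset is naturally aligned to its own size, that
the size is a power of two, and that each I/O is a single 4k block that
fits inside one folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset at which the folio containing 'off'
 * starts, assuming a naturally aligned folio of 'folio_size' bytes.
 * 'folio_size' must be a non-zero power of two.
 */
static uint64_t folio_start(uint64_t off, uint64_t folio_size)
{
	assert(folio_size != 0 && (folio_size & (folio_size - 1)) == 0);
	return off & ~(folio_size - 1);
}

/*
 * Hypothetical helper: true if a 4k direct-io at 'dio_off' and a 4k
 * buffered-io at 'buf_off' fall inside the same folio, i.e. if
 * invalidating the folio under the direct-io range would also hit the
 * range the buffered-io is working on.
 */
static bool same_folio(uint64_t dio_off, uint64_t buf_off,
		       uint64_t folio_size)
{
	return folio_start(dio_off, folio_size) ==
	       folio_start(buf_off, folio_size);
}
```

With 4k folios, same_folio(0, 4096, 4096) is false, so the two adjacent
I/Os never touch the same page cache entry. With a 64k page or a 64k
large folio, same_folio(0, 4096, 65536) is true, which is the case I am
worried about.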

However, with large folio support now in block devices, I am not sure
whether an application can retain much benefit from buffered-io if it
carefully mixes buffered-io and direct-io across a logical block
boundary, because the direct-io can end up invalidating the entire
large folio, if there is one, covering the region where the direct-io
operation takes place. Large folios may still be useful if only
buffered-io is performed on the block device.

-ritesh