Re: [LSF/MM/BPF TOPIC] breaking the 512 KiB IO boundary on x86_64

Keith Busch <kbusch@xxxxxxxxxx> · Fri, 21 Mar 2025 12:55:50 -0600

On Fri, Mar 21, 2025 at 10:51:42PM +0530, Ritesh Harjani wrote:
> Keith Busch <kbusch@xxxxxxxxxx> writes:
> 
> > On Fri, Mar 21, 2025 at 07:43:09AM +0530, Ritesh Harjani wrote:
> >> i.e. w/o large folios in block devices one could do direct-io &
> >> buffered-io in parallel even just next to each other (assuming 4k pagesize). 
> >> 
> >>            |4k-direct-io | 4k-buffered-io | 
> >> 
> >> 
> >> However with large folios now supported in buffered-io path for block
> >> devices, the application cannot submit such direct-io + buffered-io
> >> pattern in parallel. Since direct-io can end up invalidating the folio
> >> spanning over it's 4k range, on which buffered-io is in progress.
> >
> > Why would buffered io span more than the 4k range here? You're talking
> > to the raw block device in both cases, so they have the exact same
> > logical block size alignment. Why is buffered io allocating beyond
> > the logical size granularity?
> 
> This can happen in following 2 cases - 
> 1. System's page size is 64k. Then even though the logical block size
> granularity for buffered-io is set to 4k (blockdev --setbsz 4k
> /dev/sdc), it still will instantiate a 64k page in the page cache.

But that already happens without large folio support, so I wasn't
considering that here.

> 2. Second is the recent case where (correct me if I am wrong) we now
> have large folio support for block devices. So here again we can
> instantiate a large folio in the page cache where buffered-io is in
> progress correct? (say a previous read causes a readahead and installs a
> large folio in that region). Or even iomap_write_iter() these days tries
> to first allocate a chunk of size mapping_max_folio_size().

Okay, I am also not sure on what happens for this part on speculative
allocations.