Keith Busch <kbusch@xxxxxxxxxx> writes:

> On Fri, Mar 21, 2025 at 07:43:09AM +0530, Ritesh Harjani wrote:
>> i.e. w/o large folios in block devices one could do direct-io &
>> buffered-io in parallel even just next to each other (assuming 4k
>> pagesize).
>>
>> |4k-direct-io | 4k-buffered-io |
>>
>> However with large folios now supported in the buffered-io path for
>> block devices, the application cannot submit such a direct-io +
>> buffered-io pattern in parallel, since direct-io can end up
>> invalidating the folio spanning over its 4k range, on which
>> buffered-io is in progress.
>
> Why would buffered io span more than the 4k range here? You're talking
> to the raw block device in both cases, so they have the exact same
> logical block size alignment. Why is buffered io allocating beyond
> the logical size granularity?

This can happen in the following two cases:

1. The system's page size is 64k. Then even though the logical block
   size granularity for buffered-io is set to 4k (blockdev --setbsz 4k
   /dev/sdc), the page cache will still instantiate a 64k page.

2. The recent case where (correct me if I am wrong) we now have large
   folio support for block devices. Here again we can instantiate a
   large folio in the page cache region where buffered-io is in
   progress, correct? (say a previous read causes readahead and
   installs a large folio in that region). Or even iomap_write_iter()
   these days tries to first allocate a chunk of size
   mapping_max_folio_size().

However, with large folio support now in block devices, I am not sure
whether an application can retain much benefit from doing buffered-io,
even if it carefully separates buffered-io and direct-io across a
logical boundary, because the direct-io can end up invalidating the
entire large folio, if there is one, in the region where the direct-io
operation is taking place. That said, large folios may still be useful
if only buffered-io is being performed on the block device.
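To make the pattern concrete, here is a minimal (untested) userspace
sketch of the mix I am describing. /dev/sdc, the offsets, and the 4k
block size are just placeholders, and note it scribbles over the first
two blocks of the device. In the real workload the two writes would run
in parallel from separate threads/processes; back-to-back is enough to
show the cache interaction:

#define _GNU_SOURCE	/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLKSZ 4096UL	/* logical block size set via --setbsz */

int main(void)
{
	/* Placeholder device; both fds point at the same raw block device. */
	int dfd = open("/dev/sdc", O_RDWR | O_DIRECT);
	int bfd = open("/dev/sdc", O_RDWR);
	void *dbuf;
	char bbuf[BLKSZ];

	if (dfd < 0 || bfd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT buffers must be aligned to the logical block size. */
	if (posix_memalign(&dbuf, BLKSZ, BLKSZ))
		return 1;
	memset(dbuf, 0xaa, BLKSZ);
	memset(bbuf, 0xbb, BLKSZ);

	/*
	 * Buffered write to block 1 first, so the page cache holds a
	 * folio covering that range.
	 */
	if (pwrite(bfd, bbuf, BLKSZ, BLKSZ) != (ssize_t)BLKSZ)
		perror("buffered pwrite");

	/*
	 * Direct write to the adjacent block 0. Direct-io invalidates
	 * the page cache over its own byte range; if that range is
	 * backed by a large folio which also spans block 1, the
	 * buffered side's folio is dropped along with it.
	 */
	if (pwrite(dfd, dbuf, BLKSZ, 0) != (ssize_t)BLKSZ)
		perror("direct pwrite");

	free(dbuf);
	close(dfd);
	close(bfd);
	return 0;
}

With 4k folios the two writes never share a page cache folio, so the
direct write leaves the buffered side's cache alone. With a large folio
spanning both blocks, the invalidation from the direct write can take
the whole folio with it, which is the loss of benefit I mean above.

-ritesh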