On Thu, Mar 20, 2025 at 03:54:49PM +0100, Christoph Hellwig wrote:
> On Thu, Mar 20, 2025 at 02:47:22PM +0100, Daniel Gomez wrote:
> > On Thu, Mar 20, 2025 at 04:41:11AM +0100, Luis Chamberlain wrote:
> > > We've been constrained to a max single 512 KiB IO for a while now on x86_64.
> > > This is due to the number of DMA segments and the segment size. With LBS the
> > > segments can be much bigger without using huge pages, and so on a 64 KiB
> > > block size filesystem you can now see 2 MiB IOs when using buffered IO.
> >
> > Actually up to 8 MiB I/O with 64k filesystem block size with buffered I/O
> > as we can describe up to 128 segments at 64k size.
>
> Block layer segments are in no way limited to the logical block size.

You are right, but that was not what I meant. I'll use a 16 KiB fs example, since with 64 KiB you hit the current NVMe 8 MiB driver limit (NVME_MAX_KB_SZ): "on a 16 KiB block size filesystem, using buffered I/O will always allow at least 2 MiB I/O, though higher I/O may be possible".

And yes, we can do 8 MiB I/O with direct I/O as well. It's just not reliable unless huge pages are used. The maximum reliably supported I/O size there is 512 KiB. With buffered I/O, a larger fs block size guarantees a specific upper limit, i.e. 2 MiB for 16 KiB, 4 MiB for 32 KiB, and 8 MiB for 64 KiB.
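
The arithmetic behind those guarantees can be sketched as below. This is only an illustration of the numbers quoted in the thread (128 describable segments, one fs block per segment at minimum, and the 8 MiB NVME_MAX_KB_SZ cap), not how the kernel actually computes queue limits:

```python
KIB = 1024
MIB = 1024 * KIB

MAX_SEGMENTS = 128       # segments describable per I/O (figure from the thread)
NVME_MAX_IO = 8 * MIB    # current NVMe driver limit (NVME_MAX_KB_SZ), per the thread

def guaranteed_buffered_io(fs_block_size: int) -> int:
    """I/O size buffered I/O can always achieve for a given filesystem
    block size, assuming each segment covers at least one fs block,
    capped by the NVMe driver limit."""
    return min(MAX_SEGMENTS * fs_block_size, NVME_MAX_IO)

for bs_kib in (16, 32, 64):
    size = guaranteed_buffered_io(bs_kib * KIB)
    print(f"{bs_kib} KiB fs block -> {size // MIB} MiB guaranteed I/O")
```

This reproduces the progression stated above: 2 MiB for 16 KiB, 4 MiB for 32 KiB, and 8 MiB for 64 KiB blocks.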