Now that we have bs > ps for block device sector sizes on linux-next,
the next eyesore is why our max sector size is stuck at 64k when in
theory we should be able to go up to the max supported by the page
cache. On x86_64 that's 2 MiB. The reason we didn't jump to 2 MiB is
that testing with a limit higher than 64k proved to have issues. While
we've looked into them, one glaring issue was the scatter list
limitation in the NVMe PCI driver. We could adopt scatter list
chaining, but the two step DMA API that Christoph and Leon have been
working on seems to be the way to go, since scatter lists are tied to
PAGE_SIZE restrictions and scatter list chaining is just a mess.

So it raises the question: with the new two step DMA API, does the
problem get easier? The answer is yes, and for those who want to
experiment, this series lets you do just that. With it we can enable a
2 MiB LBA format on NVMe and issue single IOs of up to 8 MiB for both
buffered IO and direct IO.

The last two patches are not really intended for upstream; they are
experimental code to let folks muck around with large sector sizes.

Daniel Gomez has taken Leon Romanovsky's new two step DMA API [0] and
Christoph Hellwig's "Block and NVMe PCI use of new DMA mapping API" [1].
We then applied the 64k sector size patches now merged on linux-next on
top of this and backported them to v6.14-rc5. The patches in this RFC
sit on top of all that, to demonstrate the minimal changes needed to
enable up to 8 MiB IOs on NVMe, leveraging a 2 MiB max block sector
size on x86_64, after the two step DMA API and the NVMe cleanup.

If you want a git tree to play with you can use our
large-block-buffer-heads-2m linux branch from kdevops [2].
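For anyone wanting to sanity check the sizes quoted above, here is a
minimal userspace sketch of the arithmetic. It assumes x86_64 with
4 KiB base pages and that the page cache limit corresponds to a
PMD-sized folio (order 9). The macro and helper names are hypothetical,
made up for illustration; they are not part of this series or of the
kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative values only, assumed for x86_64 with 4 KiB base pages.
 * These names are hypothetical and are not the kernel's definitions.
 */
#define SKETCH_PAGE_SHIFT	12	/* 4 KiB base page */
#define SKETCH_PMD_ORDER	9	/* 512 base pages per PMD-sized folio */

/* Size in bytes of a folio of the given order. */
static size_t sketch_folio_size(unsigned int page_shift, unsigned int order)
{
	assert(page_shift + order < sizeof(size_t) * 8);
	return (size_t)1 << (page_shift + order);
}

/* Number of whole blocks of block_size bytes in an IO of io_size bytes. */
static size_t sketch_blocks_per_io(size_t io_size, size_t block_size)
{
	assert(block_size != 0 && io_size % block_size == 0);
	return io_size / block_size;
}
```

Under those assumptions,
sketch_folio_size(SKETCH_PAGE_SHIFT, SKETCH_PMD_ORDER) is 2097152 bytes
(2 MiB), and a single 8 MiB IO covers 4 such blocks, or 2048 base
pages.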

[0] https://lore.kernel.org/all/20250302085717.GO53094@unreal/
[1] https://lore.kernel.org/all/cover.1730037261.git.leon@xxxxxxxxxx/
[2] https://github.com/linux-kdevops/linux/tree/large-block-buffer-heads-2m

Luis Chamberlain (4):
  iomap: use BLK_MAX_BLOCK_SIZE for the iomap zero page
  blkdev: lift BLK_MAX_BLOCK_SIZE to page cache limit
  nvme-pci: bump segments to what the device can use
  nvme-pci: add quirk for qemu with bogus NOWS

 drivers/nvme/host/core.c |   2 +
 drivers/nvme/host/nvme.h |   5 ++
 drivers/nvme/host/pci.c  | 167 ++-------------------------------------
 fs/iomap/direct-io.c     |   2 +-
 include/linux/blkdev.h   |   7 +-
 5 files changed, 15 insertions(+), 168 deletions(-)

-- 
2.47.2