[RFC 0/4] nvme-pci: breaking the 512 KiB max IO boundary

Now that we have bs > ps for block device sector sizes on linux-next, the next
eyesore is that our max sector size is stuck at 64k when, in theory, we should
be able to go up to the max supported by the page cache. On x86_64 that's 2
MiB.
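
As a rough illustration of where the 2 MiB figure comes from, here is a
minimal sketch. It assumes 4 KiB base pages and a maximum page cache folio
order of 9 (the PMD order on x86_64); both values are assumptions for the
example and are not taken from the patches. The EX_ macros and
ex_folio_bytes() are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86_64; not taken from the patches in this series. */
#define EX_PAGE_SHIFT          12u /* 4 KiB base pages */
#define EX_MAX_PAGECACHE_ORDER 9u  /* assumed PMD order */

/* Hypothetical helper: size in bytes of a folio of the given order. */
static inline uint64_t ex_folio_bytes(unsigned int page_shift,
				      unsigned int order)
{
	assert(page_shift + order < 64);
	return UINT64_C(1) << (page_shift + order);
}
```

Under these assumptions, ex_folio_bytes(EX_PAGE_SHIFT, EX_MAX_PAGECACHE_ORDER)
evaluates to 2 MiB, while order 4 gives the current 64k limit.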

The reason we didn't jump to 2 MiB is that testing with a limit higher than
64k ran into issues. While looking into them, one glaring issue was the
scatter list limitation in the NVMe PCI driver. We could adopt scatter list
chaining, but the two step DMA API work by Christoph and Leon seems to be the
way to go, since scatter lists are tied to PAGE_SIZE restrictions and scatter
list chaining is just a mess.
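
To make the PAGE_SIZE restriction concrete, the sketch below counts the
worst-case number of page-granular entries a single IO needs when no two
pages are physically contiguous. It assumes 4 KiB pages and a page-aligned
buffer. The helper names are hypothetical, and the 128 entry budget used in
the note after the code is picked only because 128 * 4 KiB matches the 512 KiB
in the subject line; the actual driver limit is not stated in this letter.

```c
#include <assert.h>
#include <stddef.h>

#define EX_PAGE_SIZE 4096u /* assumed 4 KiB pages */

/*
 * Hypothetical helper: worst-case number of page_size-granular entries
 * needed to describe a page-aligned IO of io_bytes.
 */
static inline size_t ex_worst_case_segments(size_t io_bytes, size_t page_size)
{
	assert(page_size != 0);
	return (io_bytes + page_size - 1) / page_size;
}

/*
 * Hypothetical helper: largest IO that a fixed budget of page_size-granular
 * entries is always able to describe.
 */
static inline size_t ex_max_io_bytes(size_t nr_entries, size_t page_size)
{
	return nr_entries * page_size;
}
```

With these assumptions a 512 KiB IO needs 128 entries, a single 2 MiB block
needs 512, and an 8 MiB IO needs 2048, so a 128 entry budget cannot cover
even one 2 MiB block in the worst case.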

So this raises the question: with the new two step DMA API, does the problem
get easier? The answer is yes, and for those who want to experiment, this
series will let you do just that.

With this we can enable a 2 MiB LBA format on NVMe and we can issue single IOs
of up to 8 MiB for both buffered IO and direct IO. The last two patches are not
really intended for upstream; they are experimental code to let folks muck
around with large sector sizes.

Daniel Gomez has taken Leon Romanovsky's new two step DMA API [0] and
Christoph Hellwig's "Block and NVMe PCI use of new DMA mapping API" [1].
We then applied the 64k sector size patches now merged on linux-next on top
of this, backporting them to v6.14-rc5. The patches in this RFC go on top of
all that, to demonstrate the minimal changes needed to enable up to 8 MiB IOs
on NVMe, leveraging a 2 MiB max block sector size on x86_64, after the
two step DMA API and the NVMe cleanup.

If you want a git tree to play with, you can use our
large-block-buffer-heads-2m linux branch from kdevops [2].

[0] https://lore.kernel.org/all/20250302085717.GO53094@unreal/ 
[1] https://lore.kernel.org/all/cover.1730037261.git.leon@xxxxxxxxxx/
[2] https://github.com/linux-kdevops/linux/tree/large-block-buffer-heads-2m

Luis Chamberlain (4):
  iomap: use BLK_MAX_BLOCK_SIZE for the iomap zero page
  blkdev: lift BLK_MAX_BLOCK_SIZE to page cache limit
  nvme-pci: bump segments to what the device can use
  nvme-pci: add quirk for qemu with bogus NOWS

 drivers/nvme/host/core.c |   2 +
 drivers/nvme/host/nvme.h |   5 ++
 drivers/nvme/host/pci.c  | 167 ++-------------------------------------
 fs/iomap/direct-io.c     |   2 +-
 include/linux/blkdev.h   |   7 +-
 5 files changed, 15 insertions(+), 168 deletions(-)

-- 
2.47.2




