On 3/20/2025 9:28 PM, Keith Busch wrote:
> On Thu, Mar 20, 2025 at 08:37:05AM -0700, Bart Van Assche wrote:
>> On 3/20/25 7:18 AM, Christoph Hellwig wrote:
>>> On Thu, Mar 20, 2025 at 04:41:11AM -0700, Luis Chamberlain wrote:
>>>> We've been constrained to a max single 512 KiB IO for a while now on x86_64.
>>>
>>> No, we absolutely haven't. I'm regularly seeing multi-MB I/O on both
>>> SCSI and NVMe setups.
>>
>> Is NVME_MAX_KB_SZ the current maximum I/O size for PCIe NVMe
>> controllers? From drivers/nvme/host/pci.c:
>
> Yes, this is the driver's limit. The device's limit may be lower or
> higher.
>
> I allocate out of hugetlbfs to reliably send direct IO at this size
> because the nvme driver's segment count is limited to 128. The driver
> doesn't impose a segment size limit, though. If each segment is only 4k
> (a common occurrence), I guess that's where Luis is getting the 512K
> limit?

Even if we hit that segment count limit (128), the I/O can still complete, because the block layer will split it internally while the application still sees it as a single I/O. But if we want to avoid this internal split (e.g. for LBS), or if we are using the passthrough path, we will see a failure.
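
To make the arithmetic behind that 512K figure explicit, here is a tiny userspace sketch (illustrative only, not driver code; the 128-segment cap is the driver limit Keith mentions, and the 4 KiB segment size is just the common case when the buffer is built from non-contiguous pages, not a hard rule):

#include <stdio.h>

/* Illustrative sketch only: derive the effective max I/O size when
 * every segment maps a single 4 KiB page.
 */
#define MAX_SEGS        128             /* nvme-pci segment cap mentioned above */
#define SEG_SIZE        (4UL * 1024)    /* one 4 KiB page per segment (common case) */

int main(void)
{
        unsigned long max_io = MAX_SEGS * SEG_SIZE;

        /* 128 * 4 KiB = 512 KiB, the limit Luis is referring to.
         * With hugetlbfs-backed buffers each segment is much larger,
         * so the same 128-segment cap allows multi-MiB I/O.
         */
        printf("max I/O with 4 KiB segments: %lu KiB\n", max_io / 1024);
        return 0;
}

The split-vs-passthrough distinction above follows from this: the block layer can transparently carve a larger request into pieces that fit the segment limit, but a passthrough command is handed to the driver as-is, so it simply fails the limit check.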