[LSF/MM BPF TOPIC] NUMA topology metrics for NVMe-oF

Hi all,

having recently played around with CXL I started to wonder what implications
that would have for NVMe-over-Fabrics, and how path selection would play out
on such a system.

The thing is, with heavily NUMA systems we really should have a look at
the inter-node latencies, especially as the hardware latencies are getting
close to the NUMA latencies: on an Intel two-socket node I'm seeing
inter-node latencies of around 200ns, and it's not unheard of to get around
5M IOPS from the device, which results in a latency of 2000ns. And that's
on PCIe 4.0; with PCIe 5.0 or CXL one expects the device latency to
decrease even further.

So I think we need to factor in the NUMA topology for PCIe devices, too.
We do have a 'numa' iopolicy, but that only looks at the latency between
nodes. What we're missing is a NUMA latency for the PCIe devices themselves.
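For illustration, the only topology information readily available today is
the SLIT node distance table plus the NUMA node a controller's PCI device
hangs off. A minimal userspace sketch (assuming PCIe-attached controllers
that populate /sys/class/nvme/<ctrl>/device/numa_node, and the usual
/sys/devices/system/node/node<N>/distance files):

/*
 * Quick userspace sketch: print, for each given nvme controller, the
 * NUMA node it is attached to and the SLIT distance from a given
 * "local" node. This is essentially all the information the current
 * numa iopolicy has to go on; note there is nothing device-specific
 * in here.
 */
#include <stdio.h>
#include <stdlib.h>

static int read_int(const char *path)
{
	FILE *f = fopen(path, "r");
	int val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/* node_distance(a, b) == column b of node<a>/distance */
static int node_distance(int a, int b)
{
	char path[128];
	FILE *f;
	int i, d = -1;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/distance", a);
	f = fopen(path, "r");
	if (!f)
		return -1;
	for (i = 0; i <= b; i++) {
		if (fscanf(f, "%d", &d) != 1) {
			d = -1;
			break;
		}
	}
	fclose(f);
	return d;
}

int main(int argc, char **argv)
{
	char path[128];
	int i, ctrl_node;
	int local_node = argc > 1 ? atoi(argv[1]) : 0;

	/* usage: ./nvme_numa_dist <local node> nvme0 [nvme1 ...] */
	for (i = 2; i < argc; i++) {
		snprintf(path, sizeof(path),
			 "/sys/class/nvme/%s/device/numa_node", argv[i]);
		ctrl_node = read_int(path);
		printf("%s: node %d, distance from node %d: %d\n",
		       argv[i], ctrl_node, local_node,
		       ctrl_node >= 0 ?
				node_distance(local_node, ctrl_node) : -1);
	}
	return 0;
}

All this tells you is which socket the device hangs off; it says nothing
about the latency of the device itself.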

So this discussion would be about how we could model (or even measure)
the PCIe latency, and how we could modify the NVMe-oF iopolicies to take
these NUMA latencies into account when selecting the 'best' path.
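As a strawman: if we had a per-path device latency estimate (however
obtained), path selection could combine it with the node distance. A purely
hypothetical sketch; struct path_info, dev_latency_ns, the weighting, and
select_path() are all invented for illustration and have no counterpart in
the kernel today:

#include <limits.h>
#include <stdio.h>

struct path_info {
	int		ctrl_node;	/* NUMA node of the controller */
	unsigned long	dev_latency_ns;	/* measured or modelled device latency */
};

/* Weight of the device latency term relative to the SLIT distance. */
#define DEV_LATENCY_SCALE_NS	100

static unsigned long path_cost(int local_node, const struct path_info *p,
			       int (*node_distance)(int, int))
{
	return node_distance(local_node, p->ctrl_node) +
	       p->dev_latency_ns / DEV_LATENCY_SCALE_NS;
}

/* Return the index of the cheapest path, or -1 if there is none. */
static int select_path(int local_node, const struct path_info *paths, int nr,
		       int (*node_distance)(int, int))
{
	unsigned long best = ULONG_MAX, cost;
	int i, best_idx = -1;

	for (i = 0; i < nr; i++) {
		cost = path_cost(local_node, &paths[i], node_distance);
		if (cost < best) {
			best = cost;
			best_idx = i;
		}
	}
	return best_idx;
}

/* Toy demo: two paths, where the remote but faster device wins. */
static int toy_distance(int a, int b)
{
	return a == b ? 10 : 21;
}

int main(void)
{
	struct path_info paths[] = {
		{ .ctrl_node = 0, .dev_latency_ns = 2000 },
		{ .ctrl_node = 1, .dev_latency_ns =  800 },
	};

	printf("best path from node 0: %d\n",
	       select_path(0, paths, 2, toy_distance));
	return 0;
}

Where dev_latency_ns would come from, whether modelled from the PCIe/CXL
link or measured from completions, is exactly the part I'd like to discuss.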

Cheers,

Hannes
--
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@xxxxxxx                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich



