On 01/03/18 04:20 PM, Jason Gunthorpe wrote:
On Thu, Mar 01, 2018 at 11:00:51PM +0000, Stephen Bates wrote:
No, locality matters. If you have a bunch of NICs and a bunch of drives and the allocator chooses to put all P2P memory on a single drive, your performance will suck horribly even if all the traffic is offloaded. Performance will also suck if you have speed differences between the PCI-E devices. E.g., a bunch of Gen 3 x8 NVMe cards paired with a Gen 4 x16 NIC will not reach full performance unless the p2p buffers are properly balanced between all cards.
This would be solved better by choosing the closest devices in the hierarchy in the p2pmem_find function (i.e., preferring devices involved in the transaction). Also, based on the current code, it's a bit of a moot point, seeing as it requires all devices to be on the same switch. So unless you have a giant switch hidden somewhere, you're not going to get too many NIC/NVMe pairs.
In any case, we are looking at changing both of those things so that closer devices in the topology are preferred and spanning multiple switches is allowed. We're also discussing changing it to "user picks", which simplifies all of this.
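To make the "distance in the topology" idea concrete, here is a rough userspace sketch. It is not the p2pmem code and not what the patch set does today. Everything in it is made up for illustration: struct node is a hypothetical stand-in for a device or switch port with a pointer to its upstream parent, and topo_distance(), pick_provider() and demo_pick() are hypothetical helpers. Distance is assumed to be the hop count through the lowest common upstream node, and the provider with the smallest total distance to all clients wins.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a node in the PCI hierarchy (not struct pci_dev). */
struct node {
    const struct node *parent;  /* upstream node; NULL at the root */
    const char *name;
};

/* Number of upstream hops from n to its root. */
int depth(const struct node *n)
{
    int d = 0;

    while (n->parent) {
        n = n->parent;
        d++;
    }
    return d;
}

/* Hops between a and b via their lowest common ancestor, or -1 if none. */
int topo_distance(const struct node *a, const struct node *b)
{
    int da = depth(a), db = depth(b), dist = 0;

    /* Bring the deeper node up to the same level first. */
    while (da > db) { a = a->parent; da--; dist++; }
    while (db > da) { b = b->parent; db--; dist++; }

    /* Then walk both upstream in lockstep until they meet. */
    while (a != b) {
        if (!a->parent || !b->parent)
            return -1;  /* different roots: no common ancestor */
        a = a->parent;
        b = b->parent;
        dist += 2;
    }
    return dist;
}

/* Index of the provider with the lowest summed distance to all clients. */
int pick_provider(const struct node *const *providers, size_t n_providers,
                  const struct node *const *clients, size_t n_clients)
{
    int best = -1, best_cost = INT_MAX;

    assert(n_clients > 0);
    for (size_t i = 0; i < n_providers; i++) {
        int cost = 0;

        for (size_t j = 0; j < n_clients; j++) {
            int d = topo_distance(providers[i], clients[j]);

            if (d < 0) {        /* unreachable client: skip provider */
                cost = -1;
                break;
            }
            cost += d;
        }
        if (cost >= 0 && cost < best_cost) {
            best_cost = cost;
            best = (int)i;
        }
    }
    return best;
}

/* Made-up topology: nic0 and nvme0 share switch A, nvme1 sits on switch B. */
const struct node root  = { NULL,  "root" };
const struct node sw_a  = { &root, "switch A" };
const struct node sw_b  = { &root, "switch B" };
const struct node nic0  = { &sw_a, "nic0" };
const struct node nvme0 = { &sw_a, "nvme0" };
const struct node nvme1 = { &sw_b, "nvme1" };

int demo_pick(void)
{
    const struct node *const providers[] = { &nvme1, &nvme0 };
    const struct node *const clients[]   = { &nic0, &nvme0 };

    return pick_provider(providers, 2, clients, 2);
}
```

In this made-up topology nvme0 costs 2 hops in total (it is one of the clients and shares switch A with nic0), while nvme1 costs 8 because both paths go through the root, so demo_pick() returns index 1. Note the sketch does not enforce the same-switch restriction mentioned above, and it ignores link speed and width, which matter for the balancing concern raised earlier.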
Logan