Maybe I'm missing something, but how do you make sure that all your buffer allocations for DMA (of the NVMe + HCA) are done on the same socket? From the code I understood that you make sure the cq is assigned to the appropriate completion vector according to the port CPUs (given by the user), so all the interrupts will be routed to the relevant socket (no QPI crossing here, since the MSI MMIO address is mapped to the "local" node). But IMO more work is needed to make sure that _all_ the allocated buffers/pages come from the memory assigned to that CPU node (or is that something that is done already?)
The allocator takes care of that for us (if it can...). By assigning the completion vector we run the IO thread on the corresponding CPU; in turn, page allocation will attempt to grab a page that is local to the running NUMA node, and only if none is available will it fall back to the far NUMA node... However, in-capsule page allocation is not NUMA aware. We could also switch to alloc_pages_node in case we have a clear NUMA node indication, but we can defer that to a later stage...
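
To make the local-first behaviour described above concrete, here is a small userspace toy model in C. It is only a sketch of the policy (try the preferred node, fall back to another node when it is empty), not kernel code: struct toy_numa, toy_alloc_page, NUM_NODES and NO_NODE are all hypothetical names made up for illustration. In the kernel, the explicit-node variant mentioned above would go through alloc_pages_node(nid, gfp_mask, order) instead.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_NODES 2      /* assumption: a two-socket machine */
#define NO_NODE   (-1)   /* returned when every node is exhausted */

/* Toy model only: free page count per NUMA node (not a kernel structure). */
struct toy_numa {
    int free_pages[NUM_NODES];
};

/*
 * Local-first allocation model: take a page from the preferred node if it
 * has one, otherwise fall back to any other node that still has free pages.
 * Returns the node the page came from, or NO_NODE if nothing is left.
 */
int toy_alloc_page(struct toy_numa *m, int preferred)
{
    assert(m != NULL);
    assert(preferred >= 0 && preferred < NUM_NODES);

    /* Fast path: the node local to the running CPU. */
    if (m->free_pages[preferred] > 0) {
        m->free_pages[preferred]--;
        return preferred;
    }

    /* Fallback: a far node, which costs a cross-socket access. */
    for (int n = 0; n < NUM_NODES; n++) {
        if (n != preferred && m->free_pages[n] > 0) {
            m->free_pages[n]--;
            return n;
        }
    }
    return NO_NODE;
}
```

In this model, pinning the IO thread to a CPU on the port's socket corresponds to always passing that socket's node as preferred. The buffers stay local as long as that node has free pages, which is the "if it can" caveat above.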