On Mon, Jan 31, 2022 at 09:20:52AM +0200, Leon Romanovsky wrote:
> On Mon, Jan 31, 2022 at 03:03:00AM +0800, Tony Lu wrote:
> > Currently, pages are allocated in the process context, for its NUMA node
> > isn't equal to ibdev's, which is not the best policy for performance.
> >
> > Applications will generally perform best when the processes are
> > accessing memory on the same NUMA node. When numa_balancing enabled
> > (which is enabled by most of OS distributions), it moves tasks closer to
> > the memory of sndbuf or rmb and ibdev, meanwhile, the IRQs of ibdev bind
> > to the same node usually. This reduces the latency when accessing remote
> > memory.
>
> It is very subjective per-specific test. I would expect that
> application will control NUMA memory policies (set_mempolicy(), ...)
> by itself without kernel setting NUMA node.
>
> Various *_alloc_node() APIs are applicable for in-kernel allocations
> where user can't control memory policy.
>
> I don't know SMC-R enough, but if I judge from your description, this
> allocation is controlled by the application.

The original design of SMC doesn't handle memory allocation across
different NUMA nodes, and the application can't control the NUMA policy
in SMC. SMC allocates memory on the NUMA node of the process context,
which is determined by the scheduler. If the application process runs on
NUMA node 0, SMC allocates on node 0, and so on; it all depends on the
scheduler. For example, if the RDMA device is attached to node 1 and the
process runs on node 0, SMC allocates memory on node 0.

This patch tries to allocate memory on the same NUMA node as the RDMA
device. Applications can't know the current node of the RDMA device. The
scheduler knows the node of the memory, and can let applications run on
the same node as the memory and the RDMA device.

Thanks,
Tony Lu