On 2/7/22 14:49, Leon Romanovsky wrote:
On Mon, Feb 07, 2022 at 05:59:58PM +0800, Tony Lu wrote:
On Mon, Jan 31, 2022 at 09:20:52AM +0200, Leon Romanovsky wrote:
On Mon, Jan 31, 2022 at 03:03:00AM +0800, Tony Lu wrote:
Currently, pages are allocated in process context, so they land on the
process's NUMA node, which may differ from the ibdev's node. That is not
the best policy for performance.
Applications generally perform best when their processes access memory
on the same NUMA node. When numa_balancing is enabled (as it is on most
OS distributions), it moves tasks closer to the memory of the sndbuf or
rmb and the ibdev; meanwhile, the IRQs of the ibdev are usually bound to
the same node. This reduces the latency of accessing remote memory.
That is very subjective and depends on the specific test. I would expect
the application to control its NUMA memory policy (set_mempolicy(), ...)
itself, without the kernel setting the NUMA node.
The various *_alloc_node() APIs are applicable to in-kernel allocations
where the user can't control the memory policy.
I don't know SMC-R well enough, but judging from your description, this
allocation is controlled by the application.
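For reference, a minimal userspace sketch of an application setting its
own memory policy with set_mempolicy(2). It assumes Linux, a nodemask that
fits in one unsigned long, and calls the raw syscall so that libnuma is
not required. The helper names node_to_mask() and
bind_task_memory_to_node() are hypothetical and appear nowhere in the
patch under discussion.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_BIND
#define MPOL_BIND 2 /* same value as in <linux/mempolicy.h> */
#endif

/* Hypothetical helper: one-word nodemask with only `node` set.
 * Returns 0 when `node` does not fit in a single unsigned long. */
unsigned long node_to_mask(int node)
{
    if (node < 0 || node >= (int)(8 * sizeof(unsigned long)))
        return 0;
    return 1UL << node;
}

/* Hypothetical helper: restrict future page allocations of the calling
 * thread to `node`. Returns 0 on success, -1 with errno set on failure
 * (for example when the node does not exist or the call is filtered). */
int bind_task_memory_to_node(int node)
{
    unsigned long mask = node_to_mask(node);

    if (mask == 0) {
        errno = EINVAL;
        return -1;
    }
    /* +1: the kernel counts maxnode as one past the highest usable bit. */
    return (int)syscall(SYS_set_mempolicy, MPOL_BIND, &mask,
                        8 * sizeof(mask) + 1);
}
```

An application would call bind_task_memory_to_node() early, before it
allocates its buffers. The application has to know which node to pass,
and this sketch does not solve that part.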
The original design of SMC doesn't handle memory allocation across
different NUMA nodes, and the application can't control the NUMA policy
in SMC.
It allocates memory on the NUMA node of the process context, which is
determined by the scheduler. If the application process runs on NUMA
node 0, SMC allocates on node 0, and so on; it all depends on the
scheduler. If the RDMA device is attached to node 1 and the process runs
on node 0, SMC still allocates memory on node 0.
This patch tries to allocate memory on the same NUMA node as the RDMA
device. Applications can't know the current node of the RDMA device. The
scheduler knows the node of the memory and can let applications run on
the same node as the memory and the RDMA device.
I don't know; everything explained above is controlled through memory
policy, where the application needs to run on the same node as the ibdev.
The purpose of SMC-R is to provide a drop-in replacement for existing TCP/IP
applications. The idea is to avoid almost any modification to the application,
just switch the address family. So while what you say makes a lot of sense for
applications that intend to use RDMA, in the case of SMC-R we can safely assume
that most if not all applications running it assume they get connectivity
through a non-RDMA NIC. Hence we cannot expect the applications to think about
aspects such as NUMA, and we should do the right thing within SMC-R.
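
To illustrate how small the application-side change is, here is a minimal
sketch of "just switch the address family". It assumes Linux, where AF_SMC
has the value 43; the fallback #define is only for libc headers that lack
it. The helper names stream_family() and open_stream_socket() are
hypothetical, and the retry with AF_INET when the kernel rejects AF_SMC is
a choice made for this sketch, not something SMC requires.

```c
#include <sys/socket.h>

#ifndef AF_SMC
#define AF_SMC 43 /* Linux value; older libc headers may not define it */
#endif

/* Hypothetical helper: the only thing that differs between a TCP
 * application and its SMC variant is the address family. */
int stream_family(int use_smc)
{
    return use_smc ? AF_SMC : AF_INET;
}

/* Hypothetical helper: open a stream socket. If AF_SMC is requested but
 * not available (e.g. the smc module is not loaded), retry with plain
 * TCP. Returns a file descriptor, or -1 with errno set. */
int open_stream_socket(int use_smc)
{
    int fd = socket(stream_family(use_smc), SOCK_STREAM, 0);

    if (fd < 0 && use_smc)
        fd = socket(AF_INET, SOCK_STREAM, 0);
    return fd;
}
```

Nothing in this code refers to an RDMA device or a NUMA node, so the
application has no natural place to express a placement preference.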
Ciao,
Stefan