On 2024/2/16 22:25, Wenjia Zhang wrote:
On 11.01.24 13:00, Wen Gu wrote:
This provides a way to {get|set} type of DMB offered by loopback-ism,
whether it is physically or virtually contiguous memory.
echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_type # physically
echo 1 > /sys/devices/virtual/smc/loopback-ism/dmb_type # virtually
The settings take effect after re-activating loopback-ism by:
echo 0 > /sys/devices/virtual/smc/loopback-ism/active
echo 1 > /sys/devices/virtual/smc/loopback-ism/active
After this, the link group and DMBs related to loopback-ism will be
flushed and subsequent DMBs created will be of the desired type.
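For scripting the sequence above from a test program, a minimal user-space sketch in C follows. It assumes the dmb_type and active attributes proposed by this patch exist under the given directory. The helper names write_attr() and set_dmb_type() are hypothetical, chosen only for illustration, and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a short string to <dir>/<name>.
 * Returns 0 on success, -1 on any error.
 * (Hypothetical helper, not part of the patch.)
 */
int write_attr(const char *dir, const char *name, const char *value)
{
	char path[512];
	FILE *f;
	int n, ok;

	assert(dir && name && value);

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(value, f) >= 0;
	/* A rejected write is only reported once the buffer is flushed. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Select the DMB type (0 = physically, 1 = virtually contiguous) and
 * re-activate the device so that the setting takes effect.
 * (Hypothetical helper, not part of the patch.)
 */
int set_dmb_type(const char *dir, int virt)
{
	if (write_attr(dir, "dmb_type", virt ? "1" : "0"))
		return -1;
	if (write_attr(dir, "active", "0"))
		return -1;
	return write_attr(dir, "active", "1");
}
```

With the patch applied and sufficient privileges, set_dmb_type("/sys/devices/virtual/smc/loopback-ism", 1) would correspond to the echo sequence above for virtually contiguous DMBs.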
The motivation of this control is that physically contiguous DMB has
best performance but is usually expensive, while the virtually
contiguous DMB is cheap and performs well in most scenarios, but if
sndbuf and DMB are merged, virtual DMB will be accessed concurrently
in Tx and Rx and there will be a bottleneck caused by lock contention
of find_vmap_area when there are many CPUs and CONFIG_HARDENED_USERCOPY
is set (see link below). So an option is provided.
I'm curious why you say that physically contiguous DMB has the best performance, because we saw slightly better
performance with the virtual one than with the physical one.
Hi Wenjia, you can find examples here:
https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@xxxxxxxxxxxxxxxxx/
https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@xxxxxxxxxxxxxxxxx/
Excerpted from above:
"
In 48 CPUs qemu environment, the Requests/s increased by 5 times:
- nginx
- wrk -c 1000 -t 96 -d 30 http://127.0.0.1:80
                 vzalloced shmem    vzalloced shmem (with this patch set)
Requests/sec     113536.56          583729.93
But it also has some overhead, compared to using kzalloced shared memory
or unsetting CONFIG_HARDENED_USERCOPY, which won't involve finding vmap area:
                 kzalloced shmem    vzalloced shmem (unset CONFIG_HARDENED_USERCOPY)
Requests/sec     831950.39          805164.78
"
Without CONFIG_HARDENED_USERCOPY, the performance of physical-DMB and
virtual-DMB is basically the same (physical-DMB is a bit better). With
CONFIG_HARDENED_USERCOPY in an environment with many CPUs, such as the
48 CPUs here, merging sndbuf and DMB leads to heavy find_vmap_area lock
contention, and performance drops noticeably. That is why I said
physical-DMB has the best performance: it guarantees good performance in
the known environments.
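To put the quoted figures side by side, here is a small C sketch that computes the ratios. The Requests/sec values are copied from the excerpt above. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of an improved throughput figure to its baseline. */
double speedup(double baseline, double improved)
{
	assert(baseline > 0.0);
	return improved / baseline;
}

/* How far `value` falls short of `reference`, in percent. */
double shortfall_pct(double reference, double value)
{
	assert(reference > 0.0);
	return (reference - value) / reference * 100.0;
}

/* Print the comparisons, using the Requests/sec quoted in the excerpt. */
void print_comparison(void)
{
	const double vz_before   = 113536.56; /* vzalloced shmem */
	const double vz_patched  = 583729.93; /* vzalloced shmem, with patch set */
	const double kz          = 831950.39; /* kzalloced shmem */
	const double vz_unharden = 805164.78; /* vzalloced, HARDENED_USERCOPY unset */

	printf("patch set speedup on vzalloced shmem: %.2fx\n",
	       speedup(vz_before, vz_patched));
	printf("vzalloced (patched) below kzalloced:  %.1f%%\n",
	       shortfall_pct(kz, vz_patched));
	printf("vzalloced (unhardened) below kzalloced: %.1f%%\n",
	       shortfall_pct(kz, vz_unharden));
}
```

Calling print_comparison() from a main() shows the gap between the two allocation types in each configuration, which is the basis for the statement above.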
By the way, we discussed the memory cost before (see [1]), but I found
that when we use s390 ISM (or do not merge sndbuf and DMB), the sndbuf
also costs physically contiguous memory.
static struct smc_buf_desc *smcd_new_buf_create(struct smc_link_group *lgr,
                                                bool is_dmb, int bufsize)
{
        <...>
        if (is_dmb) {
                <...>
        } else {
                buf_desc->cpu_addr = kzalloc(bufsize, GFP_KERNEL |
                                             __GFP_NOWARN | __GFP_NORETRY |
                                             __GFP_NOMEMALLOC);
                if (!buf_desc->cpu_addr) {
                        kfree(buf_desc);
                        return ERR_PTR(-EAGAIN);
                }
                buf_desc->len = bufsize;
        }
        <...>
}
So I wonder whether it is really necessary to use virtual-DMB in loopback-ism.
Maybe we can always use physical-DMB in loopback-ism; then there is no need
for the dmb_type or dmb_copy knobs.
[1] https://lore.kernel.org/netdev/d6facfd5-e083-ffc7-05e5-2e8f3ef17735@xxxxxxxxxxxxxxxxx/
Thanks!