On 20.02.24 04:36, Wen Gu wrote:
On 2024/2/16 22:25, Wenjia Zhang wrote:
On 11.01.24 13:00, Wen Gu wrote:
This provides a way to {get|set} whether the loopback-ism device supports merging the sndbuf with the peer DMB to eliminate data copies between them.

echo 0 > /sys/devices/virtual/smc/loopback-ism/dmb_copy # support merging
echo 1 > /sys/devices/virtual/smc/loopback-ism/dmb_copy # do not support merging
Besides the same confusion that Niklas already mentioned, the name of the option is not clear about what it means. What about:

echo 1 > /sys/devices/virtual/smc/loopback-ism/nocopy_support # merge mode
echo 0 > /sys/devices/virtual/smc/loopback-ism/nocopy_support # copy mode
OK, if we decide to keep the knobs, I will improve the name. Thanks!
The settings take effect after re-activating loopback-ism by:
echo 0 > /sys/devices/virtual/smc/loopback-ism/active
echo 1 > /sys/devices/virtual/smc/loopback-ism/active
After this, the link group related to loopback-ism will be flushed, and the sndbufs of subsequent connections will (or will not) be merged with the peer DMB.
The motivation for this control is that bandwidth improves greatly when the sndbuf and DMB are merged. However, when a virtually contiguous DMB is provided and merged with the sndbuf, it is accessed concurrently on Tx and Rx, and lock contention in find_vmap_area becomes a bottleneck when there are many CPUs and CONFIG_HARDENED_USERCOPY is set (see link below). So an option is provided.
Link: https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@xxxxxxxxxxxxxxxxx/
Signed-off-by: Wen Gu <guwen@xxxxxxxxxxxxxxxxx>
---
We tried some simple workloads, and the performance of the no-copy case was remarkable. Thus, we're wondering whether the tunable setting is necessary in this loopback case. Or rather, why do we need the copy option at all? Is it because of the bottleneck caused by combining no-copy with a virtually contiguous DMB? Or, at the least, make no-copy the default.
Yes, it is because of the bottleneck caused by combining no-copy with virtual-DMB. If we have to use virtual-DMB and CONFIG_HARDENED_USERCOPY is set, then we may be forced to use copy mode in environments with many CPUs to get good latency (bandwidth still drops because of copy mode).
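
For context, here is a rough sketch of the path that hits the lock, simplified from mm/usercopy.c (not verbatim mainline code):

/* Simplified from mm/usercopy.c: with CONFIG_HARDENED_USERCOPY=y, every
 * copy_{to,from}_user() against a vmalloc'ed buffer takes this path. */
static void check_heap_object(const void *ptr, unsigned long n, bool to_user)
{
	unsigned long addr = (unsigned long)ptr;

	if (is_vmalloc_addr(ptr)) {
		/* find_vmap_area() walks the global vmap-area tree under a
		 * spinlock; concurrent Tx and Rx on a merged, virtually
		 * contiguous DMB serialize here on many-CPU systems. */
		struct vmap_area *area = find_vmap_area(addr);

		if (!area)
			usercopy_abort("vmalloc", "no area", to_user, 0, n);
		if (n > area->va_end - addr)
			usercopy_abort("vmalloc", NULL, to_user,
				       addr - area->va_start, n);
		return;
	}
	/* ... slab and page checks follow ... */
}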
But if we agree that physical-DMB is acceptable (it costs one physically contiguous buffer per connection side in loopback-ism no-copy mode, the same as what the sndbuf costs when using s390 ISM), then there is no such performance issue and the two knobs can be removed (see also the reply to patch 13/15 [1]).
[1] https://lore.kernel.org/netdev/442061eb-107a-421d-bc2e-13c8defb0f7b@xxxxxxxxxxxxxxxxx/
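
To make the trade-off concrete, here is a minimal sketch of the two backing strategies (the helper name and flag are hypothetical, not taken from the patch set):

/* Hypothetical helper, only to illustrate the two DMB backing choices. */
static void *smcd_lo_alloc_dmb_buf(size_t len, bool phys_contig)
{
	if (phys_contig)
		/* One physically contiguous buffer per connection side,
		 * like an sndbuf with s390 ISM; hardened usercopy never
		 * calls find_vmap_area() for this memory. */
		return kzalloc(len, GFP_KERNEL | __GFP_NOWARN);

	/* Virtually contiguous backing avoids high-order allocations, but
	 * every user copy through it triggers the vmap-area lookup when
	 * CONFIG_HARDENED_USERCOPY is set. */
	return vzalloc(len);
}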
Thanks!
Thank you, Wen, for the elaboration! As I said, though we did see somewhat better performance using the virtually contiguous memory in a simple test, the improvement was not really significant. Additionally, our environment is very different from your 48-CPU qemu environment, and it also depends on the workload. I think I can understand why you see better performance using physically contiguous memory. Anyway, I have no objection to using physical-DMB only. But I still want to see whether there are any other opinions.