On 5/1/2018 8:04 PM, Doug Ledford wrote:
On Mon, 2018-04-30 at 13:31 +0300, Max Gurtovoy wrote:
Jason/Doug/Leon,
can you please apply this patch ?
Sergey made the needed changes.
-Max.
On 4/16/2018 6:00 PM, Sergey Gorenko wrote:
Some users complain about RNR errors on the target when heavy
high-priority tasks run on the initiator. After investigation,
we found that the receive WRs were exhausted because the
initiator could not post them in time.
Receive work requests are posted in chunks of min_posted_rx to
reduce the number of hits to the HCA. The WRs are posted in the
receive completion handler when the number of free receive buffers
reaches min_posted_rx. But on a highly loaded host, processing of
receive CQEs can be delayed and all receive WRs will be exhausted.
In this case, the target gets an RNR error.
To avoid this, we post a receive WR as soon as at least one receive
buffer is freed. This increases the number of hits to the HCA, but
test results show that the performance degradation is not significant.
Performance results running fio (8 jobs, 64 iodepth) using ramdisk
(w/w.o patch):
bs      IOPS (randread)    IOPS (randwrite)
------  ----------------   ----------------
512     329.4K / 340.3K    379.1K / 387.7K    (~3% drop)
1K      333.4K / 337.6K    364.2K / 370.0K    (~1.4% drop)
2K      321.1K / 323.5K    334.2K / 337.8K    (~0.75% drop)
I know you said the performance hit was not significant, and by the time
you get to 2k reads/writes, I agree with you, but at the low end, I'm
not sure you can call 3% "not significant".
Is a 3% performance hit better than the transient RNR error? And is
this the only solution available?
The problem here is that iSER uses one QP per session, and that is
why you see the 3% hit. I guess if we ran a test using 64 targets and
64 sessions (64 QPs), the locality of the completions would have
smoothed this hit away. I agree that a 3% hit in this scenario is not
optimal. I also hope that the work IdanB and I did on mlx5 inline
KLM/MTT will balance this one as well (currently it should be pushed
in the Leon/Jason PR; we saw > x3.5 improvement for small IOs in
NVMe-oF) - Sergey, we can check this out in our lab.
In conclusion, IMO we have a few improvements in other patches/drivers
that can balance out this 3% hit for small IOs, and we can also make
further performance optimizations in the future (e.g. adaptive CQ
moderation, likely/unlikely annotations, etc.).
This is exactly how NVMe-oF processes the recv completions, and we
don't complain there :).
Also, we recently fixed the post send in NVMe-oF to signal on each
completion. This also caused a performance hit, but we overcame it
using other improvements (BTW, this signaling fix is on our plate
for iSER as well).
bs      IOPS (randread)    IOPS (randwrite)
------  ----------------   ----------------
4K      300.7K / 302.9K    290.2K / 291.6K
8K      235.9K / 237.1K    228.2K / 228.8K
16K     176.3K / 177.0K    126.3K / 125.9K
32K     115.2K / 115.4K    82.2K / 82.0K
64K     70.7K / 70.7K      47.8K / 47.6K
128K    38.5K / 38.6K      25.8K / 25.7K
256K    20.7K / 20.7K      13.7K / 13.6K
512K    10.0K / 10.0K      7.0K / 7.0K
Signed-off-by: Sergey Gorenko <sergeygo@xxxxxxxxxxxx>
Signed-off-by: Vladimir Neyelov <vladimirn@xxxxxxxxxxxx>
Reviewed-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
---
-Max.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html