On 5/1/2018 8:04 PM, Doug Ledford wrote:
On Mon, 2018-04-30 at 13:31 +0300, Max Gurtovoy wrote:
Jason/Doug/Leon,
can you please apply this patch ?
Sergey made the needed changes.
-Max.
On 4/16/2018 6:00 PM, Sergey Gorenko wrote:
Some users complain about RNR errors on the target when heavy
high-priority tasks run on the initiator. After investigation,
we found that the receive WRs were exhausted because the
initiator could not post them in time.
Receive work requests are posted in chunks of min_posted_rx to
reduce the number of hits to the HCA. The WRs are posted in the
receive completion handler when the number of free receive buffers
reaches min_posted_rx. But on a highly loaded host, processing of
receive CQEs can be delayed and all receive WRs will be exhausted.
In this case, the target gets an RNR error.
To avoid this, we post a receive WR as soon as at least one receive
buffer is freed. This increases the number of hits to the HCA, but
test results show that the performance degradation is not significant.
Performance results running fio (8 jobs, 64 iodepth) using ramdisk
(w/w.o patch):
bs      IOPS (randread)    IOPS (randwrite)
------  ----------------   ----------------
512     329.4K / 340.3K    379.1K / 387.7K    (~3% drop)
1K      333.4K / 337.6K    364.2K / 370.0K    (~1.4% drop)
2K      321.1K / 323.5K    334.2K / 337.8K    (~0.75% drop)
I know you said the performance hit was not significant, and by the time
you get to 2k reads/writes, I agree with you, but at the low end, I'm
not sure you can call 3% "not significant".
Is a 3% performance hit better than the transient RNR error? And is
this the only solution available?
The problem here is that iSER uses one QP per session, and that is
why you see the 3% hit. I guess if we ran a test using 64 targets and
64 sessions (64 QPs), the locality of the completions would have
smoothed this hit away. I agree that a 3% hit in this scenario is not
optimal. I also hope that the work IdanB and I did on mlx5 inline
KLM/MTT will balance this one as well (currently it should be pushed
in the Leon/Jason PR; we saw > x3.5 improvement for small IOs in
NVMe-oF) - Sergey, we can check this out in our lab.
In conclusion, IMO we have a few improvements in other patches/drivers
that can balance out this 3% hit for small IOs, and we can also make
further performance optimizations in the future (e.g. adaptive CQ
moderation, likely/unlikely annotations, etc.).
This is exactly how NVMe-oF processes the recv completions, and we
don't complain there :).
Also, we recently fixed the post send in NVMe-oF to signal on each
completion. This also caused a performance hit, but we overcame it
using other improvements (BTW, this signaling fix is on our plate
for iSER as well).
bs      IOPS (randread)    IOPS (randwrite)
------  ----------------   ----------------
4K      300.7K / 302.9K    290.2K / 291.6K
8K      235.9K / 237.1K    228.2K / 228.8K
16K     176.3K / 177.0K    126.3K / 125.9K
32K     115.2K / 115.4K    82.2K / 82.0K
64K     70.7K / 70.7K      47.8K / 47.6K
128K    38.5K / 38.6K      25.8K / 25.7K
256K    20.7K / 20.7K      13.7K / 13.6K
512K    10.0K / 10.0K      7.0K / 7.0K
Signed-off-by: Sergey Gorenko <sergeygo@xxxxxxxxxxxx>
Signed-off-by: Vladimir Neyelov <vladimirn@xxxxxxxxxxxx>
Reviewed-by: Max Gurtovoy <maxg@xxxxxxxxxxxx>
---
-Max.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html