On 2016-03-31 09:51, Wengang Wang wrote:
On 2016-03-31 01:16, santosh shilimkar wrote:
Hi Wengang,
On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
A problem was found where some among many parallel RDS communications
hang. In my test, ten or so out of 33 communications hang. The send
requests got an -ENOBUF error, meaning the peer socket (port) is
congested; but meanwhile, the peer socket (port) is not actually
congested.
The congestion map update can happen in two paths: one is the
rds_recvmsg path, and the other is when packets are received from the
hardware. There is no synchronization when updating the congestion map,
so a bit operation (clearing) in the rds_recvmsg path can be skipped by
another bit operation (setting) in the hardware packet receiving path.
To be more detailed: the two paths here (the user calling recvmsg and
the hardware receiving data) run for different rds_socks, thus the
rds_sock->rs_recv_lock is not helpful for synchronizing updates to the
congestion map.
thanks,
wengang
The fix is to add a spin lock per congestion map to synchronize updates
to it. No performance drop was found during testing of the fix.
I assume that this change fixed your issue; however, it looks suspicious
that the performance didn't change.
First of all, thanks for finding the issue and posting a patch
for it. I do agree with Leon's performance comment.
We shouldn't need locks for map updates.
Here is the performance data I collected yesterday.
Settings:
net.core.rmem_default = 4194304
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 2097152
test case: rds-stress -s 192.168.111.16 -q 1m -d 10 -T 300 -t 10
With 1M-size sends, the 10 pending send requests are enough to trigger
congestion on the receiver side. The test lasts 5 minutes.
The result is like this:

without patch:
sender
10 2231 2355 4697759.63 0.00 0.00 473.38 19123.89 -1.00 (average)
receiver
10 2356 2231 4698350.06 0.00 0.00 486.28 18537.23 -1.00 (average)

with patch applied:
sender
10 2230 2396 47x.53 0.00 0.00 475.87 31954.35 -1.00 (average)
receiver
10 2396 2230 4738051.76 0.00 0.00 480.85 18408.13 -1.00 (average)
So I don't see a performance drop. In a previous test the result was
reversed, that is, it was faster when the patch was not applied, but the
numbers were still 47xxxxx vs 46xxxxx. So I don't have a very stable
test result, but on average there is no obvious performance drop.
Let me try to explain from theory:
First, no matter whether it is the rds_recvmsg path or the hardware
data-receiving path, we take rds_sock->rs_recv_lock (which is not
enough to fix our issue here, since there can be many different
rds_socks) very shortly before we lock the congestion map. So the
performance cost from CPU cache refilling is small.
Second, though the problem exists, the malformed map may not occur
that frequently, especially for this test case with 10 parallel
communications.
Moreover, the parallel receive path this patch is based on
doesn't exist in the upstream code. I have kept
it out so far because of issues similar to the one you
encountered.
But I don't see how the rds_recvmsg path is different from the UEK
kernels. Can you explain more, here or offline?
Anyway, let's discuss the fix offline, even for the
downstream kernel. I suspect we can address it without locks.
If in normal use we have no performance issue (and we haven't found an
important use case that would be hit), I think locking is fine.
Well, what ideas do you have for avoiding locks? After all, we are
updating an 8KB bitmap, not a single uint64 or a shorter variable.
Whether we use a lock or not, we need to make sure the bits being
updated aren't stale in another CPU's cache.
thanks,
wengang
Regards,
Santosh
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html