Re: [PATCH] RDS: sync congestion map updating

On 4/1/16 6:14 PM, Leon Romanovsky wrote:
On Fri, Apr 01, 2016 at 12:47:24PM -0700, santosh shilimkar wrote:
(cc-ing netdev)
On 3/30/2016 7:59 PM, Wengang Wang wrote:


On 2016-03-31 09:51, Wengang Wang wrote:

On 2016-03-31 01:16, santosh shilimkar wrote:
Hi Wengang,

On 3/30/2016 9:19 AM, Leon Romanovsky wrote:
On Wed, Mar 30, 2016 at 05:08:22PM +0800, Wengang Wang wrote:
The problem is that some of many parallel RDS communications hang.
In my test, about ten of 33 communications hang. The send requests
fail with -ENOBUFS, meaning the peer socket (port) is congested, even
though the peer socket (port) is in fact not congested.

The congestion map can be updated in two paths: one is the
rds_recvmsg path, and the other is the path that receives packets
from the hardware. The updates are not synchronized, so a bit
operation (clearing) in the rds_recvmsg path can be overwritten by
another bit operation (setting) in the hardware packet receiving path.


To be more detailed: the two paths (user calls recvmsg, and hardware
receives data) operate on different rds socks, so
rds_sock->rs_recv_lock does not help synchronize the updates to the
congestion map.
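
The interleaving described above can be replayed deterministically in
userspace. This is only a sketch under stated assumptions: cong_word_t,
replay_lost_update() and bit_is_set() are hypothetical names, and a single
64-bit word stands in for one word of the real congestion map. It shows how a
non-atomic read-modify-write in one path can undo a clear done by the other.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one word of the congestion map. */
typedef uint64_t cong_word_t;

/*
 * Replay the problematic ordering by hand. Both paths first load the
 * same old value, then each stores a result computed from its own copy.
 * Returns the final value of the shared word.
 */
static cong_word_t replay_lost_update(cong_word_t word,
                                      unsigned int clear_bit,
                                      unsigned int set_bit)
{
    /* Step 1: both paths read the word before either one writes. */
    cong_word_t seen_by_recvmsg = word;
    cong_word_t seen_by_rx = word;

    /* Step 2: the recvmsg path clears its bit and stores the result. */
    word = seen_by_recvmsg & ~((cong_word_t)1 << clear_bit);

    /*
     * Step 3: the receive path sets its bit in the stale copy and
     * stores it, overwriting the clear from step 2.
     */
    word = seen_by_rx | ((cong_word_t)1 << set_bit);

    return word;
}

/* Report whether a given bit is set in the word. */
static bool bit_is_set(cong_word_t word, unsigned int bit)
{
    return ((word >> bit) & 1) != 0;
}
```

Starting from a word with only bit 3 set, replaying "clear bit 3" against
"set bit 7" leaves bit 3 still set, so the port keeps looking congested even
though its clear was issued.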

For archive purposes, let me try to conclude the thread. I synced
with Wengang off-list and came up with the fix below. I was under the
impression that __set_bit_le() was the atomic version. After fixing
it as in the patch (end of the email), the bug is addressed.
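
In the kernel, the double-underscore bit helpers such as __set_bit_le() are
the non-atomic variants, while set_bit_le() and clear_bit_le() are atomic. As
a rough userspace analogue (not the kernel code), the same idea can be
sketched with C11 atomics. The names struct cong_map, cong_set_bit(),
cong_clear_bit() and cong_test_bit() are hypothetical, and the flat word array
is an assumption that does not reflect the real map layout.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one congestion bit per 16-bit port number. */
#define CONG_PORTS    65536u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define CONG_WORDS    (CONG_PORTS / BITS_PER_WORD)

struct cong_map {
    _Atomic unsigned long words[CONG_WORDS];
};

/* Atomically mark a port as congested. */
static inline void cong_set_bit(struct cong_map *map, unsigned int port)
{
    unsigned long mask = 1UL << (port % BITS_PER_WORD);

    /* A single atomic read-modify-write; concurrent updates to other
     * bits in the same word cannot be lost. */
    atomic_fetch_or(&map->words[port / BITS_PER_WORD], mask);
}

/* Atomically mark a port as no longer congested. */
static inline void cong_clear_bit(struct cong_map *map, unsigned int port)
{
    unsigned long mask = 1UL << (port % BITS_PER_WORD);

    atomic_fetch_and(&map->words[port / BITS_PER_WORD], ~mask);
}

/* Read the current congestion state of a port. */
static inline bool cong_test_bit(struct cong_map *map, unsigned int port)
{
    unsigned long mask = 1UL << (port % BITS_PER_WORD);

    return (atomic_load(&map->words[port / BITS_PER_WORD]) & mask) != 0;
}
```

Because each update is one atomic operation on the word, two threads touching
different ports that share a word no longer need a common lock to keep the
map consistent.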

I will probably send this as a fix for stable as well.


From 5614b61f6fdcd6ae0c04e50b97efd13201762294 Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>
Date: Wed, 30 Mar 2016 23:26:47 -0700
Subject: [PATCH] RDS: Fix the atomicity for congestion map update

Two different threads with different rds sockets may be in
rds_recv_rcvbuf_delta() via receive path. If their ports
both map to the same word in the congestion map, then
using non-atomic ops to update it could cause the map to
be incorrect. Let's use atomics to avoid such an issue.

Full credit to Wengang <wen.gang.wang@xxxxxxxxxx> for
finding the issue, analysing it, and also pointing out
the offending code with a spinlock-based fix.

I'm glad that you solved the issue without spinlocks.
Out of curiosity, I see that this patch needs to be sent
to Dave and applied by him. Is that right?

Right. I was planning to send this one together with one more fix
on netdev for Dave to pick up.

➜  linus-tree git:(master) ./scripts/get_maintainer.pl -f net/rds/cong.c
Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx> (supporter:RDS - RELIABLE DATAGRAM SOCKETS)
"David S. Miller" <davem@xxxxxxxxxxxxx> (maintainer:NETWORKING [GENERAL])
netdev@xxxxxxxxxxxxxxx (open list:RDS - RELIABLE DATAGRAM SOCKETS)
linux-rdma@xxxxxxxxxxxxxxx (open list:RDS - RELIABLE DATAGRAM SOCKETS)
rds-devel@xxxxxxxxxxxxxx (moderated list:RDS - RELIABLE DATAGRAM SOCKETS)
linux-kernel@xxxxxxxxxxxxxxx (open list)


Signed-off-by: Wengang Wang <wen.gang.wang@xxxxxxxxxx>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@xxxxxxxxxx>

Reviewed-by: Leon Romanovsky <leon@xxxxxxx>

Thanks for the review.