On Wed, 2012-02-01 at 08:59 -0700, Jim Schutt wrote: > The Ceph messenger would sometimes queue multiple work items to write > data to a socket when the socket buffer was full. > > Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the > same way that net/core/stream.c:sk_stream_write_space() does, i.e., > clearing it only when sufficient space is available in the socket buffer. > > Signed-off-by: Jim Schutt <jaschut@xxxxxxxxxx> This looks good to me. You are avoiding queueing more messages if the amount of free space for writing is less than the minimum. And if not, you leave the SOCK_NOSPACE flag alone. My only general thought on looking at this is that this is using instantaneous space available in a socket send buffer to make a decision on whether to add something to a workqueue, whose work item *may* put something into the send buffer. By the time the work item actually runs, the state of the socket may be very different. Still, this is an improvement and at this point I am not in a position to suggest anything that would close the gap. Reviewed-by: Alex Elder <elder@xxxxxxxxxxxxx> > --- > net/ceph/messenger.c | 18 ++++++++++++------ > 1 files changed, 12 insertions(+), 6 deletions(-) > > diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c > index 67973f0..a4df90b 100644 > --- a/net/ceph/messenger.c > +++ b/net/ceph/messenger.c > @@ -175,16 +175,22 @@ static void ceph_write_space(struct sock *sk) > struct ceph_connection *con = > (struct ceph_connection *)sk->sk_user_data; > > - /* only queue to workqueue if there is data we want to write. */ > + /* only queue to workqueue if there is data we want to write, > + * and there is sufficient space in the socket buffer to accept > + * more data. clear SOCK_NOSPACE so that ceph_write_space() > + * doesn't get called again until try_write() fills the socket > + * buffer. See net/ipv4/tcp_input.c:tcp_check_space() > + * and net/core/stream.c:sk_stream_write_space(). > + */ > if (test_bit(WRITE_PENDING, &con->state)) { > - dout("ceph_write_space %p queueing write work\n", con); > - queue_con(con); > + if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)) { > + dout("ceph_write_space %p queueing write work\n", con); > + clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags); > + queue_con(con); > + } > } else { > dout("ceph_write_space %p nothing to write\n", con); > } > - > - /* since we have our own write_space, clear the SOCK_NOSPACE flag */ > - clear_bit(SOCK_NOSPACE, &sk->sk_socket->flags); > } > > /* socket's state has changed */ -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html