This disables softirqs around the CCID-specific send operation in
dccp_write_xmit, so that the actual transmission and the call to the
CCID post-send routine form a unit that softirqs cannot interrupt.

Why this needs to be done:

The function dccp_write_xmit can be called both in user context (via
dccp_sendmsg) and from the timer softirq (dccp_write_xmit_timer). It
 1. calls the CCID-specific `pre-send' routine ccid_hc_tx_send_packet(),
 2. ships the skb via dccp_transmit_skb(),
 3. calls the CCID-specific `post-send' routine ccid_hc_tx_packet_sent().

The last one does e.g. accounting by updating data records (as in CCID 3).

The transition from step 2 to step 3 should be atomic and must not be
interrupted by softirqs. The reason is that the TX and RX halves of the
CCID modules share data structures, and both halves change state. If the
sending process can be interrupted by the reception of a DCCP packet in
the softirq handler, the state and data structures of the sender can
become corrupted.

Here is an actual example whose effects were observed and led to this
patch: in CCID 3 the sender records a timestamp when
ccid_hc_tx_packet_sent() is called. An application sending via
dccp_sendmsg may be interrupted and resume only a little while later.
Suppose such an interruption happens between steps (2) and (3) above:
the packet has been sent, and immediately afterwards dccp_sendmsg is
interrupted. Meanwhile the transmitted skb reaches the other side and an
Ack comes back; this Ack is processed via softirq (which is allowed to
interrupt dccp_sendmsg). Only then is step (3) performed, but too late:
the timestamp taken in ccid3_hc_tx_packet_sent is now /after/ the
arrival of the Ack.

In the observed case, negative RTT samples (i.e. Acks arriving before
the sent packet was registered) were the result.
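The ordering problem described above can be reproduced in a small
user-space toy model. This is only a sketch, not kernel code: a logical
clock stands in for real timestamps, a flag stands in for
local_bh_disable()/local_bh_enable(), and every name beginning with toy_
is hypothetical and exists purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the sender state; all names here are hypothetical. */
struct toy_sender {
	long clock;        /* logical time, advanced by every event */
	long t_sent;       /* timestamp recorded by the post-send step */
	long t_ack;        /* logical time at which the Ack was processed */
	bool bh_disabled;  /* models local_bh_disable()/local_bh_enable() */
	bool ack_pending;  /* an Ack is waiting for softirq processing */
};

/* Models the RX softirq handling the returning Ack. */
static void toy_process_ack(struct toy_sender *s)
{
	s->t_ack = ++s->clock;
	s->ack_pending = false;
}

/* A point at which a softirq could interrupt the sender. */
static void toy_preemption_point(struct toy_sender *s)
{
	if (s->ack_pending && !s->bh_disabled)
		toy_process_ack(s);
}

/*
 * Send one packet and return the RTT sample the sender would compute.
 * With protect == false the Ack may be processed between steps 2 and 3.
 */
static long toy_send_one(struct toy_sender *s, bool protect)
{
	s->bh_disabled = protect;

	/* Step 2: transmit; in this model the Ack comes back at once. */
	++s->clock;
	s->ack_pending = true;

	toy_preemption_point(s);	/* window between steps 2 and 3 */

	/* Step 3: post-send routine records the send timestamp. */
	s->t_sent = ++s->clock;

	s->bh_disabled = false;
	toy_preemption_point(s);	/* deferred softirq work runs now */

	assert(!s->ack_pending);
	return s->t_ack - s->t_sent;
}
```

Starting from a zeroed struct toy_sender, toy_send_one(&s, false) yields
a negative sample because the Ack is handled before the send timestamp
is taken, while toy_send_one(&s, true) defers the Ack until after step 3
and yields a positive one.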

Signed-off-by: Gerrit Renker <gerrit@xxxxxxxxxxxxxx>
Acked-by: Ian McDonald <ian.mcdonald@xxxxxxxxxxx>
Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxxxxxxxxxx>
---
 net/dccp/output.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/dccp/output.c b/net/dccp/output.c
index c8d843e..0978bc2 100644
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -250,11 +250,18 @@ void dccp_write_xmit(struct sock *sk, int block)
 			else
 				dcb->dccpd_type = DCCP_PKT_DATA;
 
+			/*
+			 * Transmission and calling the post-send CCID operation
+			 * must not be interrupted by other processing (e.g.
+			 * packet reception), otherwise strange errors result.
+			 */
+			local_bh_disable();
 			err = dccp_transmit_skb(sk, skb);
 			ccid_hc_tx_packet_sent(dp->dccps_hc_tx_ccid, sk, 0, len);
-			if (err)
-				DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
-					 err);
+			local_bh_enable();
+
+			if (unlikely(err))
+				DCCP_BUG("dccp_transmit_skb returned %d", err);
 		} else {
 			dccp_pr_debug("packet discarded due to err=%d\n", err);
 			kfree_skb(skb);
-- 
1.5.0.6