On Wed, 2015-02-04 at 05:16 -0800, Eric Dumazet wrote:
> OK guys
>
> Using a mlx4 testbed I can reproduce the problem by pushing coalescing
> settings and disabling SG (thus disabling GSO)
>
> ethtool -K eth0 sg off
> Actual changes:
> scatter-gather: off
>         tx-scatter-gather: off
> generic-segmentation-offload: off [requested on]
>
> ethtool -C eth0 tx-usecs 1024 tx-frames 64
>
> Meaning that NIC waits one ms before sending the TX IRQ,
> and can accumulate 64 frames before forcing the interrupt.
>
> We probably have a bug in cwnd expansion logic :
>
> lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
> rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=230 rttvar=30 snd_ssthresh=41 cwnd=59 reordering=3 total_retrans=1 ca_state=0 pacing_rate=5943.1 Mbits
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
>
>  87380  16384  16384    10.00       530.39   0.40     0.32     2.965   2.398
>
>
> -> final cwnd=59 which is not enough to avoid the 1ms delay between each
> burst.
>
> So sender sends ~60 packets, then has to wait 1ms (to get NIC TX IRQ)
> before sending the following burst.
>
> I am CCing Neal, he probably can help to root cause the problem.

Arg, this was with net-next, ie not including our recent stretch ack
fixes.

Using David Miller 'net' tree, cwnd seems OK.

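As a rough sanity check on the "one burst per TX IRQ" reasoning in the quoted message, the sketch below computes the throughput ceiling if the sender emits one cwnd-sized burst per coalescing interval. The cwnd and tx-usecs values come from the test above; the 1448-byte payload per packet is an assumption (1500-byte MTU minus 40 bytes of IPv4/TCP headers and 12 bytes of TCP timestamps), and the function name is only illustrative.

```python
# Rough ceiling on throughput when one cwnd-sized burst is sent per
# TX-completion interrupt. MSS_BYTES is an assumption, not from the thread.

MSS_BYTES = 1448        # assumed payload per packet (1500 - 40 - 12)
CWND_PACKETS = 59       # cwnd reported by DUMP_TCP_INFO in the quoted run
IRQ_DELAY_S = 1024e-6   # ethtool -C eth0 tx-usecs 1024


def burst_limited_mbits(cwnd_packets: int, mss_bytes: int, delay_s: float) -> float:
    """Throughput in Mbit/s if exactly one burst is sent every delay_s seconds."""
    bits_per_burst = cwnd_packets * mss_bytes * 8
    return bits_per_burst / delay_s / 1e6


if __name__ == "__main__":
    bound = burst_limited_mbits(CWND_PACKETS, MSS_BYTES, IRQ_DELAY_S)
    print(f"burst-limited ceiling: {bound:.0f} Mbit/s")
```

Under these assumptions the ceiling is in the high hundreds of Mbit/s, the same order as the 530.39 Mbit/s netperf reported, which is consistent with the sender stalling for about 1 ms between bursts.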
Speed is low because the 64 queued frames exceed tcp_limit_output_bytes:

lpaa23:~# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
131072

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=166 rttvar=16 snd_ssthresh=26 cwnd=59 reordering=3 total_retrans=0 ca_state=0 pacing_rate=8203.52 Mbits
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       569.96   0.52     0.38     3.588   2.625

lpaa23:~# echo 262144 >/proc/sys/net/ipv4/tcp_limit_output_bytes

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=98 rttvar=18 snd_ssthresh=312 cwnd=313 reordering=3 total_retrans=23 ca_state=0
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00      8518.40   2.60     1.57     1.200   0.727
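To illustrate why doubling tcp_limit_output_bytes matters with tx-frames 64, here is a small sketch of the per-frame byte budget each limit leaves. The two limits and the frame count are from the runs above; the per-frame memory charge of 2304 bytes is a purely hypothetical value (the thread does not state it), chosen only to show a cost above 2048 bytes, and the helper names are illustrative.

```python
# Per-frame budget implied by tcp_limit_output_bytes when the NIC raises
# an early TX interrupt only once tx-frames frames are queued.

TX_FRAMES = 64        # ethtool -C eth0 tx-frames 64
ASSUMED_COST = 2304   # hypothetical bytes charged per queued frame (assumption)


def per_frame_budget(limit_bytes: int, frames: int = TX_FRAMES) -> int:
    """Bytes each frame may cost if `frames` of them must fit under the limit."""
    return limit_bytes // frames


def frames_that_fit(limit_bytes: int, per_frame_cost: int) -> int:
    """Whole frames that fit under the limit at a given per-frame cost."""
    return limit_bytes // per_frame_cost


if __name__ == "__main__":
    for limit in (131072, 262144):
        fit = frames_that_fit(limit, ASSUMED_COST)
        reaches = "yes" if fit >= TX_FRAMES else "no"
        print(f"limit={limit}: budget={per_frame_budget(limit)} B/frame, "
              f"fits {fit} frames, reaches tx-frames: {reaches}")
```

With 131072 the budget is exactly 2048 bytes per frame, so any per-frame charge above that keeps the queue below 64 frames and the NIC falls back to the 1024 us timer; with 262144 the budget is 4096 bytes per frame and the frame threshold can be reached.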