Re: Congestion Control uses packets instead of segments? (bug?).


 



I've got the solution to all your problems: don't use TCP. Your application requirements (fine control of when packets are sent, minimizing delay, custom congestion control) make it clear that what you want is a custom transport.

Jonathon Ross wrote:

Summary:

Linux's TCP congestion control looks like it uses packets outstanding
instead of segments outstanding for control (unlike BSD). This is a big
problem for us because we use lots of small packets for latency. Also, when
the congestion control kicks in, it looks like the max number of outstanding
packets Linux allows is 1, which exacerbates my problem.

Why I care:

This isn't an internet application, so don't flame me about lots of small
packets w/ TCP_NODELAY on. Customers pay us (plus telco costs) for extremely
low-latency financial market data, and for the ability to place and cancel
orders in under 10ms. Because of the number of black-box trading systems
today, markets move very quickly, and a 30ms delay waiting on a TCP ACK is very bad.
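For reference, here is a minimal sketch of the two socket options usually reached for in this situation, in Python for brevity: TCP_NODELAY on the sending side (which the post already uses) and Linux's TCP_QUICKACK on the receiving side, which asks the kernel to ACK immediately instead of waiting for the delayed-ACK timer. Per tcp(7), TCP_QUICKACK is not permanent, so the sketch re-arms it after every recv(). The helper names make_low_latency and recv_with_quickack are hypothetical, and whether this removes the stall on a 2.4.20 kernel in the bidirectional case below is an assumption, not something tested here.

```python
import socket


def make_low_latency(sock: socket.socket) -> None:
    """Disable Nagle and, where the platform has it, request immediate ACKs."""
    # TCP_NODELAY: small writes go out at once instead of being coalesced.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # TCP_QUICKACK is Linux-specific; skip it on platforms that lack it.
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)


def recv_with_quickack(sock: socket.socket, bufsize: int = 4096) -> bytes:
    """recv() that re-arms TCP_QUICKACK, since the kernel may drop back
    into delayed-ACK mode on its own (the option is not sticky)."""
    data = sock.recv(bufsize)
    if hasattr(socket, "TCP_QUICKACK"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_QUICKACK, 1)
    return data
```

Note that this only shortens the wait for ACKs; it does not change how the sender counts its congestion window.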

Examples:

from the 2.4.20 kernel--

Server -> sends data
Client -> sends ack
Server -> sends data
Client -> sends ack
Server -> sends data
Client -> sends data/heartbeat/order/etc. and piggybacked ack
(At this point linux considers the connection bi-directional and kicks off
the delayed ack timer)
Server -> sends data
Server -> sends data
(No ACK received; congestion window considered full with 2 outstanding
packets, so the server stack will send no data)
Client -> Delayed ack timer expires, or client sends data+ack
Server resumes.
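The stall in the sequence above can be reproduced with a toy model, sketched here under loud assumptions: every packet is ACKed a fixed ack_delay_ms after it leaves, the window counts packets regardless of their size, and each message goes out as its own packet, in order. The 40 ms value is an assumed delayed-ACK delay (the trace below shows a gap of about 37 ms), departure_times is a hypothetical name, and this is not the kernel's algorithm.

```python
def departure_times(ready_ms, cwnd_pkts, ack_delay_ms):
    """Toy model of a sender whose congestion window counts packets.

    ready_ms     -- ascending times (ms) at which the application writes a message
    cwnd_pkts    -- max unacknowledged packets, whatever their size
    ack_delay_ms -- fixed time until each packet's ACK arrives back
    Each message is its own packet (no coalescing), sent in FIFO order.
    Returns the time at which each message actually leaves the sender.
    """
    if cwnd_pkts < 1:
        raise ValueError("cwnd_pkts must be at least 1")
    sent = []
    for ready in ready_ms:
        # FIFO: a message cannot leave before the one queued ahead of it.
        t = max(ready, sent[-1]) if sent else ready
        while True:
            # Packets whose ACK has not yet come back at time t.
            unacked = [s for s in sent if s + ack_delay_ms > t]
            if len(unacked) < cwnd_pkts:
                break
            # Window full: wait until the oldest outstanding packet is ACKed.
            t = min(unacked) + ack_delay_ms
        sent.append(t)
    return sent


# Messages every 2 ms with a 2-packet window.
print(departure_times([0, 2, 4, 6], cwnd_pkts=2, ack_delay_ms=0.07))  # [0, 2, 4, 6]
print(departure_times([0, 2, 4, 6], cwnd_pkts=2, ack_delay_ms=40))    # [0, 2, 40, 42]
```

With prompt ACKs nothing waits; with delayed ACKs the third and fourth messages are held for roughly the ACK delay. The real sender also merges whatever queued during the stall into a larger segment when the window reopens, which this model ignores.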

which you can see in this sniff:



001934 10.50.6.158.9090 > 10.50.6.53.34990: P 11639:11662(23) ack 1 win 5792 <nop,nop,timestamp 1027452346 431322340> (DF)
000069 10.50.6.53.34990 > 10.50.6.158.9090: . ack 11662 win 5840 <nop,nop,timestamp 431322340 1027452346> (DF)
001936 10.50.6.158.9090 > 10.50.6.53.34990: P 11662:11685(23) ack 1 win 5792 <nop,nop,timestamp 1027452346 431322340> (DF)
000068 10.50.6.53.34990 > 10.50.6.158.9090: . ack 11685 win 5840 <nop,nop,timestamp 431322340 1027452346> (DF)
001934 10.50.6.158.9090 > 10.50.6.53.34990: P 11685:11708(23) ack 1 win 5792 <nop,nop,timestamp 1027452346 431322340> (DF)
000069 10.50.6.53.34990 > 10.50.6.158.9090: . ack 11708 win 5840 <nop,nop,timestamp 431322340 1027452346> (DF)
001936 10.50.6.158.9090 > 10.50.6.53.34990: P 11708:11731(23) ack 1 win 5792 <nop,nop,timestamp 1027452346 431322340> (DF)
002004 10.50.6.158.9090 > 10.50.6.53.34990: P 11731:11754(23) ack 1 win 5792 <nop,nop,timestamp 1027452347 431322340> (DF)
000670 10.50.6.53.34990 > 10.50.6.158.9090: P 1:2(1) ack 11708 win 5840 <nop,nop,timestamp 431322341 1027452346> (DF)
000098 10.50.6.158.9090 > 10.50.6.53.34990: . ack 2 win 5792 <nop,nop,timestamp 1027452347 431322341> (DF)
037077 10.50.6.53.34990 > 10.50.6.158.9090: . ack 11754 win 5840 <nop,nop,timestamp 431322345 1027452346> (DF)
000169 10.50.6.158.9090 > 10.50.6.53.34990: P 11754:12168(414) ack 2 win 5792 <nop,nop,timestamp 1027452350 431322345> (DF)
000004 10.50.6.158.9090 > 10.50.6.53.34990: P 12168:12191(23) ack 2 win 5792 <nop,nop,timestamp 1027452350 431322345> (DF)
000100 10.50.6.53.34990 > 10.50.6.158.9090: . ack 12168 win 6432 <nop,nop,timestamp 431322345 1027452350> (DF)
000020 10.50.6.53.34990 > 10.50.6.158.9090: . ack 12191 win 6432 <nop,nop,timestamp 431322345 1027452350> (DF)
001821 10.50.6.158.9090 > 10.50.6.53.34990: P 12191:12214(23) ack 2 win 5792 <nop,nop,timestamp 1027452351 431322345> (DF)
000072 10.50.6.53.34990 > 10.50.6.158.9090: . ack 12214 win 6432 <nop,nop,timestamp 431322345 1027452351> (DF)
001931 10.50.6.158.9090 > 10.50.6.53.34990: P 12214:12237(23) ack 2 win 5792 <nop,nop,timestamp 1027452351 431322345> (DF)
000068 10.50.6.53.34990 > 10.50.6.158.9090: . ack 12237 win 6432 <nop,nop,timestamp 431322345 1027452351> (DF)
001935 10.50.6.158.9090 > 10.50.6.53.34990: P 12237:12260(23) ack 2 win 5792 <nop,nop,timestamp 1027452351 431322345> (DF)
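One way to check this reading of the sniff is to replay it and track what the server has outstanding. The sketch below parses the excerpt around the stall (one packet per line, TCP options trimmed), treats the first column as the time since the previous packet in microseconds (an assumption about how the capture was printed), and applies each cumulative ACK from the client. The regular expression only handles the line shapes that appear in this trace, and in_flight_after_sends is a hypothetical helper.

```python
import re

SERVER = "10.50.6.158.9090"

# "<delta> <src> > <dst>: <flags> [<start>:<end>(<len>)] [ack <n>] ..."
RECORD = re.compile(
    r"^(?P<delta>\d+) (?P<src>\S+) > (?P<dst>\S+): \S+"
    r"(?: (?P<start>\d+):(?P<end>\d+)\(\d+\))?"
    r"(?: ack (?P<ack>\d+))?"
)

# Excerpt of the sniff around the stall, TCP options trimmed.
TRACE = [
    "001936 10.50.6.158.9090 > 10.50.6.53.34990: P 11708:11731(23) ack 1 win 5792",
    "002004 10.50.6.158.9090 > 10.50.6.53.34990: P 11731:11754(23) ack 1 win 5792",
    "000670 10.50.6.53.34990 > 10.50.6.158.9090: P 1:2(1) ack 11708 win 5840",
    "000098 10.50.6.158.9090 > 10.50.6.53.34990: . ack 2 win 5792",
    "037077 10.50.6.53.34990 > 10.50.6.158.9090: . ack 11754 win 5840",
    "000169 10.50.6.158.9090 > 10.50.6.53.34990: P 11754:12168(414) ack 2 win 5792",
    "000004 10.50.6.158.9090 > 10.50.6.53.34990: P 12168:12191(23) ack 2 win 5792",
]


def in_flight_after_sends(lines, server=SERVER):
    """Return (delta, packets_unacked, bytes_unacked) after each data packet the server sends."""
    unacked = []  # (start, end) sequence ranges sent by the server, not yet ACKed
    result = []
    for line in lines:
        m = RECORD.match(line)
        if m is None:
            continue
        if m["src"] == server and m["start"] is not None:
            unacked.append((int(m["start"]), int(m["end"])))
            result.append((int(m["delta"]), len(unacked), sum(e - s for s, e in unacked)))
        elif m["dst"] == server and m["ack"] is not None:
            # Cumulative ACK: every byte below this sequence number has arrived.
            acked = int(m["ack"])
            unacked = [(s, e) for s, e in unacked if e > acked]
    return result


for row in in_flight_after_sends(TRACE):
    print(row)
```

On this excerpt it reports two 23-byte packets (46 bytes) outstanding just before the 37077 microsecond gap, then a single 414-byte segment once the ACK for 11754 arrives, i.e. the messages that queued during the stall went out together.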


Also, I think this is the same problem: http://www.icase.edu/coral/LinuxTCP.html



What I think I'd like, in order of desirability:

1) The congestion control to use segments instead of packets. Having the
stream stop because a single 50-byte packet is un-ACKed seems excessive.
Stopping the stream after 1 un-ACKed MTU seems reasonable.

2) A way to increase the number of outstanding packets to a number greater
than 1.

3) A way to disable the congestion control entirely.
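To make wish 1) concrete, here is a sketch contrasting the two admission checks as the post characterises them: a window counted in packets versus a window counted in bytes (BSD keeps snd_cwnd in bytes). The MSS of 1448 bytes (1500-byte MTU minus 20 IP, 20 TCP and 12 timestamp-option bytes) and all function names are assumptions for illustration; neither function is the actual Linux or BSD code.

```python
MSS = 1448  # assumed: 1500-byte MTU minus IP, TCP and timestamp-option headers


def fits_packet_window(in_flight, cwnd_segments):
    """Window counted in packets: every packet takes one slot, whatever its size."""
    return len(in_flight) < cwnd_segments


def fits_byte_window(in_flight, next_len, cwnd_segments):
    """Window counted in bytes: the budget is cwnd_segments full-sized segments."""
    return sum(in_flight) + next_len <= cwnd_segments * MSS


def messages_before_stall(msg_len, cwnd_segments, count_bytes):
    """How many msg_len-byte messages can be unacknowledged before sending stops."""
    if msg_len <= 0:
        raise ValueError("msg_len must be positive")
    in_flight = []  # sizes of packets sent but not yet ACKed
    while True:
        if count_bytes:
            ok = fits_byte_window(in_flight, msg_len, cwnd_segments)
        else:
            ok = fits_packet_window(in_flight, cwnd_segments)
        if not ok:
            return len(in_flight)
        in_flight.append(msg_len)


print(messages_before_stall(23, 2, count_bytes=False))  # 2
print(messages_before_stall(23, 2, count_bytes=True))   # 125
```

With a two-segment window, packet counting stops after two 23-byte messages, while the byte budget would admit 125 of them before stalling; for full-sized segments the two checks agree.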


If I'm totally mistaken and don't understand my problem, or what I think I need, please tell me. Any help would be appreciated.

Thanks,
-Jon




-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html




--
Casey Carter
Casey@Carter.net
ccarter@cs.uiuc.edu
AIM: cartec69

