TCP Stall

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This thread, which I've assembled below into a single message, discusses a
problem with TCP stalls that have been observed when running TAP tunnels on
Linux 2.4.22 using OpenVPN.

I would appreciate any comments or insights.

James Yonan
OpenVPN Project

************************************

I am running an FTP session over a virtual "tap" interface which is being
tunnelled by OpenVPN.  The hosts are 2.4.22 on both sides, connected via
local ethernet.

The problem is very easy to replicate if the TCP MSS is lowered to 1000 or
less (I am working on a new beta of OpenVPN which knows how to mangle TCP
SYN packets to lower the MSS).

The symptoms of the problem are as follows:

(1) FTP session of approximately 1.2 megabyte file over a tap 
    interface progresses to near completion, then stalls indefinitely.

(2) A ping over the tap interface unstalls the TCP session and it 
    completes normally.

(3) The problem only occurs when tunnelling over a tap interface.
    Tun interface works perfectly.  The problem occurs with less
    frequency when the MSS is left at its default setting.  When
    MSS is lowered, the stall probability increases to over 50%
    per FTP session.

Here is a tcpdump from both sides of the connection of the tap interface.  
Both clocks are very precisely synchronized using ntp.  I have extracted
the part of the tcpdump just prior to and after the stall:

***********************************************************
***** FTP send side (test-local is us,
***** test-remote is other side of connection):

07:52:38.962180 test-local.33133 > test-remote.ftp-data: .
1586641:1587553(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151211> (DF)
[tos 0x8]
07:52:38.962186 test-local.33133 > test-remote.ftp-data: .
1587553:1588465(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:38.962193 test-local.33133 > test-remote.ftp-data: .
1588465:1589377(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:38.964007 test-remote.ftp-data > test-local.33133: . ack 1581169 win
63840 <nop,nop,timestamp 44152747 3957856> (DF)
07:52:38.964335 test-remote.ftp-data > test-local.33133: . ack 1582993 win
63840 <nop,nop,timestamp 44152747 3957856> (DF)
07:52:38.965077 test-remote.ftp-data > test-local.33133: . ack 1584817 win
63840 <nop,nop,timestamp 44152747 3957856> (DF)
07:52:38.965365 test-remote.ftp-data > test-local.33133: . ack 1586641 win
63840 <nop,nop,timestamp 44152747 3957856> (DF)
07:52:38.966130 test-remote.ftp-data > test-local.33133: . ack 1588465 win
63840 <nop,nop,timestamp 44152747 3957858> (DF)
07:52:38.998900 test-remote.ftp-data > test-local.33133: . ack 1589377 win
63840 <nop,nop,timestamp 44152751 3957858> (DF)

***** Stall occurs here
***** Now I ping test-remote from test-local, and within 50 microseconds, 
test-local starts sending data again

07:52:49.862597 test-local > test-remote: icmp: echo request (DF)
07:52:49.862634 test-local.33133 > test-remote.ftp-data: .
1589377:1590289(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.862643 test-local.33133 > test-remote.ftp-data: .
1590289:1591201(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.862649 test-local.33133 > test-remote.ftp-data: .
1591201:1592113(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.862655 test-local.33133 > test-remote.ftp-data: .
1592113:1593025(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.862662 test-local.33133 > test-remote.ftp-data: .
1593025:1593937(912) ack 1 win 5792 <nop,nop,timestamp 3957859 44151213> (DF)
[tos 0x8]
07:52:49.862668 test-local.33133 > test-remote.ftp-data: .
1593937:1594849(912) ack 1 win 5792 <nop,nop,timestamp 3957859 44151213> (DF)
[tos 0x8]
07:52:49.862675 test-local.33133 > test-remote.ftp-data: .
1594849:1595761(912) ack 1 win 5792 <nop,nop,timestamp 3957859 44151213> (DF)
[tos 0x8]

******************************************************* 
***** FTP receive side (test-local is us,
***** test-remote is other side of connection):

07:52:38.964095 test-remote.33133 > test-local.ftp-data: .
1582993:1583905(912) ack 1 win 5792 <nop,nop,timestamp 3957856 44151211> (DF)
[tos 0x8]
07:52:38.964838 test-remote.33133 > test-local.ftp-data: .
1583905:1584817(912) ack 1 win 5792 <nop,nop,timestamp 3957856 44151211> (DF)
[tos 0x8]
07:52:38.964853 test-local.ftp-data > test-remote.33133: . ack 1584817 win
63840 <nop,nop,timestamp 44152747 3957856> (DF)
07:52:38.964967 test-remote.33133 > test-local.ftp-data: .
1584817:1585729(912) ack 1 win 5792 <nop,nop,timestamp 3957856 44151211> (DF)
[tos 0x8]
07:52:38.965177 test-remote.33133 > test-local.ftp-data: .
1585729:1586641(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151211> (DF)
[tos 0x8]
07:52:38.965188 test-local.ftp-data > test-remote.33133: . ack 1586641 win
63840 <nop,nop,timestamp 44152747 3957856> (DF)
07:52:38.965927 test-remote.33133 > test-local.ftp-data: .
1586641:1587553(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151211> (DF)
[tos 0x8]
07:52:38.965988 test-remote.33133 > test-local.ftp-data: .
1587553:1588465(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:38.966022 test-local.ftp-data > test-remote.33133: . ack 1588465 win
63840 <nop,nop,timestamp 44152747 3957858> (DF)
07:52:38.966142 test-remote.33133 > test-local.ftp-data: .
1588465:1589377(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:38.998783 test-local.ftp-data > test-remote.33133: . ack 1589377 win
63840 <nop,nop,timestamp 44152751 3957858> (DF)

**** session stalls

**** ping echo request arrives followed by TCP data
**** resumption

07:52:49.864623 test-remote > test-local: icmp: echo request (DF)
07:52:49.864676 test-local > test-remote: icmp: echo reply
07:52:49.864853 test-remote.33133 > test-local.ftp-data: .
1589377:1590289(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.865069 test-remote.33133 > test-local.ftp-data: .
1590289:1591201(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.865082 test-local.ftp-data > test-remote.33133: . ack 1591201 win
63840 <nop,nop,timestamp 44153837 3957858> (DF)
07:52:49.866116 test-remote.33133 > test-local.ftp-data: .
1591201:1592113(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.866268 test-remote.33133 > test-local.ftp-data: .
1592113:1593025(912) ack 1 win 5792 <nop,nop,timestamp 3957858 44151213> (DF)
[tos 0x8]
07:52:49.866276 test-local.ftp-data > test-remote.33133: . ack 1593025 win
63840 <nop,nop,timestamp 44153837 3957858> (DF)
07:52:49.866416 test-remote.33133 > test-local.ftp-data: .
1593025:1593937(912) ack 1 win 5792 <nop,nop,timestamp 3957859 44151213> (DF)
[tos 0x8]
07:52:49.866620 test-remote.33133 > test-local.ftp-data: .
1593937:1594849(912) ack 1 win 5792 <nop,nop,timestamp 3957859 44151213> (DF)
[tos 0x8]
07:52:49.866630 test-local.ftp-data > test-remote.33133: . ack 1594849 win
63840 <nop,nop,timestamp 44153837 3957859> (DF)
07:52:49.867311 test-remote.33133 > test-local.ftp-data: . 
1594849:1595761(912) ack 1 win 5792 <nop,nop,timestamp 3957859 44151213> (DF)
[tos 0x8]


***************** END
*******************************************

> Hmmm, the sending system is where the stall occurs.  It stops after
> receiving an ack of everything it has sent.
> 
> The stall occurred right after the sender had gotten several packets
> ahead of the receiver and apparently paused to wait for acks to catch
> up.  Presumably it set some timeout for that pause.  The timeout
> proved unnecessary, because all the acks arrived, but the sender seems
> to have failed to notice.
> 
> Offhand, it looks like the sender has that final ack stuck in a buffer
> ... some buffer flush hasn't occurred, but when a ping packet arrives
> the flush does occur.  Sounds like you're right about a timing issue.
> The last ack's arrival triggers an interrupt that causes a buffer
> flush but it happens just before the packet get completely into the
> buffer.  The packet, or part of it, sits there until another interrupt
> flushes it out.
> 
> It sounds like the kind of thing that could happen when a very fast
> cpu responds to a device interrupt before the device is completely
> ready.
> 
> I'd be trying ftps in the other direction and to/from other systems,
> trying to isolation what conditions make this happen.

One thing that I have tried to establish is whether or not it is an OpenVPN
bug.  I think the fact that we see the return ACK transiting the tap interface
of the sender, but then be ignored by the sender, makes me believe (as I think
you do) that the ACK is getting stuck in a buffer somewhere and needs the ping
to get unstuck -- pointing to the kernel rather than OpenVPN as the culprit.
It also seems strange that the TCP code, even if it was
unaware that the ACK had indeed been received, would stall for so long without
trying to retransmit the un-ACKed segment.  After all, what if the ACK had been
dropped somewhere along the network path.

More notes...

In the example above, the sender is slow (266mhz P2) and the receiver is fast
(2.2 GHz).  Reversing the two also causes stalls but less frequently.

Lowering the TCP MSS increases the probability of stalling.

I believe that the stall may also occur on Windows <-> Linux, but because
Windows is more chatty with ARPs and broadcasts, the ARP messages may trigger
the unstall, causing reduced performance without a plainly evident stall.

I may just try to debug it myself, but I don't do a lot of linux kernel
hacking so it would help to know where to start looking.  It seems to be some
sort of interaction between tap interfaces and TCP.

There is a patch by Alexey Kuznetsov in the TCP code as late as early 2003,
which has the comment "Fix TCP stall" -- The patch first appeared in
2.4.21-pre4-ac1 and is now in 2.4.22.  This patch doesn't appear to fix the
problem.

******************************************

I agree this does not look like an OpenVPN bug.  However, it could be
some condition that only OpenVPN/TAP triggers.

(OTOH I would argue that for years Linux has had a problem finishing
certain tcp connections.  I've seen it when browsing - a page loads
promptly and appears to load completely but netscape never recognizes
that the download is complete and sits there forever saying it's
waiting.  Usually a reload loads to completion instantly.  Given the
prevalence of web servers that finish connections with a reset rather
than a conventional shutdown I've suspected this was an issue with the
reset arriving before the last bit of data was given to netscape.)

******************************************

> Hard to say if this is the case in the vaguely reported cases, however in my
> linux tap stall which I can reproduce with 100% certainty, the UDP channel
> between the hosts is definitely alive, OpenVPN is correctly forwarding
> packets, but the TCP code is stalling, i.e. it is not sending packets even
> though its receiver has apparently ACKed all previously sent packets.

I can reproduce the stalls with Linux TAP (sender 2.4.22-pac1, receiver
some SuSE 2.4.20 kernel). The sender is a Duron/700 CPU with PC133 RAM,
the receiver is an IBM 6x86MX/233 ("PR 300") with PC66 RAM. No MSS
tuning necessary, the stall occurs around 1 to 1.5 MB, it happens
earlier with heavy screen dumps going on.

When the tunnel is stalled, the machine has like 70 kB in the send queue
(netstat -tn), and OpenVPN has selected fd #3 and #4 for read. It reads
and processes PING request from the peer fine. tcpdump -itap1 shows that
the ACKs are completely received. Makes me wonder...

I also find that the BIG_TIMEOUT is set on the sender.

I'll now try with a FreeBSD sender to a Linux 2.4.20 client.

> Since the problem occurs with high probability only when tap devices are used
> with a lowered MSS, it is possible that this relatively uncommon combination
> of settings could trigger a previously unknown bug in the TCP code.

I can't confirm "only ... with a lowered MSS". Default packet sizes as
of 1.5-beta9.

********************************************

> My gut feeling is that TCP is losing a timer signal somewhere which would
> normally trigger it to send the next packet in the send queue.  But it's a
> mystery to me why it only occurs with ethernet encapsulation.
> 
> It would be very cool to track this one down since it will bite a lot of
> OpenVPN users who are going between Windows and Linux.  Most of them won't
> stall completely because ARP and broadcast traffic will eventually unstall the
> transfers, but performance will certainly suffer.

You can use Linux' "ping -A" (send a new ping as soon as the previous
has arrived, a lighter "ping -f" flood mode) but performance is way
below acceptable levels; while I can stuff 7 MB/s down to the
almost-decrepit IBM 6x86MX-233 with an unencrypted tunnel with a FreeBSD
head, Linux even with ping -A is way below 100 kB/s. Granted, the
slow Linux receiver also suffers from a RTL8139 junk card, but it works
with FreeBSD.

I'd think if it's not OpenVPN (I won't bet my car it's bug-free, I
haven't yet audited the loop, it looks rather complex, and might also
suffer from subtle select() bugs but I'd think these hit Solaris rather
than Linux), we still have two Linux places where the bug can linger:
TCP code (more likely given that every ACK is visible in tcpdump -itap1)
or perhaps the tap driver. Does OpenVPN support the "obsolete" ethertap
driver? (I found out the hard way that it's still the default on SuSE
Linux 8.2, when two tap tunnels wouldn't talk to each other...)

OTOH, I still have low-traffic stalls on SSH interactive connections
which get unstalled quicker than bulk transfers (maybe because of the
PSH flag, who knows) and don't ever stall if the peer is pinged once a
second.

********************************


-
: send the line "unsubscribe linux-net" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Netdev]     [Ethernet Bridging]     [Linux 802.1Q VLAN]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Git]     [Bugtraq]     [Yosemite News and Information]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux PCI]     [Linux Admin]     [Samba]

  Powered by Linux