conntrack and RSTs received during CLOSE_WAIT

I'm using Linux kernel 2.6.26 with conntrack/connlimit to prevent people from DoSing our Web servers by opening too many simultaneous connections from one IP address. This is mostly protection against unintentional DoSes from broken proxy servers that try to open literally hundreds of simultaneous connections; we DROP their SYN packets if they already have 40 connections open.
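For reference, the rule I'm describing looks roughly like this (a sketch rather than our exact ruleset; it assumes HTTP on port 80 and the 40-connection limit mentioned above):

  # Drop new connection attempts from any source IP that already has
  # more than 40 tracked connections to port 80.
  iptables -I INPUT -p tcp --syn --dport 80 \
      -m connlimit --connlimit-above 40 -j DROP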

This is generally working well (and thanks to folks on this list for the hard work that makes this possible).

However: Some clients send evil TCP RSTs that confuse conntrack and break connlimit in a way that I'll detail below. First, here's a sample recreation:

 client > server [SYN] Seq=0 Len=0
 server > client [SYN,ACK] Seq=0 Ack=1 Len=0
 client > server [ACK] Seq=1 Ack=1 Len=0
 client > server [PSH,ACK] Seq=1 Ack=1 Len=420 (HTTP GET request)
 server > client [ACK] Seq=1 Ack=421 Len=0
 server > client [ACK] Seq=1 Ack=421 Len=1448    (HTTP response)
 server > client [ACK] Seq=1449 Ack=421 Len=1448 (more HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (more HTTP response)
 client > server [FIN,ACK] Seq=421 Ack=1449 Len=0
 server > client [ACK] Seq=4345 Ack=422 Len=1448 (more HTTP response)
 server > client [ACK] Seq=5793 Ack=422 Len=1448 (more HTTP response)
 client > server [RST] Seq=421 Len=0
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)
 server > client [ACK] Seq=2897 Ack=421 Len=1448 (retr HTTP response)

Everything up to and including the "RST" takes place in under a tenth of a second. The remaining ten retransmits take place over 5 minutes.

As soon as the client received the first packet of the HTTP response, it decided to close the connection. This appears to be due to a SonicWall firewall on the client end, which examines the Content-Type of the HTTP reply and immediately shuts down the connection if it's a "forbidden" type. This is apparently common.

From the point of view of the server's TCP stack, this connection enters the CLOSE_WAIT state when the FIN is received. The stack then waits for Apache to close() the socket. However, Apache doesn't close the socket for five minutes: it's blocked waiting for a socket write to complete, and it doesn't notice the end-of-input on the socket until that write times out. (Yes, according to netstat, the connection remains in CLOSE_WAIT even after the RST packet, which surprised me, but that's apparently how Linux works.)
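You can watch these pile up on the server with something like the following (purely illustrative; any similar netstat/ss invocation does the job):

  # Count server-side sockets sitting in CLOSE_WAIT on port 80.
  netstat -tan | grep ':80 ' | grep -c CLOSE_WAIT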

If the client opens up hundreds of these connections within five minutes, it can use up hundreds of Apache process slots. I want connlimit to prevent that, and it looks like it should, because conntrack should be tracking the CLOSE_WAIT connections just like any other connections. To make sure it tracks them long enough, I've set ip_conntrack_tcp_timeout_close_wait to 5 minutes.
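Concretely, that means something like this (the exact sysctl path depends on whether you're using the newer nf_conntrack names or the older ip_conntrack compat names):

  # Keep CLOSE_WAIT entries in the conntrack table for 5 minutes.
  sysctl -w net.netfilter.nf_conntrack_tcp_timeout_close_wait=300
  # or, with the ipv4 compat sysctls:
  sysctl -w net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait=300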

However, the RST packet screws things up. As I said, the kernel ignores the RST packet and leaves the connection in CLOSE_WAIT. But when conntrack sees the RST packet, it marks the connection CLOSEd, and then forgets about it 10 seconds later.
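You can see this happen by watching the conntrack table while reproducing the trace above, e.g. (203.0.113.7 stands in for the client's address; use /proc/net/ip_conntrack instead if you only have the compat module's proc file):

  # Watch the client's entry flip from CLOSE_WAIT to CLOSE and then
  # disappear about 10 seconds later.
  watch -n 1 'grep 203.0.113.7 /proc/net/nf_conntrack'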

What happens next depends on whether nf_conntrack_tcp_loose is set. If it's set to 1, the server's retransmitted packets cause a new, "fake" connection to be picked up as ESTABLISHED in conntrack, which lingers for five days(!). We originally had it set that way, but a couple of legitimate customers complained that they were still blocked from our servers for five days after they'd actually closed all their connections, since those stale ESTABLISHED entries still count against the connlimit threshold.

So we set nf_conntrack_tcp_loose to 0. That solved the "blocked for five days" problem... but now the CLOSE_WAIT connections go straight to CLOSE in conntrack when the RST arrives and are totally forgotten ten seconds later. A rogue client can quickly get 40 connections into the CLOSE_WAIT state, wait ten seconds, open 40 more, and so on, occupying up to 1200 Apache process slots within five minutes.

What we really want is for conntrack to match what the kernel does: ignore the RST packet for CLOSE_WAIT connections, leaving the connection in the conntrack CLOSE_WAIT state until ip_conntrack_tcp_timeout_close_wait expires. That looks easy to do with a change to nf_conntrack_proto_tcp.c:

-/*rst*/    { sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sIV },
+/*rst*/    { sIV, sCL, sCL, sCL, sCL, sCW, sCL, sCL, sCL, sIV },
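For context, if I'm reading the 2.6.26 tcp_conntracks table correctly, the columns in that row are ordered like this:

/*           sNO, sSS, sSR, sES, sFW, sCW, sLA, sTW, sCL, sLI  */

so the entry being changed is the sCW (CLOSE_WAIT) column: an RST arriving while conntrack has the connection in CLOSE_WAIT would leave it in CLOSE_WAIT instead of moving it to CLOSE.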

But I'd rather not maintain a custom-compiled kernel just for that.

So I've considered other solutions:

1. Set nf_conntrack_tcp_loose to 1, but change ip_conntrack_tcp_timeout_established to 1 hour (instead of 5 days); see the sysctl sketch after option 2. This would make sure people aren't blocked for more than an hour after they close all their connections. However, that's still not ideal -- and it would also let someone intentionally bypass connlimit by opening 40 connections, leaving them idle for an hour, then opening 40 more, and so on.

2. Set nf_conntrack_tcp_loose to 0, and change nf_conntrack_tcp_timeout_close to 5 minutes (instead of 10 seconds). This would only block people for the 5 minutes that they're still taking up an Apache process slot, but would also block anyone who sends 40 TCP RSTs within 5 minutes for any reason. You wouldn't think that this would be a problem, but RSTs actually seem quite common on a busy Web server with a fairly low HTTP keepalive value.
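Sketches of what I mean for both options (again, adjust the sysctl prefix if you're on the ip_conntrack compat names):

  # Option 1: loose pickup on, but expire idle ESTABLISHED entries
  # after an hour rather than five days.
  sysctl -w net.netfilter.nf_conntrack_tcp_loose=1
  sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=3600

  # Option 2: loose pickup off, and keep CLOSE entries for 5 minutes
  # rather than 10 seconds.
  sysctl -w net.netfilter.nf_conntrack_tcp_loose=0
  sysctl -w net.netfilter.nf_conntrack_tcp_timeout_close=300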

Does anyone have any other suggestions about how to make conntrack remember these connections during (and only during) the five-minute period netstat shows them as CLOSE_WAIT?

--
Robert L Mathews, Tiger Technologies     http://www.tigertech.net/