Re: SV: SV: Conntrack insertion race conditions -- any workarounds?

On 09/06/2018 05:32 PM, André Paulsberg-Csibi (IBM Consultant) wrote:
> From my understanding, given how firewalls and TCP/IP are generally used in modern networking, the reasoning seems to be that clients should avoid sending 2 separate requests with the same source port.
> (again, not as an absolute rule, but certainly as a strong rule of thumb)

This is generally true for TCP: each connection has its own socket, and unless a specific source port is requested (which there is often no reason to do), the operating system arbitrarily assigns one for each connection.
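A minimal sketch of that behavior, using a loopback listener so it is self-contained: two TCP connections to the same server get two distinct ephemeral source ports from the OS, with no explicit bind on the client side.

```python
import socket

# Local listener; port 0 asks the OS to pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(2)
server_addr = listener.getsockname()

# Two connections from the same process, no source port requested.
c1 = socket.create_connection(server_addr)
c2 = socket.create_connection(server_addr)

port1 = c1.getsockname()[1]
port2 = c2.getsockname()[1]
print(port1 != port2)   # the OS assigned a distinct source port to each

for s in (c1, c2, listener):
    s.close()
```

Since both connections share the same destination address and port, the 4-tuples can only be distinct if the source ports differ, which is exactly what the kernel's ephemeral port allocation guarantees here.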

But the opposite is true for UDP -- it's common for a UDP program to bind one socket to one source port and use it for all its communications, distinguishing peers by the remote IP and port (which is provided when using the connectionless sendto(2)/recvfrom(2) but not the connection-oriented send(2)/recv(2)). If it weren't for the Kaminsky attack DNS clients would do this as well.
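The one-socket-many-peers pattern can be sketched like this: a single UDP socket bound to one source port exchanges datagrams with two peers, and recvfrom(2)'s returned address is what tells the replies apart (peers here are just two more loopback sockets so the example is self-contained).

```python
import socket

# Two "peers", each on its own OS-assigned port.
peer_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_a.bind(("127.0.0.1", 0))
peer_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_b.bind(("127.0.0.1", 0))

# One client socket, one source port, used for all communication.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))

client.sendto(b"hello a", peer_a.getsockname())
client.sendto(b"hello b", peer_b.getsockname())

# Each peer answers the address it received from.
data_a, addr_a = peer_a.recvfrom(64)
peer_a.sendto(b"reply a", addr_a)
data_b, addr_b = peer_b.recvfrom(64)
peer_b.sendto(b"reply b", addr_b)

# The client distinguishes the peers by the remote address recvfrom reports.
replies = {}
for _ in range(2):
    data, addr = client.recvfrom(64)
    replies[addr[1]] = data
print(sorted(replies.values()))
```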

Using a separate source port for a second connection from the same client to the same remote IP and port is also necessary for TCP because otherwise the streams would be intermixed, but that isn't an issue for UDP because unlike TCP streams, datagrams preserve message boundaries.
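The message-boundary point can be demonstrated directly: two datagrams from the same source port to the same destination arrive as two distinct messages, where a TCP stream would have let the bytes run together.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))   # same source port for both sends
client.sendto(b"first", server.getsockname())
client.sendto(b"second", server.getsockname())

# Even with a buffer big enough for both, each recvfrom() returns
# exactly one datagram -- boundaries are preserved.
msg1, _ = server.recvfrom(4096)
msg2, _ = server.recvfrom(4096)
print(msg1, msg2)   # b'first' b'second', never b'firstsecond'
```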

> I am not sure it is correct to describe 2 separate requests via UDP as a flow, but I agree that the client isn't directly doing something that is "wrong".
> As you say, historically (like my DHCP relay example) it was accepted (and normal) to only use port 67 as both SOURCE and DESTINATION port.
> However, it doesn't take much reasoning to argue that this is potentially problematic for any sort of state table, and it seems uneconomic to build more advanced state tables or mark packets to avoid certain scenarios.

These are two separate problems. One is that every client uses the same source (and destination) port, which creates a problem for any one-to-many NAT device. For each remote host only one client can have that port pair on the external IP address. This can be solved by the NAT translating the second client to some other source port, but only if the server doesn't require the client to use that specific source port.
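The first problem can be sketched as a toy port-allocation table. This is purely illustrative (the class and its linear-probing port search are invented for this sketch, not any real NAT implementation): when a second internal client presents the same source port toward the same server, the NAT rewrites it to a free external port.

```python
# Hypothetical sketch of one-to-many NAT source-port allocation.
class Nat:
    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.mappings = {}   # (int_ip, int_port, dst) -> external source port
        self.in_use = set()  # (dst, external port) pairs already taken

    def translate(self, int_ip, int_port, dst):
        key = (int_ip, int_port, dst)
        if key in self.mappings:          # existing flow: stable translation
            return self.mappings[key]
        port = int_port                   # try to keep the original port
        while (dst, port) in self.in_use: # collision with another client:
            port += 1                     # pick some other port instead
        self.in_use.add((dst, port))
        self.mappings[key] = port
        return port

nat = Nat("203.0.113.1")
dst = ("198.51.100.5", 67)
print(nat.translate("10.0.0.1", 67, dst))  # 67: first client keeps its port
print(nat.translate("10.0.0.2", 67, dst))  # 68: second client is rewritten
```

This is exactly the rewrite that breaks if the server insists the client's source port be 67, as in the old DHCP relay convention.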

By contrast, what's happening in the case that spawned this thread is that the client uses the same source port for two separate packets, but the source port is still random and unlikely to conflict with some other client, and there is no issue for the NAT to translate them back to the original client because the translation for both packets is the same. The bug is that the firewall processes them incorrectly when two packets with the same new source and destination address and port are processed concurrently -- an implementation flaw, not a design flaw. The packet marking and so forth is only an attempted workaround until the patch is in place.

> (compared to opening 2 sockets on the various clients, which spreads that "load" across millions of clients, while the "servers" and "firewalls" need to be optimized for millions of requests)

When you're dealing with libc the clients run the gamut. Some will be mobile devices where every cycle they spend consumes battery life. Many of the "clients" will themselves be servers. Web crawlers and mail servers spend a good fraction of their cycles making DNS queries.

> I disagree that this is a bug in the FIREWALL(s), as this would ONLY happen when reusing the source port, which in my opinion isn't a reasonable optimization for simple clients

There are other circumstances where this can happen or is required to. For example, an internal DNS cache makes outgoing queries when (for about 20% of requests) it doesn't have the record cached, and it will regularly reuse source ports. By the birthday problem, if it randomly chooses a source port for each request, by around 300 queries there is a 50% chance that two of them will use the same source port. Not reusing ports would increase exposure to the Kaminsky attack (and risk port exhaustion).
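The birthday estimate above can be checked numerically: with N equally likely ephemeral ports (taking the full 1024-65535 range, N = 64512), count how many independent random choices it takes before the probability that two share a port reaches 50%.

```python
import math

N = 65536 - 1024          # ephemeral ports available (assumed full range)
p_no_collision = 1.0
n = 0
# Exact birthday computation: multiply in one more query at a time until
# the collision probability crosses 50%.
while 1.0 - p_no_collision < 0.5:
    p_no_collision *= (N - n) / N
    n += 1
print(n)                  # around 300, matching sqrt(2 * N * ln 2) ~= 299.6
```

A smaller configured ephemeral range (many systems default to roughly half of this) lowers the crossover point correspondingly, since it scales with the square root of N.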

> , and mind you, this is the security feature of the FIREWALL, to track states, and for DNS there are no flows like you have with SIP / SYSLOG.
> Those also reuse the SOURCE port for their flows, but they are known to deliberately establish such flows -- which is not the same for DNS, which, on top of using random ports for each request, now makes 2 requests per lookup since IPv6 was added.

IPv6 has been with us for years though, and mail servers have done a similar thing even longer. The host for sending mail to a domain is specified in the MX record unless there isn't one, in which case the A record is used, and some mail software will do the MX and A queries simultaneously.

DNS doesn't have "flows" in the sense that it mostly consists of individual query transactions (though see also dynamic updates and zone transfers). But treating a set of UDP DNS query transactions using the same ports as a flow, in the style of any generic indeterminate UDP-based protocol, produces the desired results (and is less likely to break new or uncommon protocol features), so where is the advantage in application-protocol-specific treatment? In theory you could discard state sooner, but if you're so close to the point of port exhaustion that this really matters, you may be better off acquiring more IP addresses instead. Reusing the port mappings more quickly like that would run into the same issue that causes TCP to have a TIME_WAIT state -- without it, a packet (or retransmit) for the old mapping could be delivered to the new one.
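A generic UDP flow table of the kind described above can be sketched in a few lines: state is keyed on the address/port pair and expires after an idle timeout, which is what prevents a late datagram for an old mapping from being delivered to a newer one that reused the same ports. (The class and timeout value are illustrative, not conntrack's actual data structures.)

```python
# Hypothetical sketch of a generic idle-timeout UDP flow table.
class FlowTable:
    def __init__(self, timeout):
        self.timeout = timeout
        self.flows = {}          # (src, sport, dst, dport) -> last-seen time

    def seen(self, key, now):
        self.flows[key] = now    # create or refresh the flow entry

    def active(self, key, now):
        last = self.flows.get(key)
        return last is not None and now - last < self.timeout

table = FlowTable(timeout=30)
key = ("192.0.2.1", 53000, "198.51.100.5", 53)
table.seen(key, now=0)
print(table.active(key, now=10))   # True: still within the idle timeout
print(table.active(key, now=45))   # False: the entry has aged out
```

Keeping the entry until the timeout expires, rather than tearing it down the moment a transaction looks complete, plays the same role as TCP's TIME_WAIT: the old port pair cannot be handed to a new flow while a straggler packet could still arrive for it.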


