Re: SV: SV: Conntrack insertion race conditions -- any workarounds?

On 09/06/2018 05:32 PM, André Paulsberg-Csibi (IBM Consultant) wrote:
> From my understanding, given how firewalls and TCP/IP are generally used in modern networking, the reasoning seems to be that clients should avoid sending 2 separate requests with the same source port.
> (again, not as an absolute rule, but certainly as a strong rule of thumb)

This is generally true for TCP: each connection has its own socket, and unless a specific source port is requested (which there is often no reason to do), the operating system arbitrarily assigns one for each connection.
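A minimal sketch of that behavior, using a loopback listener so it is self-contained: two TCP connections to the same server get two distinct ephemeral source ports from the OS, with no explicit bind on the client side.

```python
import socket

# Local listener; port 0 asks the OS to pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(2)
server_addr = listener.getsockname()

# Two connections from the same process, no source port requested.
c1 = socket.create_connection(server_addr)
c2 = socket.create_connection(server_addr)

port1 = c1.getsockname()[1]
port2 = c2.getsockname()[1]
print(port1 != port2)   # the OS assigned a distinct source port to each

for s in (c1, c2, listener):
    s.close()
```

Since both connections share the same destination address and port, the 4-tuples can only be distinct if the source ports differ, which is exactly what the kernel's ephemeral port allocation guarantees here.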

But the opposite is true for UDP -- it's common for a UDP program to bind one socket to one source port and use it for all its communications, distinguishing peers by the remote IP and port (which is provided when using the connectionless sendto(2)/recvfrom(2) but not the connection-oriented send(2)/recv(2)). If it weren't for the Kaminsky attack DNS clients would do this as well.
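The one-socket-many-peers pattern can be sketched like this: a single UDP socket bound to one source port exchanges datagrams with two peers, and recvfrom(2)'s returned address is what tells the replies apart (peers here are just two more loopback sockets so the example is self-contained).

```python
import socket

# Two "peers", each on its own OS-assigned port.
peer_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_a.bind(("127.0.0.1", 0))
peer_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer_b.bind(("127.0.0.1", 0))

# One client socket, one source port, used for all communication.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))

client.sendto(b"hello a", peer_a.getsockname())
client.sendto(b"hello b", peer_b.getsockname())

# Each peer answers the address it received from.
data_a, addr_a = peer_a.recvfrom(64)
peer_a.sendto(b"reply a", addr_a)
data_b, addr_b = peer_b.recvfrom(64)
peer_b.sendto(b"reply b", addr_b)

# The client distinguishes the peers by the remote address recvfrom reports.
replies = {}
for _ in range(2):
    data, addr = client.recvfrom(64)
    replies[addr[1]] = data
print(sorted(replies.values()))
```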

Using a separate source port for a second connection from the same client to the same remote IP and port is also necessary for TCP because otherwise the streams would be intermixed, but that isn't an issue for UDP because unlike TCP streams, datagrams preserve message boundaries.
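The message-boundary point can be demonstrated directly: two datagrams from the same source port to the same destination arrive as two distinct messages, where a TCP stream would have let the bytes run together.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))   # same source port for both sends
client.sendto(b"first", server.getsockname())
client.sendto(b"second", server.getsockname())

# Even with a buffer big enough for both, each recvfrom() returns
# exactly one datagram -- boundaries are preserved.
msg1, _ = server.recvfrom(4096)
msg2, _ = server.recvfrom(4096)
print(msg1, msg2)   # b'first' b'second', never b'firstsecond'
```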

> I am not sure it is correct to describe 2 separate requests via UDP as a flow, but I agree that the client isn't directly doing something that is "wrong".
> As you say, historically (like my DHCP relay example) it was accepted (and normal) to only use port 67 as both SOURCE and DESTINATION port.
> However, it doesn't take much reasoning to argue that this is potentially problematic for any sort of state table, and it seems uneconomic to build more advanced state tables or mark packets to avoid certain scenarios.

These are two separate problems. One is that every client uses the same source (and destination) port, which creates a problem for any one-to-many NAT device. For each remote host only one client can have that port pair on the external IP address. This can be solved by the NAT translating the second client to some other source port, but only if the server doesn't require the client to use that specific source port.
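The first problem can be sketched as a toy port-allocation table. This is purely illustrative (the class and its linear-probing port search are invented for this sketch, not any real NAT implementation): when a second internal client presents the same source port toward the same server, the NAT rewrites it to a free external port.

```python
# Hypothetical sketch of one-to-many NAT source-port allocation.
class Nat:
    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.mappings = {}   # (int_ip, int_port, dst) -> external source port
        self.in_use = set()  # (dst, external port) pairs already taken

    def translate(self, int_ip, int_port, dst):
        key = (int_ip, int_port, dst)
        if key in self.mappings:          # existing flow: stable translation
            return self.mappings[key]
        port = int_port                   # try to keep the original port
        while (dst, port) in self.in_use: # collision with another client:
            port += 1                     # pick some other port instead
        self.in_use.add((dst, port))
        self.mappings[key] = port
        return port

nat = Nat("203.0.113.1")
dst = ("198.51.100.5", 67)
print(nat.translate("10.0.0.1", 67, dst))  # 67: first client keeps its port
print(nat.translate("10.0.0.2", 67, dst))  # 68: second client is rewritten
```

This is exactly the rewrite that breaks if the server insists the client's source port be 67, as in the old DHCP relay convention.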

By contrast, what's happening in the case that spawned this thread is that the client uses the same source port for two separate packets, but the source port is still random and unlikely to conflict with some other client, and there is no issue for the NAT to translate them back to the original client because the translation for both packets is the same. The bug is that the firewall processes them incorrectly when two packets with the same new source and destination address and port are processed concurrently -- an implementation flaw, not a design flaw. The packet marking and so forth is only an attempted workaround until the patch is in place.

> (compared to opening 2 sockets on the various clients, which spreads that "load" across millions of clients, while the "servers" and "firewalls" need to be optimized for millions of requests)

When you're dealing with libc the clients run the gamut. Some will be mobile devices where every cycle they spend consumes battery life. Many of the "clients" will themselves be servers. Web crawlers and mail servers spend a good fraction of their cycles making DNS queries.

> I disagree that this is a bug in the FIREWALL(s), as this would ONLY happen when reusing the source port, which in my opinion isn't a reasonable optimization for simple clients

There are other circumstances where this can happen or is required to. For example, an internal DNS cache makes outgoing queries when (for about 20% of requests) it doesn't have the record cached, and it will regularly reuse source ports. By the birthday problem, if it randomly chooses a source port for each request, by around 300 queries there is a 50% chance that two of them will use the same source port. Not reusing ports would increase exposure to the Kaminsky attack (and risk port exhaustion).
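The birthday estimate above can be checked numerically: with N equally likely ephemeral ports (taking the full 1024-65535 range, N = 64512), count how many independent random choices it takes before the probability that two share a port reaches 50%.

```python
import math

N = 65536 - 1024          # ephemeral ports available (assumed full range)
p_no_collision = 1.0
n = 0
# Exact birthday computation: multiply in one more query at a time until
# the collision probability crosses 50%.
while 1.0 - p_no_collision < 0.5:
    p_no_collision *= (N - n) / N
    n += 1
print(n)                  # around 300, matching sqrt(2 * N * ln 2) ~= 299.6
```

A smaller configured ephemeral range (many systems default to roughly half of this) lowers the crossover point correspondingly, since it scales with the square root of N.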

> , and mind you, this is the security feature of the FIREWALL, to track states, and for DNS there are no flows like you have with SIP / SYSLOG.
> Those also reuse the SOURCE port for their flows, but they are known to deliberately establish such flows -- which is not the same for DNS, which, on top of using random ports for each request, now makes 2 requests per lookup since IPv6 was added.

IPv6 has been with us for years though, and mail servers have done a similar thing even longer. The host for sending mail to a domain is specified in the MX record unless there isn't one, in which case the A record is used, and some mail software will do the MX and A queries simultaneously.

DNS doesn't have "flows" in the sense that it mostly consists of individual query transactions (though see also dynamic updates and zone transfers). But treating a set of UDP DNS query transactions using the same ports as a flow, in the style of any generic indeterminate UDP-based protocol, produces the desired results (and is less likely to break new or uncommon protocol features), so where is the advantage in application-protocol-specific treatment? In theory you could discard state sooner, but if you're so close to the point of port exhaustion that this really matters, you may be better off acquiring more IP addresses instead. Reusing the port mappings more quickly like that would run into the same issue that causes TCP to have a TIME_WAIT state -- without it, a packet (or retransmit) for the old mapping could be delivered to the new one.
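A generic UDP flow table of the kind described above can be sketched in a few lines: state is keyed on the address/port pair and expires after an idle timeout, which is what prevents a late datagram for an old mapping from being delivered to a newer one that reused the same ports. (The class and timeout value are illustrative, not conntrack's actual data structures.)

```python
# Hypothetical sketch of a generic idle-timeout UDP flow table.
class FlowTable:
    def __init__(self, timeout):
        self.timeout = timeout
        self.flows = {}          # (src, sport, dst, dport) -> last-seen time

    def seen(self, key, now):
        self.flows[key] = now    # create or refresh the flow entry

    def active(self, key, now):
        last = self.flows.get(key)
        return last is not None and now - last < self.timeout

table = FlowTable(timeout=30)
key = ("192.0.2.1", 53000, "198.51.100.5", 53)
table.seen(key, now=0)
print(table.active(key, now=10))   # True: still within the idle timeout
print(table.active(key, now=45))   # False: the entry has aged out
```

Keeping the entry until the timeout expires, rather than tearing it down the moment a transaction looks complete, plays the same role as TCP's TIME_WAIT: the old port pair cannot be handed to a new flow while a straggler packet could still arrive for it.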


