Re: Conntrack insertion race conditions -- any workarounds?

On 7 September 2018 at 04:24, André Paulsberg-Csibi (IBM Consultant)
<Andre.Paulsberg-Csibi@xxxxxxxx> wrote:


> You can always argue that this should also be fixed in the firewall, to avoid the scenario dropping such packets (and I can partially agree).
> But from a security perspective you need some clear boundaries to know which packets to drop, and when.
> And DNS is a rather important and irremovable part of today's Internet, so making some effort to make it easier to protect from multiple angles is a reasonable argument.

I think we're getting hung up on semantics here. From a layering
perspective, what libc is doing is perfectly fine at the UDP layer --
it is simply sending two messages, which are framed using DNS. Compare
that to something like QUIC, which is going to send *A LOT* of
messages using the same five-tuple, since it's essentially packing
many HTTP streams into a single UDP connection.

I'm not sure what security flaw we're concerned with here...
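For illustration, here is the pattern in miniature: two datagrams sent
back to back over one socket, hence one five-tuple. This is a loopback
stand-in with placeholder payloads, not a real DNS exchange:

```python
import socket

# Toy demonstration: two "queries" sent back to back over ONE client
# socket (one five-tuple) -- the pattern libc uses for parallel
# A/AAAA lookups. The peer is a local UDP socket standing in for a
# DNS server; the payloads are placeholders, not real DNS messages.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # stand-in for a DNS server
addr = server.getsockname()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))          # one source port for both queries

client.sendto(b"query-A", addr)
client.sendto(b"query-AAAA", addr)     # same source port, same destination

msg1, peer1 = server.recvfrom(512)
msg2, peer2 = server.recvfrom(512)
assert peer1 == peer2                  # both arrived on the same 5-tuple

server.close()
client.close()
```

Nothing here violates UDP semantics; the two messages only look like
"one flow" to something tracking five-tuples in the middle.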

> I am pretty sure this statement is not entirely correct. First off, picking a random number from 1025-65535 as your source port does not give you a 50% chance of hitting any of 300 random ports currently in use.

I don't think a single uniform pick is the right model here -- the
collisions accumulate over repeated picks. See
https://en.wikipedia.org/wiki/Birthday_problem. Maybe that doesn't
apply here, but my gut tells me it does.
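A quick back-of-the-envelope check, assuming (as in the quoted
scenario) 300 ports concurrently busy out of the 64511 ephemeral
ports:

```python
import math

N = 65535 - 1025 + 1   # 64511 ephemeral ports (1025-65535)
busy = 300             # assumed number of ports concurrently in use

# Chance that a SINGLE random source port hits one of the busy ports:
p_single = busy / N    # ~0.47% -- nowhere near 50%, as the quote says

# Birthday-style view: after n independent picks, the chance that at
# least one landed on a busy port is 1 - (1 - busy/N)**n.
# Solving for the n that gives a 50% chance of at least one collision:
n_half = math.log(0.5) / math.log(1 - busy / N)
print(f"single pick: {p_single:.2%}, picks for 50% collision: {n_half:.0f}")
```

So both statements can be true at once: one pick almost never
collides, but after roughly 150 queries the odds of at least one
collision pass 50%.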


> Secondly, you seem to forget/ignore that when we talk about reuse of the SOURCE port, the issue is in combination with contacting the SAME SERVICE on a single remote host ...
> ... which would give you all 4 IP header fields used in state tables being equal (i.e. SOURCE IP, SOURCE PORT, DESTINATION IP, DESTINATION PORT) prior to the response arriving (and the entry being removed from the state table).


But this is exactly the case we are running into. We have a client
which is making back-to-back requests to the same DNS server within
microseconds. Does it really make sense for that client to spend
cycles opening a new socket for each request? Sure, if it has closed
the socket because it thinks it no longer needs it, but when it's
doing Happy Eyeballs, it *knows* it needs to send two requests, so why
not reuse the socket? What if it needs to send 100 requests? Should it
use a separate connection for each of them? That sounds like a great
way to kill firewalls and other stateful middleboxes.

> Unless you use a SINGLE forwarder for your internal DNS CACHE, which would in most cases make you a standalone client with a "lower" number of requests,
> you would need about 10000 lookups per second, using a normal 2-second timeout per query, for a single query to have a *theoretical* 50% chance of hitting one of the old SOURCE ports.
> Statistically, from those 10000 requests, with an average response time of around 10-50 ms, you would theoretically have around 300 requests active in any 33 ms window and expect 300 new requests in the same time frame,
> of which you could *theorize* that in total there is somewhere around a 50% chance that one of the 300 new requests could hit one of the 300 old requests within that 33 ms time frame.
>
> However, when you consider some anecdotal evidence -- a single host in a "fairly" busy WEB PROXY CLUSTER sees 1000 DNS requests per second with a 90% DNS CACHE hit rate,
> leaving you with 100 requests per second on average -- you seem likely to have other concerns if your client DNS on whatever system you set up reaches 10000 DNS requests per second.
> So as I said before regarding state TTL: if the table leaves DNS states "alive" for, let's say, a UDP state TTL of 40 seconds, that would leave 4000 states on average per server in the firewall in the above scenario.
> With 10000 DNS requests per second, practically every SOURCE port would be open all the time, which in my opinion is another good/solid argument, security-wise,
> to remove from the state table any/all state entries where a matching response has been seen passing the firewall for DNS, and to allow only ONE request and ONE response per SOURCE port.
>

I don't quite follow this. Are you saying that something like 1000
clients on the Internet send HTTP requests to the web proxy, which in
turn makes 1000 DNS requests, 100 of which actually need to go
through the firewall to a DNS server? So, if each request used a
different source port, we'd consume 100 new ports per second? Wouldn't
that be an argument for using the same source port the whole time?
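For what it's worth, the quoted steady-state number does check out
under Little's law (concurrent entries ≈ arrival rate × state TTL),
using the rate and TTL from the quote:

```python
# Steady-state conntrack table size, per Little's law:
#   concurrent entries = arrival rate * entry lifetime
rate = 100   # DNS queries/s actually crossing the firewall (quoted)
ttl = 40     # UDP state TTL in seconds (quoted)

states = rate * ttl
print(states)   # 4000 concurrent entries, matching the quoted figure
```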

On the other hand, if the firewall needed to clean up state for each
single DNS request, you'd double the work for each request made by the
clients of libc. The vast majority of work in flow-aware middleboxes
is in state setup and teardown. You're basically doubling the amount
of that for the most common DNS query pattern made by Linux systems...
Of course, we could be clever and track the number of outstanding
requests, only cleaning up the state when the final response comes in
or the state times out. This would solve both problems, but it does not solve
the problem we're facing in conntrack, which is that the two requests
come into the system almost in parallel, triggering that race
condition. We would never have had the chance to clean up the state,
because the first request never left the system, meaning there was no
chance for the DNS server to respond to it.
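A minimal sketch of that bookkeeping idea -- a hypothetical
illustration, not actual conntrack code: keep a per-five-tuple counter
of outstanding queries and only tear the state down when the last one
is answered.

```python
# Hypothetical sketch: per-flow state that counts outstanding DNS
# queries instead of tearing the entry down after the first reply.
from dataclasses import dataclass

@dataclass
class FlowState:
    pending: int = 0   # queries seen without a matching reply yet

table = {}             # five-tuple -> FlowState

def on_query(five_tuple):
    state = table.setdefault(five_tuple, FlowState())
    state.pending += 1

def on_reply(five_tuple):
    state = table.get(five_tuple)
    if state is None:
        return                    # reply for an unknown flow; ignore
    state.pending -= 1
    if state.pending <= 0:
        del table[five_tuple]     # last outstanding query answered

# Happy-Eyeballs pattern: A and AAAA queries back to back, one socket.
ft = ("10.0.0.1", 40000, "8.8.8.8", 53, "udp")
on_query(ft)
on_query(ft)
on_reply(ft)               # entry survives: one query still pending
assert ft in table
on_reply(ft)               # final reply tears the state down
assert ft not in table
```

But as noted above, this only helps with teardown cost -- it does
nothing for the insertion race, where both queries arrive before any
state exists at all.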


> Again, I believe this is only really true for the endpoints (as any component in the middle would normally not be aware of any boundaries),
> and that using UDP while assuming the boundaries are respected across every unit is what has caused some of the conflicts between security and practical use of UDP.
> VPN is a good example: you will find many IPsec adaptations to allow for NAT traversal, based on the issues caused by similar assumptions and use of UDP.
>

If you follow the end-to-end principle strictly, there is no need to
be aware of the boundaries in the middlebox. If you aren't following
it, and you need to inspect the payload of upper-layer protocols, then
by all means, do so. But at that point you need to be aware of the
protocol being carried, so that you can understand its framing.
Requiring that each upper-layer protocol send its data in a single UDP
packet is intractable. What if there is more data than the MTU? Thus,
there is a case for a UDP protocol sending more than a single packet
per connection. Consequently, any L4+ middlebox worth its salt needs
to understand *how* the data it is inspecting is framed.


> Again, using UDP in this way is not strictly "wrong", but it being 2018 I tend to say it is "bad" design to reuse the SOURCE port when it has such a low cost to avoid it.
> Saving a few CPU cycles on mobile devices to spare their battery might be valid for some users,
> but you potentially save more if you increased the TTL on DNS records, where it is set at 5-120 seconds for the most common services used by today's mobile users.

Consider my point above: a firewall that needs to understand when a
flow ends could simply track the number of outstanding DNS queries.
There is no need for a different connection per query.

> On a completely different note, I have no idea why, after 40+ years, there are still mainly 6 protocols commonly used for Internet IP (and still primarily IPv4), ignoring internal stuff like ISIS/IGP/EGP:
> TCP
> UDP
> ICMP
> GRE
> ESP
> IGMP
>
> It seems like many prefer using a known protocol like UDP for streaming rather than using SCTP or creating other protocols more suited to various needs.
> Instead it seems developers try to push everything TCP cannot be used for into UDP, seemingly for any reason and at any cost, without much thought about today's security reality.

SCTP does deserve a bit more love. Does this argument basically boil
down to "use DNS over SCTP so that firewalls can be happy"? :)




