SV: Conntrack insertion race conditions -- any workarounds?

> I think we're getting hung up on semantics here. From a layering perspective, what libc is doing is perfectly fine at the UDP layer - it is simply sending two messages, which are framed using DNS.

You are probably correct; I guess in 10-20 years we will see whether "our" concerns are just paranoia and/or semantics.


> I'm not sure what security flaw we're concerned with here...

We are looking out for exactly such security concerns and similar issues, e.g. https://duo.com/blog/the-great-dns-vulnerability-of-2008-by-dan-kaminsky


>> I am pretty sure this statement is not entirely correct; first off, picking a random number from 1025-65535 as your source port does not give you a 50% chance of hitting any random 300 ports currently in use.
>
> Maybe that doesn't apply here, but my gut tells me it does.

It does apply, but the numbers here are not as small as in the classic example (365 or 366 days), and DNS source ports are only partially random.
I have not tried applying the same math to a group of people being born in the same month & day & hour & minute & second & microsecond & "5-digit serial number between 01025-65535".
It is also not a similarly static situation, since being born is a permanent state, while a SOURCE PORT can come and go in a partially random way - even if you could argue that at any given "nanosecond" there is a finite pool of 65535 ports.
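For what it's worth, a quick back-of-the-envelope calculation (my own illustration, not from the thread) shows how far a single random pick from 1025-65535 is from a 50% chance of hitting 300 in-use ports, and roughly how many independent picks it takes before a hit becomes more likely than not:

import math

PORTS = 65535 - 1025 + 1   # 64511 usable source ports (1025-65535)
IN_USE = 300               # hypothetical count of ports already in use

# Probability that a single uniformly random pick hits an in-use port.
p_single = IN_USE / PORTS
print(f"single pick hit chance: {p_single:.4%}")     # ~0.47%, far from 50%

# Independent picks needed before a hit is more likely than not:
# solve (1 - p)^n < 0.5 for n.
n = math.log(0.5) / math.log(1.0 - p_single)
print(f"picks for >50% chance of a hit: {math.ceil(n)}")   # ~149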


> But this is the case we are running into. We have a client which is making back to back requests to the same DNS server within microseconds.
> Does it really make sense for that client to spend cycles opening a new socket for each request?
> Sure, if it has closed the socket because it thinks it doesn't need it any longer, but when it's doing happy eyeballs, it *knows* it needs to send two requests, so why not reuse the socket?
>  What if it needs to send 100 requests?
> Should it use a separate connection for each of them? That sounds like a great way to kill firewalls or other stateful middleboxes.

Today libc changes SOURCE PORT for most requests, except in cases like the one you are experiencing, where it tries to save "resources" by sending the A and AAAA queries in something like a UDP stream.
But it practically never uses anything like a UDP stream for anything else (mail MX and A records were mentioned), and yes, you save resources for the clients, and with a large number of clients behind the same FireWall even the FireWall can save resources, at the cost of security.
And as soon as you argue that an L4+ FireWall should instead spend resources inspecting and tracking more details about each packet on the "+" side, those saved resources are lost in abundance.
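To make the two client-side patterns concrete, here is a minimal sketch (mine, using a hypothetical resolver address from TEST-NET-2 and a hand-rolled query encoder) of glibc-style reuse of one socket for A and AAAA versus one fresh socket per query:

import socket
import struct

RESOLVER = ("192.0.2.53", 53)   # hypothetical resolver address

def dns_query(name, qtype, txid):
    # Build a minimal DNS query packet (qtype 1 = A, 28 = AAAA).
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(p)]) + p.encode() for p in name.split(".")) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)

# Pattern 1: glibc-style reuse - A and AAAA back to back on ONE socket,
# i.e. one UDP 5-tuple; two near-parallel first packets on a new flow is
# exactly the shape that trips the conntrack insertion race.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.sendto(dns_query("example.com", 1, 0x1111), RESOLVER)
s.sendto(dns_query("example.com", 28, 0x2222), RESOLVER)
s.close()

# Pattern 2: one socket per query - each query gets a fresh random
# source port, so each is its own flow for any stateful middlebox.
for qtype, txid in ((1, 0x3333), (28, 0x4444)):
    q = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    q.sendto(dns_query("example.com", qtype, txid), RESOLVER)
    q.close()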

IF you are making a design for Service Provider delivery, you solve this like the example with the WEB PROXY (in that case libc will only resolve the DNS name of the PROXY in the "morning" for WEB requests; the rest will be done by the WEB PROXY).
Done correctly, several hundred thousand clients can share the same infrastructure, since most of the connections they generate go via the PROXY, which in the example mitigated 90% of their DNS queries for WEB requests.


> I don't quite follow this. Are you saying that if something like 1000 clients in the internet send HTTP requests to the web proxy, which in turn makes 1000 DNS requests, 100 of which actually need to go through the firewall to a DNS server.
> So, if each request used a different source port, we'd consume 100 new ports per second? Wouldn't that be an argument for using the same source port the whole time?

In that case you have mitigated 90% as said above for WEB; even with the remaining requests doubled for A and AAAA records, you still end up with at least 80% of the DNS queries being mitigated.
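For the record, my reading of the arithmetic behind that "at least 80%" (numbers follow the 1000-client example above):

web_dns = 1000         # DNS queries the clients would otherwise make
via_proxy = 0.90       # share answered from the web proxy side
through_fw = web_dns * (1 - via_proxy)    # 100 lookups still pass
doubled = through_fw * 2                  # A + AAAA per lookup -> 200
print(f"net mitigation: {1 - doubled / web_dns:.0%}")   # 80%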
Always using the same port would only be a larger security issue, and yes, it is a cost/benefit question, so you are free to choose what you prefer.
You could also go completely stateless with some VMWARE NSX type ACL firewall, or just go without a firewall (which would save resources, since you would either need fewer resources for the FireWall(s), or none at all if they are removed entirely).


> On the other hand, if the firewall needed to clean up state for each single DNS request, you'd double the work for each request made by the clients of libc. 
> The vast majority of work in flow-aware middleboxes is in state setup and teardown. You're basically doubling the amount of that for the most common DNS query pattern made by Linux systems...
> of course, we could be clever, and track the number of outstanding requests, only cleaning up the state when the final request comes in or it times out. This would solve both problems, but it does not solve the problem we're facing in conntrack, which is that the two requests come into the system almost in parallel, triggering that race condition.
> We would never have had the chance to clean up the state, because the first request never left the system, meaning there was no chance for the DNS server to respond to it.

Nope, that is solved either in the new version Michal described, or by avoiding source port reuse unless it is a genuine stream.


> If you follow strictly the end-to-end principle, there is no need to be aware of the boundaries in the middlebox. If you aren't, and you need to inspect the payload of upper layer protocols, then by all means, do so. But, at that point you need to be aware of the protocol being carried, so that you can understand its framing. Requiring that each upper layer protocol send its data in a single UDP packet is intractable. What if there is more data than the MTU? Thus, there is a case for a UDP protocol sending more than a single packet per connection. Consequently, any L4+ middle box worth its salt needs to understand *how* the data it is inspecting is framed.

This statement made no sense to me; it seems mutually exclusive, since to do the inspection you "expect", the L4+ FireWall would already require inspection at a level that identifies the boundaries.
Also, a strict end-to-end principle would in theory make the endpoints responsible for their own communication security, which would lower the need for any L4+ security for such communication.


> Consider my point above, where a firewall that needs to understand when a flow ends could simply track the number of outstanding DNS queries. There is no need to have a different connection per query,

If you want to pay 1+X CPU cycles on the FireWall for every 1 CPU cycle you save on the client side, then yes, you can add more inspection to the FireWall(s) on the path.
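To illustrate what that extra state work looks like, here is a minimal sketch of the outstanding-query tracking the quoted text proposes (data structures and names are mine, not from any real firewall):

from collections import defaultdict

# state: (src_ip, src_port, dst_ip, dst_port) -> set of open DNS txids
flows = defaultdict(set)

def on_query(flow, txid):
    # Every query now costs a lookup plus a per-transaction record -
    # the "1+X cpu cycles" spent on the FireWall side.
    flows[flow].add(txid)

def on_response(flow, txid):
    flows[flow].discard(txid)
    if not flows[flow]:
        del flows[flow]          # last answer seen: tear down the flow

# Two back-to-back queries (A + AAAA) on one flow keep a single entry:
f = ("198.51.100.7", 41337, "192.0.2.53", 53)
on_query(f, 0x1111); on_query(f, 0x2222)
on_response(f, 0x1111)           # state survives, one txid still open
on_response(f, 0x2222)           # state removed only now
print(flows)                     # empty once all queries are answered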


> SCTP does deserve a bit more love. Is this argument basically boiling down into "Use DNS over SCTP so that firewalls can be happy?" :)

Nah, that was a side point about streams (not DNS, which really isn't a stream service, and for which I said it would be better to just randomize the source port).
My point boils down to: why send 2 DNS queries using the same source port to save a small amount of CPU on the clients, when adding another unique UDP source port has such a low cost?
Even if it adds some CPU load by creating an extra state entry on any FireWall(s) along the way, that is cheap, especially compared to the argument for deeper packet inspection on FireWall(s), which is currently costly next to simply opening and closing an extra state entry.



Best regards
André Paulsberg-Csibi
Senior Network Engineer 
IBM Services AS





