Re: Resolver times out resending with same transaction ID

On 3/21/23 06:32, Vince Del Vecchio wrote:
Hi all,

I recently observed reverse IPv4 address lookups timing out on a newly
configured host.  (Ubuntu 22.04LTS, systemd 249.11-0ubuntu3.7).  I
tracked the problem to the DVE-2018-0001 mitigation code.

An example:

$ resolvectl query 151.101.1.164
151.101.1.164: resolve call failed: All attempts to contact name servers or networks failed

tcpdump shows (in relevant part):
  00:00:00.000000 IP 192.168.1.48.35911 > 8.8.8.8.53: 26417+ [1au] PTR? 164.1.101.151.in-addr.arpa. (55)
  00:00:00.021127 IP 8.8.8.8.53 > 192.168.1.48.35911: 26417 NXDomain 0/1/1 (115)
  00:00:00.021252 IP 192.168.1.48.35911 > 8.8.8.8.53: 26417+ PTR? 164.1.101.151.in-addr.arpa. (44)
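
The lengths tcpdump prints in parentheses fit the idea that the revised query differs only by the missing EDNS OPT record. 55 - 44 = 11 bytes, which is the size of an OPT RR carrying no options (RFC 6891). The `[1au]` on the first query is tcpdump's count of one additional record. Below is a minimal sketch of that arithmetic in C. The helper names are hypothetical, and it assumes plain ASCII names without escaped dots:

```c
#include <assert.h>
#include <stddef.h>

/* Fixed DNS header: ID, flags and four section counts (RFC 1035, 4.1.1). */
#define DNS_HEADER_SIZE 12
/* QTYPE + QCLASS following the encoded QNAME. */
#define DNS_QUESTION_FIXED 4
/* Empty OPT RR (RFC 6891): root name 1 + TYPE 2 + CLASS 2 + TTL 4 + RDLENGTH 2. */
#define DNS_OPT_RR_SIZE 11

/* Wire length of a name like "164.1.101.151.in-addr.arpa.": one length
 * octet plus the bytes of each label, plus the final zero-length root
 * label. Assumes ASCII labels with no escaped dots. */
static size_t dns_name_wire_length(const char *name) {
    size_t total = 1;  /* root label */
    size_t label = 0;
    for (const char *c = name;; c++) {
        if (*c == '.' || *c == '\0') {
            assert(label <= 63);  /* DNS label length limit */
            if (label > 0)
                total += 1 + label;
            label = 0;
            if (*c == '\0')
                break;
        } else {
            label++;
        }
    }
    return total;
}

/* Length of a single-question query, with or without an empty OPT RR. */
static size_t dns_query_length(const char *qname, int with_edns) {
    return DNS_HEADER_SIZE + dns_name_wire_length(qname) + DNS_QUESTION_FIXED +
           (with_edns ? DNS_OPT_RR_SIZE : 0);
}
```

With these helpers, `dns_query_length("164.1.101.151.in-addr.arpa.", 1)` gives 55. The same call with 0 gives 44, matching the capture.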

The first query gets an "NXDOMAIN", which is the correct answer for
this address.

However, NXDOMAIN triggers the DVE-2018-0001 mitigation code to send a
revised query without EDNS OPT (confirmed in debug log).  I **never see
a response to this revised query**.

Frankly, it is wrong for systemd-resolved to try to work around clearly broken resolvers. In this case, it delays a correct response from a well-behaving server, just because some really broken servers send wrong replies. This should be enabled ONLY by manual configuration, if at all. Every user should know they have broken DNS servers if this (mis)feature helps.

Anyway, it should not require a timeout. If the response has the correct name and type in the question section and a matching transaction ID, it is clearly the response to our query. If resolved insists on this kind of workaround, it should do it right away, not after waiting for a timeout with no response. Better still, do it only if requested. NXDOMAIN is a valid response, and DNS folks are serious about sending it only when the requested name does not exist. The only proper way for a server to signal that it does not understand something in the query is a FORMERR response.
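
Below is a sketch in C of the matching rule and the fallback policy argued above. `struct dns_msg` and all three functions are hypothetical illustrations. They operate on an already-parsed message and are not systemd-resolved code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed view of a DNS message (not a systemd type). */
struct dns_msg {
    uint16_t id;
    uint8_t rcode;       /* 0 NOERROR, 1 FORMERR, 2 SERVFAIL, 3 NXDOMAIN */
    const char *qname;
    uint16_t qtype;      /* e.g. 12 = PTR */
    uint16_t qclass;     /* e.g. 1 = IN */
};

enum { RCODE_FORMERR = 1, RCODE_NXDOMAIN = 3 };

/* DNS names compare case-insensitively, ASCII letters only (RFC 4343). */
static bool dns_name_equal(const char *a, const char *b) {
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        char ca = (*a >= 'A' && *a <= 'Z') ? (char) (*a - 'A' + 'a') : *a;
        char cb = (*b >= 'A' && *b <= 'Z') ? (char) (*b - 'A' + 'a') : *b;
        if (ca != cb)
            return false;
    }
    return *a == *b;
}

/* A reply answers our query when the ID and the whole question match. */
static bool reply_matches_query(const struct dns_msg *q, const struct dns_msg *r) {
    return q->id == r->id && dns_name_equal(q->qname, r->qname) &&
           q->qtype == r->qtype && q->qclass == r->qclass;
}

/* A matching NXDOMAIN is a final answer. Only FORMERR says the server did
 * not understand the query, so only FORMERR justifies dropping features. */
static bool should_downgrade(const struct dns_msg *q, const struct dns_msg *r) {
    assert(q != NULL && r != NULL);
    return reply_matches_query(q, r) && r->rcode == RCODE_FORMERR;
}
```

Under this policy, the NXDOMAIN in the capture would be returned immediately, with no OPT-less retry and no timeout.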

It is a shame that ResolveUnicastSingleLabel=yes has to be configured manually to avoid some failures on correct names, while tricks like this one are enabled by default and cannot even be turned off manually. Please correct that!

If there is only a single DNS server, the resolver resends the OPT-less
query after a timeout, and *that* gets an NXDOMAIN which is returned.
However, if there are multiple DNS servers (e.g. 8.8.8.8 8.8.4.4), on
timing out, it sends another query with EDNS to the next server, and
the three-packet sequence repeats several times until it gives up.

Since the server *will* respond to a retransmit after 5s, my guess is
that the server, or maybe something in the network, is dropping
close-in-time requests with the same transaction id.  I tried a few
public DNS servers that (surprisingly?) all behaved the same.  I haven't
found a simple way to rule out a firewall, router or my ISP.
Does the re-transmit keep the same source port and transaction id?
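
One way to picture the guessed failure mode is a duplicate-suppression filter on the path. It would drop a query that reuses a recently seen (source address, source port, transaction ID) tuple. The sketch below is purely hypothetical. The 5-second window, table size and function name are assumptions chosen to reproduce the capture above, not known behaviour of any device or public resolver:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, picked to mimic the ~5 s retransmit that succeeds. */
#define WINDOW_MS 5000u
#define SLOTS 64

struct seen { uint32_t src; uint16_t port; uint16_t id; uint64_t t_ms; bool used; };
static struct seen table[SLOTS];

/* Hypothetical middlebox check: returns true to forward the query, false
 * to drop it as a duplicate of one seen within WINDOW_MS. Hash collisions
 * simply overwrite the older entry. */
static bool filter_forward(uint32_t src, uint16_t port, uint16_t id, uint64_t now_ms) {
    struct seen *s = &table[(src ^ port ^ id) % SLOTS];
    assert(!s->used || now_ms >= s->t_ms);  /* assumes a monotonic clock */
    if (s->used && s->src == src && s->port == port && s->id == id &&
        now_ms - s->t_ms < WINDOW_MS)
        return false;                       /* looks like a duplicate: drop */
    *s = (struct seen){ src, port, id, now_ms, true };
    return true;
}
```

Feed it the two outgoing queries from the capture plus a later retransmit. It forwards the first query, drops the revised one sent 21 ms later, and forwards the retransmit sent 5 s after the original, which is the pattern reported. A query with a fresh ID would pass straight through.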

Regardless, my thought is that resending a slightly different query
after we did get a response should not use the same transaction id.  I
patched systemd as follows and the problem goes away:

--- a/src/resolve/resolved-dns-transaction.c
+++ b/src/resolve/resolved-dns-transaction.c
@@ -1312,6 +1312,7 @@ void dns_transaction_process_reply(DnsTransaction *t, DnsPacket *p, bool encrypt
                             FORMAT_DNS_RCODE(DNS_PACKET_RCODE(p)),
                             dns_server_feature_level_to_string(t->clamp_feature_level_nxdomain));
+                 dns_transaction_shuffle_id(t);
                   dns_transaction_retry(t, false /* use the same server */);
                   return;
           }
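
Here is the idea behind the patch above, reduced to a self-contained sketch. `struct txn`, its feature-level numbering and both functions are hypothetical simplifications, not systemd's `DnsTransaction` API. The new ID comes from Linux `getrandom()`:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/random.h>  /* getrandom(), Linux, glibc >= 2.25 */
#include <sys/types.h>   /* ssize_t */

/* Hypothetical, simplified transaction; not systemd's DnsTransaction. */
struct txn {
    uint16_t id;
    int feature_level;   /* assumed numbering: 2 = EDNS0, 1 = plain UDP */
};

/* Pick a new random 16-bit ID that differs from the current one. */
static void txn_shuffle_id(struct txn *t) {
    uint16_t id;
    do {
        if (getrandom(&id, sizeof id, 0) != (ssize_t) sizeof id)
            id = (uint16_t) (t->id + 1);  /* crude fallback for the sketch */
    } while (id == t->id);
    t->id = id;
}

/* Resend to the same server with a lower feature level. The packet
 * content changes, so it also gets a fresh transaction ID. */
static void txn_retry_downgraded(struct txn *t, int new_level) {
    assert(new_level < t->feature_level);
    t->feature_level = new_level;
    txn_shuffle_id(t);
    /* ...rebuild and send the query here... */
}
```

Drawing a random ID rather than incrementing the old one keeps IDs unpredictable, which matters against spoofed replies. The increment fallback exists only so the sketch always has a value when `getrandom()` fails.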


A few questions:

- Does anyone else see this?

- Does this look like a reasonable fix?  Any thoughts on whether the
one other place where dns_transaction_retry(..., false) is called to
retry the same server with a lower feature level (SERVFAIL etc) should
do the same?
Yes, to me it is. Only unmodified retries should keep the original transaction ID. If it modifies the sent query, it should get a new ID for it. That also confirms that the EDNS removal was what helped, not just a plain retransmit. I think it should change the transaction ID every time it gets any response. SERVFAIL is a response too.
- Any other issues with the patch?  Or would it be reasonable to (add
comments and) submit a pull request?
I think pull requests are generally a better way to request a code change. They make commenting easier, and linking related issues too.
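
The transaction-ID policy suggested in the replies above fits in a small decision function. The enum and function name are hypothetical. The point is that only a plain retransmission after a timeout keeps the old ID:

```c
#include <assert.h>
#include <stdbool.h>

/* Why a query is being sent again (hypothetical classification). */
enum resend_reason {
    RESEND_TIMEOUT,      /* no reply at all: plain retransmission */
    RESEND_AFTER_REPLY,  /* some reply arrived: NXDOMAIN, SERVFAIL, ... */
};

/* Keep the transaction ID only for an unmodified retransmission after a
 * timeout. Any modified query, or any resend after a reply of whatever
 * rcode, gets a new ID. */
static bool resend_needs_new_id(enum resend_reason why, bool query_modified) {
    if (query_modified)
        return true;
    return why != RESEND_TIMEOUT;
}
```

With this rule, both the NXDOMAIN path touched by the patch and the SERVFAIL downgrade path would pick a new ID. Only a true retransmission would reuse one.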

-Vince Del Vecchio

Just my 2 cents.

Cheers,

Petr

--
Petr Menšík
Software Engineer, RHEL
Red Hat, https://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB



