Re: Call for action vs. lost opportunity (Was: Re: Renumbering)

Keith Moore <moore@xxxxxxxxxx> · Thu, 13 Sep 2007 23:34:48 -0400

>>> To my small mind, forcing a new DNS lookup in the event of a
>>> TCP session failure and restart would be a good thing.
>>>       
>> perhaps, but it won't work reliably as long as there can be more than
>> one host associated with a DNS name, nor will it work as long as DNS
>> name-to-address mapping is used to distribute load over a set of hosts.
>>     
>
> 	We already have the DNS hooks to distinguish services from
> 	hosts.  We've had them for the last 8 years.
>   
Yes but SRV records weren't really meant to handle this case either. 
And they actually can make applications less reliable because they
introduce a new dependency on DNS (another lookup that can fail, in a
different zone and potentially on a different server, another piece of
configuration data that can be incorrect.)  What we'd really need is an
RR type specifically intended to map service names onto instance
ID+address pairs, a special query type that isn't defined to return
all of the matching RRs but instead returns a random subset or a subset
chosen by heuristics, and finally a service that maps an instance ID to
its current address.  But arguably DNS isn't the right place to do
that at all - there should instead be a generic referral service at
layer 3 or 4.
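
A rough sketch of the three-part split described above, modeled in memory rather than in anything DNS actually offers today. The class `ReferralService`, its methods, and the example addresses are all hypothetical. A service query returns a random subset of instances instead of every match, and renumbering changes only the instance-to-address binding.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class ReferralService:
    # Hypothetical: service name -> IDs of the instances offering it
    services: dict[str, list[str]] = field(default_factory=dict)
    # Hypothetical: instance ID -> current address (changes on renumbering)
    locations: dict[str, str] = field(default_factory=dict)

    def register(self, service: str, instance_id: str, address: str) -> None:
        self.services.setdefault(service, []).append(instance_id)
        self.locations[instance_id] = address

    def renumber(self, instance_id: str, new_address: str) -> None:
        # Only the address binding moves; service membership is untouched.
        self.locations[instance_id] = new_address

    def query_service(self, service: str, k: int = 1,
                      rng: random.Random | None = None) -> list[tuple[str, str]]:
        # Return a random subset of at most k (instance ID, address) pairs,
        # rather than every matching record.
        rng = rng or random.Random()
        ids = self.services.get(service, [])
        chosen = rng.sample(ids, min(k, len(ids)))
        return [(i, self.locations[i]) for i in chosen]

    def locate(self, instance_id: str) -> str:
        # Map an instance ID to wherever that instance lives now.
        return self.locations[instance_id]
```

A client that keeps the instance ID from its first `query_service()` call could call `locate()` after a failure and reach the same instance at its new address, instead of re-resolving the service name and possibly landing somewhere else.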

Of course, part of the reason that people started using A records to
refer to multiple hosts was that a number of applications "just worked"
when they did that.  And I remember when people used to object loudly to
such things, and insist that a DNS name and a host name had to be the
same thing.  Anyway, this kind of overloading of A records has been such
a widespread practice for so long that I don't see it changing.  And
it's not as if we came up with a better way of doing things for IPv6
addresses.
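
A toy illustration of why simply re-resolving the original name is not enough once its A records cover several hosts. `RoundRobinName`, `restart_after_failure`, and the session table are hypothetical, and strict rotation is a simplification; real answer ordering depends on the servers and resolvers involved.

```python
from itertools import cycle


class RoundRobinName:
    """Hypothetical DNS name whose A records point at several distinct hosts."""

    def __init__(self, addresses):
        self._rotation = cycle(addresses)

    def resolve(self):
        # Each lookup returns the next address, as naive round-robin DNS might.
        return next(self._rotation)


# Session state lives only on the host that accepted the original connection.
sessions = {"192.0.2.10": {"session-42"}, "192.0.2.11": set()}


def restart_after_failure(name, session_id):
    """Re-resolve the original name and report whether that host knows the session."""
    addr = name.resolve()
    return addr, session_id in sessions[addr]


name = RoundRobinName(["192.0.2.10", "192.0.2.11"])
first = name.resolve()                             # initial connection: 192.0.2.10
print(restart_after_failure(name, "session-42"))   # ('192.0.2.11', False)
```

The restart reaches a perfectly healthy host that has never heard of the session.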
>> in other words, doing another DNS lookup of the original DNS name only
>> looks like a good way to solve the problem if you don't look very deep.
>>  
>> now if you somehow got a host-specific (or narrower) identifier as a
>> result of setting up the initial connection (maybe via a TCP option),
>> and you had a way to map that host-specific identifier to its current IP
>> address (assume for now that you're using DNS, though there are still
>> other problems with that) - then you could do a different kind of lookup
>> to get the new IP address and use that to do a restart.
>>
>> even then, it wouldn't help the numerous applications which don't have a
>> way to cleanly recover from dropped TCP connections.  (remember, TCP
>> was supposed to make sure data were retransmitted as necessary and that
>> duplicated data were sorted out, provide a clean close, that sort of
>> thing.  once you expect apps to handle dropped connections they have to
>> re-implement TCP functionality at a higher layer.)
>>     
>
> 	Applications need to deal with TCP connections breaking for
> 	all sorts of reasons.  Renumbering should be a relatively
> 	infrequent event compared to all the other possible ways a
> 	TCP connection can fail.
>   
Mumble.  Seems like the whole point of TCP was to recover from such
failures at a lower level.  And I remember how people used to say that
TCP was better than X.25 VCs (in part) because TCP would recover from
temporary network outages that would cause hangups in X.25.
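
To make concrete what an application takes on once it has to survive dropped connections, here is a minimal sketch of sequence numbers, cumulative acknowledgements, retransmission, and duplicate suppression done above TCP, which is the work TCP was supposed to do. The `Sender`/`Receiver` classes are hypothetical, the "connection" is just a method call, and finding the right peer again after a drop is left out entirely.

```python
class Receiver:
    """Hypothetical app-level receiver: delivers each record once, in order."""

    def __init__(self):
        self.delivered = []
        self.next_seq = 0  # next sequence number expected

    def accept(self, seq, data):
        if seq == self.next_seq:
            self.delivered.append(data)
            self.next_seq += 1
        # Any other seq (e.g. a record resent after a reconnect) is ignored.
        return self.next_seq  # cumulative acknowledgement


class Sender:
    """Hypothetical app-level sender: keeps records until they are acknowledged."""

    def __init__(self, records):
        self.records = list(records)
        self.acked = 0  # records below this index are known to be delivered

    def transmit(self, receiver, drop_ack_at=None):
        # Resume from the first unacknowledged record.
        for seq in range(self.acked, len(self.records)):
            ack = receiver.accept(seq, self.records[seq])
            if seq == drop_ack_at:
                # The record arrived, but the connection died before the ack did.
                raise ConnectionError("connection dropped before ack")
            self.acked = ack


records = ["a", "b", "c", "d"]
rx, tx = Receiver(), Sender(records)
try:
    tx.transmit(rx, drop_ack_at=1)
except ConnectionError:
    pass  # a real client would have to locate the peer and reconnect here
tx.transmit(rx)  # resends "b"; the receiver discards the duplicate
print(rx.delivered)  # ['a', 'b', 'c', 'd']
```

Note that this only works if the retry reaches the same `Receiver` state; if the reconnect lands on a different host, none of it helps.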

I also don't have a lot of faith in "should be", not when I've seen DHCP
servers routinely refuse to renew leases after very short times, nor
when I've heard people say that a site should be able to renumber every
day.  

I used to try to get people to specify a minimum amount of time that a
non-deprecated address should be expected to be valid - say a day.  Then
application writers and application protocol designers would have an
idea about whether they needed a strategy for recovery from a
renumbering event, and what kind of strategy they needed.  But the only
people who seemed to like this idea were application area people. 
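
If a floor like that existed, the design question for an application protocol would reduce to a simple comparison. A sketch, assuming the one-day figure and assuming that an address which is non-deprecated when a session starts stays valid for at least that long; the names are hypothetical.

```python
from datetime import timedelta

# Assumed floor ("say a day"): an address that is non-deprecated when a
# session starts remains valid for at least this long.
MIN_ADDRESS_LIFETIME = timedelta(days=1)


def needs_renumbering_recovery(expected_session: timedelta,
                               min_lifetime: timedelta = MIN_ADDRESS_LIFETIME) -> bool:
    """Hypothetical check: could a session outlive the guaranteed address lifetime?"""
    return expected_session > min_lifetime


print(needs_renumbering_recovery(timedelta(minutes=5)))  # False
print(needs_renumbering_recovery(timedelta(days=7)))     # True
```

Under that assumption a short request/response exchange needs no renumbering strategy at all, while a session expected to stay open for days does.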
> 	Until applications deal nicely with the other failure modes,
> 	complaints about renumbering causing problems at the
> 	application level are just noise.
>   
In other words, one design error can be used to justify another?  Sort
of like the blind leading the blind?

I see a significant difference between a design flaw in a particular
application that cripples that application, and a design flaw in a lower
layer that cripples all applications.

Keith

_______________________________________________

Ietf@xxxxxxxx
https://www1.ietf.org/mailman/listinfo/ietf