>>> To my small mind, forcing a new DNS lookup in the event of a
>>> TCP session failure and restart would be a good thing.
>>>
>> perhaps, but it won't work reliably as long as there can be more than
>> one host associated with a DNS name, nor will it work as long as DNS
>> name-to-address mapping is used to distribute load over a set of hosts.
>>
>
> We already have the DNS hooks to distinguish services from
> hosts. We had them for the last 8 years.
>
Yes, but SRV records weren't really meant to handle this case either.
And they can actually make applications less reliable, because they
introduce a new dependency on DNS (another lookup that can fail, in a
different zone and potentially on a different server; another piece of
configuration data that can be incorrect).

What we'd really need is an RR type specifically intended to map
service names onto instance ID + address pairs, plus a special query
type that wasn't defined to return all of the matching records but
would instead return a random subset or a subset based on heuristics,
and finally an instance-ID-to-address mapping service. But arguably DNS
isn't the right place to do that at all - there should instead be a
generic referral service at layer 3 or 4.

Of course, part of the reason that people started using A records to
refer to multiple hosts was that a number of applications "just worked"
when they did that. And I remember when people used to object loudly to
such things, and insist that a DNS name and a host name had to be the
same thing. Anyway, this kind of overloading of A records has been such
a widespread practice for so long that I don't see it changing. And
it's not as if we came up with a better way of doing things for IPv6
addresses.

>> in other words, doing another DNS lookup of the original DNS name only
>> looks like a good way to solve the problem if you don't look very deep.
>>
>> now if you somehow got a host-specific (or narrower) identifier as a
>> result of setting up the initial connection (maybe via a TCP option),
>> and you had a way to map that host-specific identifier to its current IP
>> address (assume for now that you're using DNS, though there are still
>> other problems with that) - then you could do a different kind of lookup
>> to get the new IP address and use that to do a restart.
>>
>> even then, it wouldn't help the numerous applications which don't have a
>> way to cleanly recover from dropped TCP connections. (remember, TCP
>> was supposed to make sure data were retransmitted as necessary and that
>> duplicated data were sorted out, provide a clean close, that sort of
>> thing. once you expect apps to handle dropped connections they have to
>> re-implement TCP functionality at a higher layer.)
>>
>
> Applications need to deal with TCP connections breaking for
> all sorts of reasons. Renumbering should be a relatively
> infrequent event compared to all the other possible ways a
> TCP connection can fail.
>
Mumble. Seems like the whole point of TCP was to recover from such
failures at a lower level. And I remember how people used to say that
TCP was better than X.25 VCs (in part) because TCP would recover from
temporary network outages that would cause hangups in X.25.

I also don't have a lot of faith in "should be", not when I've seen
DHCP servers routinely refuse to renew leases after very short times,
nor when I've heard people say that a site should be able to renumber
every day.

I used to try to get people to specify a minimum amount of time that a
non-deprecated address should be expected to be valid - say a day. Then
application writers and application protocol designers would have an
idea about whether they needed a strategy for recovery from a
renumbering event, and what kind of strategy they needed. But the only
people who seemed to like this idea were application area people.

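To make the minimum-lifetime idea concrete, here is a rough sketch in
Python of how an application designer might reason if such a guarantee
existed. Everything in it is an assumption for illustration: no such
guarantee is specified anywhere, the names `Plan` and `plan_for_session`
are made up, and the one-day default is just the "say a day" figure
above.

```python
import math
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds; the "say a day" figure, purely illustrative


@dataclass(frozen=True)
class Plan:
    needs_recovery: bool          # must the app survive an address change?
    worst_case_renumberings: int  # upper bound on changes during one session


def plan_for_session(session_seconds: float,
                     min_valid_seconds: float = DAY) -> Plan:
    """Bound the renumbering events one session could see.

    Assumption (hypothetical guarantee): an address that is not yet
    deprecated when it is chosen stays valid for at least
    min_valid_seconds, and so does each address that replaces it.
    """
    if min_valid_seconds <= 0:
        raise ValueError("min_valid_seconds must be positive")
    if session_seconds < 0:
        raise ValueError("session_seconds must be non-negative")
    # A session that fits inside one guaranteed lifetime sees no change;
    # each further lifetime it spans can add at most one renumbering.
    worst = max(0, math.ceil(session_seconds / min_valid_seconds) - 1)
    return Plan(needs_recovery=worst > 0, worst_case_renumberings=worst)


if __name__ == "__main__":
    print(plan_for_session(10 * 60))   # short transaction
    print(plan_for_session(3 * DAY))   # long-lived session
```

Under that assumed guarantee, a ten-minute transaction needs no
renumbering strategy at all, while a session meant to last several days
has to plan for more than one address change. Without any stated
minimum, the designer can't make even that distinction.
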
> Until applications deal nicely with the other failure modes,
> complaints about renumbering causing problems at the
> application level are just noise.
>
in other words, one design error can be used to justify another? sort
of like the blind leading the blind?

I see a significant difference between a design flaw in a particular
application that cripples that application, and a design flaw in a
lower layer that cripples all applications.

Keith