> On Aug 11, 2023, at 5:36 AM, Lucas Pardue via Datatracker <noreply@xxxxxxxx> wrote:
>
> Reviewer: Lucas Pardue
> Review result: Ready with Issues
>
> I am the assigned Gen-ART reviewer for this draft. The General Area
> Review Team (Gen-ART) reviews all IETF documents being processed
> by the IESG for the IETF Chair. Please treat these comments just
> like any other last call comments.
>
> For more information, please see the FAQ at
>
> <https://wiki.ietf.org/en/group/gen/GenArtFAQ>.
>
> Document: draft-ietf-dnsop-caching-resolution-failures-??
> Reviewer: Lucas Pardue
> Review Date: 2023-08-11
> IETF LC End Date: 2023-08-17
> IESG Telechat date: Not scheduled for a telechat
>
> Summary: The document was well-written with clear motivation statements and
> normative text for addressing the indicated problems

Hi Lucas, thanks for the detailed review.

> Major issues: None
>
> Minor issues:
>
> * Section 3.1 describes retries and places the normative requirement "A
> resolver MUST NOT retry a given query to a server address over a given
> transport protocol more than ...". However, the definition of "transport
> protocol" is not 100% clear to me, and the terms "transport" and "transport
> layer protocol" seem to be used interchangeably through the document. Perhaps
> this is clearer to those in the DNS area, but as a transport area person, DNS
> over TCP and DNS over TLS both use the same transport protocol.
> Section 2.3 would seem to imply that DNS over TCP and DNS over TLS are
> treated as different.
>
> I think it would help to better define exactly what "a given transport
> protocol" in section 3.1 means. Perhaps that definition already exists
> somewhere that can be cited and imported into the terminology section.

You’re right that we have not been especially precise when using the word “transport.” The authors did intend for DNS over UDP, over TCP, over TLS, etc. to be treated essentially as separate transports, or separate ways a client can talk to a server.

I’m not sure how best to fix this. On one hand, as far as we know, there is currently no good term that collectively refers to DNS over UDP, TCP, TLS, HTTPS, QUIC, and whatever else may come our way. So maybe we need to define one. I’m hesitant, though, because I’m not sure this document is where such a term should be introduced, and because definitions often turn out to be cans of worms. Nonetheless, we have taken a stab at it:

   *  DNS Transport: In this document, DNS transport means a protocol
      used to transport DNS messages between a client and a server.
      This includes "classic DNS" transports, i.e., DNS-over-UDP and
      DNS-over-TCP [RFC1034] [RFC7766], as well as newer encrypted DNS
      transports such as DNS-over-TLS [RFC7858], DNS-over-HTTPS
      [RFC8484], DNS-over-QUIC [RFC9250], and similar communication of
      DNS messages using other protocols.  NOTE: at the time of this
      writing not all DNS transports are standardized for all types of
      servers, but may become standardized in the future.

…

3.1.  Retries and Timeouts

   A resolver MUST NOT retry a given query to a server address over a
   given DNS transport more than twice (i.e., three queries in total)
   before considering the server address unresponsive over that DNS
   transport for that query.
   A resolver MAY retry a given query over a different DNS transport to
   the same server if it has reason to believe the DNS transport is
   available for that server and is compatible with the resolver's
   security policies.

> Nits/editorial comments:
>
> * In section 1, there exists "section 5" and "section 7" usages that do make it
> clear if these are internal or external references.

We propose to just remove those section references.

> * I appreciated the text in sections 1.1 and 1.2, dealing with motivation and
> related use cases respectively. However, as a generalist reviewer, the most
> useful part of Section 1.1 was the first sentence. The remainder of the text in
> 1.1 feels like case studies, that while interesting manifestations, are not
> pure motivation. As a purely editorial suggestion you can take or leave,
> consider modifying the last paragraph of Section 1 to something like
>
> "Operators of DNS services have known for some time that recursive resolvers
> become more aggressive when they experience resolution failures; see Appendix A
> for a collection of anecdotes, experiments, and incidents support this claim.
> This document updates [RFC2308] to require negative caching of DNS resolution
> failures, which can help to mitigate the operational problems failures might
> generate. Examples of resolution failures are provided in Section 2. Related
> work is described in Appendix B."
>
> then move the text from sections 1.1 and 1.2 in appendix A and appendix B.

That is an interesting suggestion. After discussion with my coauthors, we have a slight preference to leave it as-is, but would also like to take advice on this from the RFC Editor.

> * TOC - "Conditions That Lead To DNS Resolution Failures" vs "Requirements for
> Caching Resolution Failures". Presumably the same thing, so consistency might
> help

I’m not sure I understand this comment. Can you explain further what you mean?
> * Section 3.2 - regarding the 1 second minimum requirement, the text that
> follows says "Resolvers MAY cache different types of resolution failures for
> different (i.e, longer) amounts of time." and then later "Consistent with
> [RFC2308], resolution failures MUST NOT be cached for longer than 5 minutes.".
> These statements are all logically consistent but could be made simpler with
> some editorial work. For example, something like
>
> "Resolvers MUST cache resolution failures for at least 1 second. Resolvers MAY
> cache failures for a longer time, up to a maximum of 5 minutes (per the
> requirements of [RFC2308]). Resolvers MAY cache different types of failures
> using different time periods within this range."

I see what you’re saying. We propose to move the maximum caching time up and split that paragraph into two, as follows:

   Resolvers MUST cache resolution failures for at least 1 second.
   Resolvers MAY cache different types of resolution failures for
   different (i.e., longer) amounts of time.  Consistent with
   [RFC2308], resolution failures MUST NOT be cached for longer than 5
   minutes.

   The minimum cache duration SHOULD be configurable by the operator.
   A longer cache duration for resolution failures will reduce the
   processing burden from repeated queries, but may also increase the
   time to recover from transitory issues.

DW

--
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
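
Illustrative note: the two numeric limits in the proposed text for Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not text from the draft and not how any particular resolver implements it. The names `clamp_failure_ttl` and `RetryBudget`, and the choice to key the counter on a (query, server address, DNS transport) tuple, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field

MIN_FAILURE_TTL = 1.0    # seconds; "at least 1 second" in the proposed Section 3.2 text
MAX_FAILURE_TTL = 300.0  # seconds; the 5-minute ceiling, consistent with RFC 2308
MAX_QUERIES = 3          # proposed Section 3.1: two retries, i.e. three queries in total


def clamp_failure_ttl(configured: float) -> float:
    """Force an operator-configured failure cache duration into [1 s, 5 min]."""
    return max(MIN_FAILURE_TTL, min(configured, MAX_FAILURE_TTL))


@dataclass
class RetryBudget:
    """Counts queries sent per (query, server address, DNS transport).

    Hypothetical helper: the key layout is an assumption made for this sketch.
    """
    sent: dict = field(default_factory=dict)

    def may_send(self, query: str, server: str, transport: str) -> bool:
        # True while fewer than three queries have been sent for this tuple.
        return self.sent.get((query, server, transport), 0) < MAX_QUERIES

    def record(self, query: str, server: str, transport: str) -> None:
        key = (query, server, transport)
        self.sent[key] = self.sent.get(key, 0) + 1
```

In this sketch, exhausting the budget for one DNS transport leaves the budget for another transport to the same server address untouched, which mirrors the "MAY retry a given query over a different DNS transport" sentence. Whether the other transport is actually available and acceptable under the resolver's security policy is outside what the sketch models.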