Hi Bernard,

Thanks for your comments. I have a few responses and a few clarifying
questions inline with [JH].

On 2019-11-30, 17:42, "Bernard Aboba via Datatracker" <noreply@xxxxxxxx> wrote:
> This draft is ready for publication from a transport point of view, with
> the exception of a few (relatively minor) issues:
>
> Section 2.5.4.1
>
> " The RECOMMENDED timeout is a random value in the range
> [initial_timeout, MIN(initial_timeout * 2^retry_count,
> maximum_timeout)], with a RECOMMENDED initial_timeout of 4 seconds
> and a RECOMMENDED maximum_timeout of 120 seconds.
> "
>
> [BA] The draft provides a justification for the initial_timeout value
> of 4 seconds, but not for the maximum_timeout value of 120 seconds,
> which seems somewhat high. It is my suspicion that the value is set
> this high to allow for robustness in dealing with potential routing
> transients. It would be helpful to state the reasoning.

[JH] I can add the text from Section 5.2.3.4.3 of RFC 7450 (referenced
from the next paragraph), which contains a similar equation with that
justification for the 120 second timer:
https://tools.ietf.org/html/rfc7450#section-5.2.3.4.3

   " a RECOMMENDED maximum_timeout of 120 seconds (which is the
   recommended minimum NAT mapping timeout described in [RFC4787]).
   "

Will that address this concern?

Note the same maximum appears in section 2.7, and the reasoning is
similar, since this is all part of the AMT discovery process, and thus
subject to similar reasoning as the discovery process in RFC 7450.

Do you think the same text is necessary in both places? (Or necessary
at all, given the reference to a very similar equation in the following
paragraph?)

I've provisionally added it to both spots in my local copy, but please
let me know if you think it should be different.
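For concreteness, here is a minimal Python sketch of the timeout range
quoted above from Section 2.5.4.1. It is illustrative only, not text
from the draft: the function name discovery_timeout, the rng parameter,
and the choice to count retry_count from 0 are assumptions made for
this example, and it assumes initial_timeout <= maximum_timeout.

```python
import random

INITIAL_TIMEOUT = 4.0    # seconds; RECOMMENDED initial_timeout in the quoted text
MAXIMUM_TIMEOUT = 120.0  # seconds; RECOMMENDED maximum_timeout in the quoted text


def discovery_timeout(retry_count,
                      initial_timeout=INITIAL_TIMEOUT,
                      maximum_timeout=MAXIMUM_TIMEOUT,
                      rng=random):
    """Pick a random timeout in
    [initial_timeout, MIN(initial_timeout * 2^retry_count, maximum_timeout)].

    retry_count is assumed to start at 0 for the first attempt
    (an assumption of this sketch, not stated in the quoted text).
    """
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")

    # Double the upper bound once per retry, stopping at the cap. This
    # equals MIN(initial_timeout * 2^retry_count, maximum_timeout)
    # without computing a huge power for large retry counts.
    upper = initial_timeout
    for _ in range(retry_count):
        upper = min(upper * 2, maximum_timeout)
        if upper >= maximum_timeout:
            break

    return rng.uniform(initial_timeout, upper)
```

With the recommended values, the upper bound doubles from 4 seconds and
reaches the 120 second cap at retry_count 5 (4 * 2^5 = 128 > 120).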

> Section 2.5.4.2
>
> " In some gateway deployments, it is also feasible to monitor the
> health of traffic flows through the gateway, for example by detecting
> the rate of packet loss by communicating out of band with receivers,
> or monitoring the packets of known protocols with sequence numbers.
> Where feasible, it's encouraged for gateways to use such traffic
> health information to trigger a restart of the discovery process
> during event #3 (before sending a new Request message).
>
> However, to avoid synchronized rediscovery by many gateways
> simultaneously after a transient network event upstream of a relay
> results in many receivers detecting poor flow health at the same
> time, it's recommended to add a random delay before restarting the
> discovery process in this case.
>
> The span of the random portion of the delay should be no less than 10
> seconds by default, but may be administratively configured to support
> different performance requirements."
>
> [BA] There is good reason to be concerned about causing synchronized
> rediscovery as a result of a transient network event, if "poor flow health"
> is diagnosed too readily. As a result it would be useful to have more
> specific advice on the definition of "poor flow health" as well as
> how to calculate the "random delay".
>
> My assumption is that we are talking about *major* and *sustained*
> loss here (e.g. a period larger than most routing transients), as well
> as a *substantial* delay (to avoid instability).

[JH] I agree with this in principle and tried to fix it in a rev several
versions ago, but I ended up deciding to leave it this way, somewhat
reluctantly. I think the right answer depends too strongly on the
specifics of the situation to provide much in the way of concrete
advice, at least that I could think of, beyond a rough pointer to the
problem.
I think even "major and sustained" might be too situational, because I
think it depends on the network and the service (for example, even
minor and sustained would in some cases be worth changing relays,
especially if there's a history suggesting something better is
expected).

I agree that the text is a bit weak here, and that suggests it should
be possible to improve, but I never was happy with any of the ideas I
came up with--nothing I could find seemed both generic enough to be
generally applicable and specific enough to be useful.

If you think it's helpful, I can add something like "The specifics of
the health monitoring logic are out of scope for this document.", or
I'd be happy to accept text here if anyone has better suggestions, but
nothing I came up with seemed to me like it made any material
improvement, and that being the case, I concluded that shorter is
better.

(I also thought it might be best to just cut this section, but decided
against that because I thought it better to acknowledge and encourage
this where it's feasible. Maybe that's a mistake? My not-very-firm
judgement call was that leaving this in is better than nothing, but
I'll take advice here.)

Anyway, I haven't made any changes to my local copy yet to address this
point. Hopefully this response lays out my current position. Please
let me know if you have any further comments about this. I'd be happy
to see it improve and grateful for suggestions on how to do so, but am
willing to ship it as it stands, absent a more specific suggestion or a
better understanding of the problem that needs solving in the text.

> Concerns unrelated to Transport
>
> Security
>
> Section 6.2
>
> "There must be a trust relationship between the end consumer of this
> resource record and the DNS server.
> This relationship may be end-to-
> end DNSSEC validation, a TSIG [RFC2845] or SIG(0) [RFC2931] channel
> to another secure source, a secure local channel on the host, DNS
> over TLS [RFC7858] or HTTPS [RFC8484], or some other secure
> mechanism."
>
> [BA] This paragraph is mixing e2e security mechanisms (DNSSEC) with
> mechanisms such as DoT and DoH. The threats addressed by each mechanism
> are different (e.g. RR modification versus snooping) so it would be helpful
> to be clear about what the threat model is. Is there a privacy concern
> relating to unauthorized snooping of AMTRELAY RRs? Or is the issue more
> modification of the RRs?

[JH] The issue is modification of the RRs. (I assume an adversary who
can observe the DNS request and poses a privacy threat is also likely
positioned to observe the AMT traffic and its embedded subscriptions,
which is already a worse privacy problem than the source-specific
discovery request and is a pre-existing issue when using AMT, not added
by this doc.)

The next paragraph in the same section (I thought) explained the threat
model that this section was trying to address:

   " If an AMT gateway accepts a maliciously crafted AMTRELAY record,
   the result could be a Denial of Service, or receivers processing
   multicast traffic from a source under the attacker's control."

Do you have a suggestion for improving on that explanation? I'm not
sure where this fell short. Do I need to spell out more about the
possible consequences of accepting traffic from a source under an
attacker's control?

> Overall utility
>
> [BA] It is not clear to me why the AMTRELAY RR is needed, given that
> Section 2.3.1 makes it clear that querying this record is a last
> resort:
>
> ... <cut: quote of 5 preconditions from 2.3.1> ...
>
> In particular, DNS-SD RRs can easily be added with DNS service
> providers, while this is not necessarily the case for a new
> AMTRELAY RR.
> So are there really situations in which it was not
> feasible to add DNS-SD RRs, but using the AMTRELAY RR is more
> convenient/easier to deploy?

[JH] I believe this is the typical case today, and is the core
motivation for writing this doc in the first place.

I'm a bit troubled that the rest of the doc didn't get this point
across, because I believed it to be a central theme of several of the
existing sections, most particularly sections 2.1 and 2.2, as well as
section 1.

The core issue is that the sending networks (for example those listed
in section 3.2) know about provisioned AMT relays that can forward
their traffic, but the receiving networks (for example those listed in
section 3.1) don't know about those relays without a new discovery
mechanism (currently provided only by this new AMTRELAY record).

In particular: the DNS-SD service is not source-specific, and although
it should be preferred where available for the reasons given in section
2.3.1, any network that can supply a valid relay via DNS-SD (one that
can receive and forward multicast traffic from the given source) either
has native multicast connectivity to the source (as might be possible
if the receiving network were directly connected to the sending
network, rather than only reachable across the internet), or has an
upstream AMT ingest point that relies on the AMTRELAY discovery (which
today would be almost all networks that are not walled gardens, with
the exception of i2). I had thought this explanation was more or less
covered by section 2.2.

One day, I do hope the AMTRELAY record can be abandoned because there
will be a native multicast backbone available everywhere. However, as
a transition technology until that time, some mechanism for
automatically connecting the multicast-enabled receiver islands to the
multicast-enabled sender islands in a source-dependent way is
necessary, which is what this document is trying to define, and which
has previously been missing.

I hope that clarifies things, and please let me know if you can suggest
any place to add text that would have made this more clear on the first
reading.

Thanks and regards,
Jake