On 08/11/2023 09:54, Bob Briscoe wrote:
Susan,
Thank you for your review. See [BB]
On 28/10/2023 23:17, Susan Hares via Datatracker wrote:
Reviewer: Susan Hares
Review result: Ready with Issues
I am the assigned Gen-ART reviewer for this draft. The General Area Review Team (Gen-ART) reviews all IETF documents being processed by the IESG for the IETF Chair. Please treat these comments just like any other last-call comments. For more information, please see the FAQ at <https://wiki.ietf.org/en/group/gen/GenArtFAQ>.
Document: draft-ietf-tsvwg-ecn-encap-guidelines-??
Reviewer: Susan Hares
Review Date: 2023-10-28
IETF LC End Date: 2023-11-02
IESG Telechat date: Not scheduled for a telechat
Summary: The document summarizes decades of work on congestion work in IEEE 802.1 and IETF. It provides a set of good guidelines/recommendations for designers of Layer-2 (L2) or L2/L3 shim layer. The authors (John Kaippallimalil and Bob Briscoe) and the original author/ now-contributor (Pat Thaler) should be commended for their work. The text is generally readable with relatively few English and editorial issues. However, the authors would be wise to fix the editorial issues prior to sending it to the RFC editor since phrasing needs to be precise.
Major issues: none.
Minor issues:
1. Have the authors considered the SR-routing pathways with tunnels in this draft? If so, the authors might add a side note in the document.
[BB] Not specifically. The idea is to provide guidelines for any protocol designers who aim to add ECN. Might there be something specific in SR that would make this generic guidance inapplicable? Other than not having protocol space, which is obviously not something that guidelines can solve (similarly for all IEEE802 protocols).
2. Has this document been circulated by the IETF liaisons to IEEE 802.1, MEF, and 3GPP?
[BB]
- IEEE802.1: no current relevant work (Pat Thaler was a deliberate choice of co-author at the time, given she was the IEEE802 liaison, and worked on 802.1Q congestion signalling - separate control plane messages)
  - Here's the liaison request: https://datatracker.ietf.org/liaison/1364/ (I can't find the response)
- 3GPP: responded with 6 relevant WGs and 20 relevant TRs, which were subsequently analysed and a formal response returned.
  - Formal liaison response: https://datatracker.ietf.org/liaison/1499/
  - Or, for summary, see slides 3-8 from the time: https://www.ietf.org/proceedings/95/slides/slides-95-tsvwg-11.pdf
- MEF: No liaison. The rationale was to focus on SDOs that might define ECN-like extensions to their lower layer protocols. I think the MEF is unlikely to define protocols at that level; it is more about using protocols defined in other bodies like IEEE802. IYO, was this a bad judgement call?
3. Are you really sure your security and manageability sections are complete?
[BB]
- Security:
  - yes definitely complete
- Manageability:
  - We didn't write a separate manageability section, but I believe we covered the various aspects well within the text. I'll let you decide by quickly listing them:
  - Config:
    - ECN has been designed to have zero to one alternatives
    - decap has no choices
    - encap is fairly minimal wrt config, the main aspect being config of encaps where it doesn't know whether the egress supports ECN decap because it doesn't negotiate with it (which is heavily covered in the draft).
    - in numerous places, the draft distinguishes approaches that will work in managed environments from those that are more appropriate in non-managed plug-and-play scenarios.
  - Monitoring:
    - The guidelines on encap justify themselves based on being able to monitor congestion over the whole path from the origin load source, but also allow the congestion introduced over the scope of the tunnel, link or subnet to be measured.
  - Anomaly detection:
    - propagating ECN over encaps doesn't change the usefulness or otherwise of ECN in detecting anomalies, so there's no mention of this.
  - Deployment & Coexistence:
    - A large part of the draft is about addressing the problem of incrementally deploying ECN support, in particular where the egress of a subnet or tunnel does not support ECN decap.
  - Scaling:
    - The good scaling properties of ECN itself are not changed by encaps, other than whether the propagation process can be implemented as line rate scales. The design of the ECN protocol determines that, so there's nothing more this draft can say about that than: you will have to work out how to implement it efficiently (which goes without saying).
RFC7713 predates large deployment of SR routing.
[BB] I was rather surprised to see these two mentioned in the same sentence under security considerations, as if they might offer similar functions.
However, all I knew about segment routing was what Rob Shakir had described to me many years ago, so I've been reading up. I still don't see anything in SR that could help with assurance of e2e integrity of congestion signals. The problem addressed by RFC7713 is how to ensure that everything around the whole end-to-end congestion control loop is reporting congestion to the next device honestly. Just as each network node in isolation can lie about how congested it is, so can a segment, or an SR domain. So I can't see how SR could make any contribution to a solution to this problem.
But if I'm missing some key insight here, pls enlighten me.
BTW, altho RFC7713 is cited here, it's only to say that an integrity solution exists if it's needed. It's not been deployed widely, not least because the problem hasn't arisen. Hence the draft describes these bullets as "experimental or proposed".
Nits/editorial comments: Formatting in the pdf seems to be problematic. I am not commenting on this point since html form does not have the same problem.
[BB] I've just looked through the PDF - what problem should I look for? I can't see anything wrong.
Nits in Editing:
Section 2
Old/Not-ECN-PDU: A PDU at the IP layer or below that is part of a congestion control feedback-loop within which at least one node necessary to propagate any explicit congestion notification signals back to the Load Regulator is not capable of doing that propagation./
New:/Not-ECN-PDU: A PDU at the IP layer or below that is part of a congestion control feedback-loop within which at least one node necessary to propagate any explicit congestion notification signals back to the Load Regulator this PDU is not capable of doing that propagation./
[BB] This isn't what it meant, which implies that the sentence might be on the edge of parseability.
Whether it's an ECN-PDU or a Not-ECN-PDU is defined as a property of the control loop that the PDU is bound to (by its addressing or labelling), not necessarily a direct property of the PDU itself. This was meant to be a way of getting round the problem of a packet adopting multiple L2 headers of multiple different L2 protocols as it traverses the path. Then each PDU has to have some way of indicating whether it is an ECN-PDU or not. In some protocols that will be self-described. But in others, it might be described indirectly in the control plane. I think I'm going to need to explicitly say all this in the terminology explanation.
Normally, I'd give proposed new text, but I'm just heading for my flight, so I'll press send now.
[BB2] Here's proposed text:
A PDU at the IP layer or below that is
part of a congestion control feedback loop that is not capable of
propagating explicit congestion notification signals back to the Load
Regulator, because at least one of the nodes necessary to propagate
the signals is incapable of doing that propagation. Note that this
definition is a property of the feedback-loop, not necessarily of the
PDU itself, because in some protocols the PDU will self-describe the
property, but in others the property might be carried in a separate
control-plane context that is somehow bound to the PDU.
Bob
[BTW, I've also noticed there are places where it says a congested buffer marks or drops packets without making it clear that it only marks or drops a proportion, not all packets. Apparently this caused problems with the whole attempt to use ECN for Voice over LTE (VoLTE) in 3GPP, where they defined ECN as either 100% on or 100% off, because it didn't say otherwise anywhere in RFC3168 (!). So I'd better fix this.]
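To illustrate the point about marking only a proportion of packets, here is a minimal sketch (not taken from the draft or from RFC3168). It assumes a simple linear ramp between two queue-length thresholds; the function names, the thresholds `min_th` and `max_th`, and the ramp shape are all hypothetical choices for illustration.

```python
import random


def mark_probability(queue_len, min_th=5, max_th=15):
    """Probability of marking one packet, given the current queue length.

    Hypothetical linear ramp: 0 below min_th, 1 above max_th.
    """
    if queue_len <= min_th:
        return 0.0
    if queue_len >= max_th:
        return 1.0
    return (queue_len - min_th) / (max_th - min_th)


def should_mark(queue_len, rng=random):
    """Decide per packet: mark with the probability above, not all-or-nothing."""
    return rng.random() < mark_probability(queue_len)
```

With a queue length halfway between the thresholds, roughly half the packets are marked, rather than the 100% on or 100% off behaviour described above.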
Section 3 Text:/ The router forwards the marked L3 header into subnet 2, and when it adds a new L2 header it copies the L3 marking into the L2 header as well, as shown by the 'C's in both layers (assuming the technology of subnet 2 also supports explicit congestion marking)./ Question - did you mean subnet b instead of subnet 2 (per figure 1)?
[BB] Yes. Also caught by another reviewer - strange no-one in the WG noticed this.
Section 4.3 Text:/ This ensures that bulk congestion monitoring of outer headers (e.g. by a network management node monitoring ECN in passing frames) will measure congestion accumulated along the whole upstream path — since the Load Regulator not just since the ingress of the subnet. / The portion of this sentence that has me confused is "since the Load Regulator not just since the ingress of the subnet.". I'm not really sure what you are trying to say - so I have no suggested new text.
[BB]
will measure congestion accumulated along the whole upstream path — starting from the Load Regulator not just starting from the ingress of the subnet.
Would that make it comprehensible?
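As a toy illustration of the distinction (a sketch under stated assumptions, not the draft's specification): if the encapsulator copies the arriving ECN marking into the outer header, a monitor counting outer CE marks sees congestion from the whole upstream path plus the subnet. The contrasting "reset" behaviour, the `Frame` type and all function names below are hypothetical.

```python
from dataclasses import dataclass

ECT, CE = "ECT", "CE"  # ECN-capable but unmarked, and congestion experienced


@dataclass
class Frame:
    inner_ecn: str
    outer_ecn: str


def encapsulate(inner_ecn, copy_inner=True):
    """Add an outer header; either copy the inner marking or start unmarked."""
    return Frame(inner_ecn, inner_ecn if copy_inner else ECT)


def monitored_fraction(arriving_ecn, marked_in_subnet, copy_inner):
    """Fraction of outer headers a monitor inside the subnet sees as CE."""
    frames = [encapsulate(e, copy_inner) for e in arriving_ecn]
    for i in marked_in_subnet:  # congestion experienced within the subnet
        frames[i].outer_ecn = CE
    return sum(f.outer_ecn == CE for f in frames) / len(frames)


# 10 packets, 2 already marked upstream; the subnet marks one more.
arriving = [CE, CE] + [ECT] * 8
whole_path = monitored_fraction(arriving, [5], copy_inner=True)    # 3 of 10
subnet_only = monitored_fraction(arriving, [5], copy_inner=False)  # 1 of 10
```

With copying, the monitor's count starts from the Load Regulator; with the hypothetical reset, it would start only from the ingress of the subnet.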
Bob
-- ________________________________________________________________ Bob Briscoe http://bobbriscoe.net/