Thanks very much Wesley for the reviews! Comments in-line...

> From: Wesley Eddy, April 2, 2019 1:04 PM
>
> Reviewer: Wesley Eddy
> Review result: Almost Ready
>
> This document has been reviewed as part of the transport area review team's
> ongoing effort to review key IETF documents. These comments were written
> primarily for the transport area directors, but are copied to the document's
> authors and WG to allow them to address any issues raised and also to the IETF
> discussion list for information.
>
> When done at the time of IETF Last Call, the authors should consider this
> review as part of the last-call comments they receive. Please always CC
> tsv-art@xxxxxxxx if you reply to or forward this review.
>
> The document includes a way intended to ask for a particular DiffServ Code
> Point (DSCP) value to be used by a publisher. This is missing some context.
> Why would a subscriber do this?

Some of the QoS context is provided from RFC-7923. This RFC provides requirements/context for a YANG based subscription service.
As a quick summary of this RFC, the subscription service first and foremost establishes a new mechanism for router integration with network Controllers and Network Management Systems (NMS). Between these devices, a common network operations
context might be available. In fact, the same operational organizations might operate all these devices. Based on this, there should be some shared knowledge of the underlying QoS context. In communicating between Controllers/NMS and a network element, some subscriptions might carry information with a higher business precedence than others. For example, providing information about a line card outage should take precedence
over a set of counters. These precedence needs must be respected at a variety of congestion points. These include during the dequeuing of information from the publisher, as well as during network transit.
Driven by this, RFC-7923 Section 4.2.6.8 provides guidance on some subscription QoS parameters: “The Subscription Service SHOULD support the relative prioritization of subscriptions so that the dequeuing and discarding of push updates can consider this if there is insufficient bandwidth between the Publisher and the Receiver.” The need for DSCP is well known, and has also been recognized by the OpenConfig Telemetry effort. And as a result, the DSCP object from this draft was also picked up by the OpenConfig-telemetry.yang model. For more info on that,
see line 583 of: https://github.com/openconfig/public/blob/master/release/models/telemetry/openconfig-telemetry.yang

> Is it asking for DSCP values that it assumes
> are honored and not bleached or altered all the way between it and the
> publisher? Are there conditions normal for the use of this protocol where that
> assumption usually holds?

The telemetry implementations seen so far are within the domain/provenance of a single operator. As a result, there haven't been requirements which have driven a mapping or negotiation between a requested quality of service, and one
delivered by the subscription service. (We talked about this early on, but people thought that would be unnecessary complexity. Plus we could always layer such a capability in later if necessary.)

> By asking for a particular DSCP to be set, the subscriber is maybe attempting to
> optimize the behavior during some type of overload, where they really want
> some notifications quickly, but others aren't as important, and may be fine to
> drop and have retransmitted by TCP, etc. This all seems to be implicit though,
> with no deep discussion of what the protocol is attempting to enable with this
> option or how it would be productively used.

We had several early discussions on whether this draft should talk about various business objectives of protocol selections. And in many cases this is done. But as the draft document was already quite large, we have been hoping that
other documents can provide help/content on QoS aspects, rather than going into them deeply here. And there are documents which do help. For example implementers do have documents like RFC-7923 and
http://www.openconfig.net/projects/telemetry/. There are also vendor documents which can provide guidance.
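
As a concrete illustration of what honoring a requested DSCP could look like on the publisher side, here is a minimal Python sketch. It is not taken from the draft or from any implementation: the helper names `dscp_to_tos` and `open_marked_connection` are hypothetical, and it assumes IPv4 and a platform that exposes the `IP_TOS` socket option. It only shows the marking step; whether the marking survives network transit is outside its scope.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    # The low two bits of the byte are the ECN field; this value leaves them 0.
    return dscp << 2


def open_marked_connection(host: str, port: int, dscp: int) -> socket.socket:
    """Hypothetical helper: open a TCP connection whose packets request `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Ask the local stack to mark outgoing packets on this socket.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
        sock.connect((host, port))
    except OSError:
        # A publisher could map a failure here to a rejected subscription.
        sock.close()
        raise
    return sock
```

For example, `dscp_to_tos(46)` gives the TOS byte for Expedited Forwarding (DSCP 46).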
> It seems to be considered fatal if a publisher can't write a requested DSCP
> value. The requests fail with "dscp-unavailable". This seems too drastic, since
> the DSCP is anyways advisory to nodes on the path, and may be altered
> anyways. I'm not sure why this would be considered a fatal issue to the
> subscription.

A design goal is that a publisher exactly meets the terms requested by the subscription. If a publisher cannot meet the request, the subscription is not established. So if a DSCP is not available, we have the subscriber ask for one
that is available. DSCP not available is just one of many reasons why a subscription may be rejected. And each rejected subscription request can effectively be treated as a negotiation. So look at error messages as a way to provide feedback for a better
subsequent subscription request.

> The feature description for "dscp" mischaracterizes it as a pure priority
> mechanism, rather than a more general indication of class of service treatment:
> "This feature indicates a publisher supports the placement of suggested
> prioritization levels for network transport within notification messages." It
> could be more correct to say something like "This feature indicates that a
> publisher supports the ability to set the DiffServ Code Point (DSCP) value in
> outgoing packets."

This makes sense. I have updated the text accordingly.

> The same comment is applicable in the dscp leaf description on page 45. It is
> mischaracterized as a pure priority mechanism, which is not how the IETF has
> defined DSCP.

I have tweaked the leaf text to:

"The desired network DiffServ Code Point (DSCP) value. This is
the DSCP value to be set on notification messages
encapsulating the results of the subscription. This DSCP value
is shared for all receivers of a given subscription."

Does this meet your objective?

> The weighting feature seems to need a little bit more work. It allows values
> between 0 and 255, and there is some description that bandwidth is supposed
> to be allocated somehow proportional to the weighting, but it's not really clear
> how this would be done or that it makes sense. Is it assumed that the
> publisher has some fixed bandwidth limit that it's trying to stay within, and
> that it can choose messages from streams (based on their size, frequency, etc)
> in some way to honor the weights? What is the method to compute the
> proportions? Is it purely linear? What if there is no contention? What if there
> is no inherent bandwidth limit known to the publisher (since generally there
> probably wouldn't be)? The intention and detail of what this is trying to
> achieve seems to need to be worked through a bit more to avoid just having a
> complex feature that might not really achieve much real result, depending on
> how it's implemented.

Initially the draft text made some explicit linkages here to corresponding HTTP2 capabilities. The objective is to have the priority mechanism *exactly* match the weight mechanisms used with HTTP2. Below (or attached) is a picture
from IETF #96 showing the intended behavior:

This complex feature really shouldn’t be very hard in practice where HTTP2 transport and code exist.
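
To make the proportional idea concrete, here is a small Python sketch of linear, weight-proportional sharing among subscriptions that currently have data queued. It is only an illustration under stated assumptions, not text or behavior from the draft: the function name `allocate_bandwidth` and the notion of a known `capacity` are hypothetical, and the "add one" mapping of the 0..255 value to an effective weight of 1..256 is borrowed from the HTTP/2 priority weight encoding and assumed to carry over. HTTP/2 stream dependency trees are not modeled.

```python
def allocate_bandwidth(capacity: float, weights: dict[str, int]) -> dict[str, float]:
    """Split `capacity` among backlogged subscriptions in proportion to weight.

    `weights` maps a subscription id to its 0..255 weighting value.
    Assumption: effective weight = value + 1, as in the HTTP/2 encoding.
    """
    for sub_id, value in weights.items():
        if not 0 <= value <= 255:
            raise ValueError(f"weighting for {sub_id!r} must be in 0..255")
    if not weights:
        return {}
    effective = {sub_id: value + 1 for sub_id, value in weights.items()}
    total = sum(effective.values())
    # Purely linear: each subscription gets its fraction of the total weight.
    return {sub_id: capacity * w / total for sub_id, w in effective.items()}


# Two subscriptions competing for 1000 units of capacity.
shares = allocate_bandwidth(1000, {"alarms": 191, "counters": 63})
print(shares)  # {'alarms': 750.0, 'counters': 250.0}
```

Only subscriptions with pending updates are passed in, so when there is no contention a lone subscription simply receives the whole capacity.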
Also worth noting is that the OpenConfig telemetry effort uses gRPC. And underneath gRPC is HTTP2. So parallel standardization mechanisms are positioned here as well.

> Is there any thought on phase effects, with regard to events that have many
> subscribers, causing a large number of simultaneous writes on different
> streams? Overall, there could be low bandwidth utilization for notifications,
> but then sudden spikes when there is a coupling in the notifications going out
> on multiple streams. This could lead to some connections seeing losses and
> others not, for instance, and there might be a reason to try to dither the writes
> or have other logic encouraged to handle surges of outgoing notifications.
> Since the underlying transports are TCP-based, losses will be recovered
> eventually, but there could be latency created in the meantime. This might
> motivate some of the QoS features that are cursorily discussed, but it's not clear
> how they would be effective.

Yes, this is very much a consideration. And where there is a worry about these issues, we wanted to allow the adoption of transports capable of handling them. But this is a hard technology problem space. And we didn’t want
to develop new complex code. And we didn’t want to develop untested QoS paradigms. Especially when really good industry solutions already exist. As a result, elements of the QoS model underlying HTTP2 seemed reasonable to adopt.

Thanks again for the deep consideration of QoS implications here,