Re: [Last-Call] Opsdir last call review of draft-ietf-quic-manageability-14


 



Hi Al,

thanks for your review. Glad to hear that you find the draft useful. I think your input is very valuable for this draft, as you are part of the intended audience; however, I am not sure how, or whether, to address many of your comments. Let me try to provide some explanations. Please see inline, marked with [MK].

Mirja


On 05.02.22, 23:11, "QUIC on behalf of Al Morton via Datatracker" <quic-bounces@xxxxxxxx on behalf of noreply@xxxxxxxx> wrote:

    Reviewer: Al Morton
    Review result: Has Issues

    Hi Mirja and Brian,

    This is the OPSDIR review of

                  Manageability of the QUIC Transport Protocol
                        draft-ietf-quic-manageability-14

    Thanks for preparing this draft. I think it will succeed to inform your
    intended audience. I found that it filled-in some gaps for me.

    Likewise, I found some areas where I make suggestions or comment. I would be
    happy to discuss any of these areas further. Your familiarity with
    related-work/papers-to-cite could minimize your efforts in response. Protocols
    are evolving, and so are Networks.

    Editorially, the doc is in great shape, except for the use of different terms
    for when TCP takes-over: fail over, fallback, and fail-over are all used.

[MK] This we were able to address. Thanks for pointing this out. See PR here if you want: https://github.com/quicwg/ops-drafts/pull/457/files 

    regards,
    Al

    Abstract

       This document discusses manageability of the QUIC transport protocol,
       focusing on the implications of QUIC's design and wire image on
       network operations involving QUIC traffic.  It is intended as a
       "user's manual" for the wire image, providing guidance for network
       operators and equipment vendors who rely on the use of transport-
       aware network functions.

    ...

    2.4.  The QUIC Handshake

    ...

       Client                                    Server
         |                                          |
         +----Client Initial----------------------->|
         +----(zero or more 0RTT)------------------>|
         |                                          |
         |<-----------------------Server Initial----+
         |<---------(1RTT encrypted data starts)----+
         |                                          |
         +----Client Completion-------------------->|
         +----(1RTT encrypted data starts)--------->|
         |                                          |
         |<--------------------Server Completion----+
         |                                          |

       Figure 1: General communication pattern visible in the QUIC handshake

       As shown here, the client can send 0-RTT data as soon as it has sent
       its Client Hello, and the server can send 1-RTT data as soon as it
       has sent its Server Hello.  The Client Completion flight contains at
       least one Handshake packet and could also include an Initial packet.
       QUIC packets in separate contexts during the handshake can be
       coalesced (see Section 2.2) in order to reduce the number of UDP
       datagrams sent during the handshake.  QUIC packets can be lost and
       reordered, so packets within a flight might not be sent close in
       time, though the sequence of the flights will not change, because one
       flight depends upon the peer's previous flight.

    [acm]
    It's great to add some Not-Sunny-Day info in the description, thanks!
    But can you add a little more?  For example:
    Is it possible that network reordering can cause the handshake to fail?
    What reordering extent (yes, that's a metric) would be required to cause
    failure or unnecessary retransmission? Lost packets would result in time-outs
    and retransmission, so what are the default time-outs? Is there a paper where
    some/all of the above have been investigated, that you could reference to save
    some work?

[MK] I don't think I have a good answer to these questions. However, I don't think these questions are necessarily QUIC specific. Reordering in itself is not a problem for the QUIC handshake; however, if some packets are also delayed a lot and are therefore detected as lost, that might cause the handshake to fail. How long you keep state while waiting for delayed packets is mostly implementation specific. As I said, though, this problem is not unique to the QUIC handshake.

[MK] Further, this section in the draft really only explains what you have to expect when you see QUIC traffic as a passive observer. The reason we mention loss and reordering in this section is really just to say that if you, e.g., want to detect the QUIC handshake as a passive on-path observer, you should not expect it to always look exactly like this (as there might be reordering or loss earlier in the network, between the client and your observation point).

    ...

    2.8.  Version Negotiation and Greasing

    ...
       QUIC is expected to evolve rapidly, so new versions, both
       experimental and IETF standard versions, will be deployed on the
       Internet more often than with traditional Internet- and transport-
       layer protocols.  Using a particular version number to recognize
       valid QUIC traffic is likely to persistently miss a fraction of QUIC
       flows and completely fail in the near future, and is therefore not
       recommended.
    [acm] Where "valid traffic" is the focus, I agree, let it flow.
    But the Operator's focus may instead be "admissible traffic", where
    experimental traffic is not wanted or allowed. IOW, only traffic that is
    understood to conform to <RFC list> shall pass, because "Active Attacks are
    also Pervasive", to put a different spin on 7258. [acm] See also the comment in
    3.4.1.

[MK] This is not about experimentation. The expectation is that QUIC versions will change often; e.g., we already have a draft for a new version adopted in the group, and there might be another RFC some time this year. So if you have to "manually" allow new versions in all your equipment, that will delay deployment of new versions (or even hinder it, because there is always one box that doesn't get updated). Therefore we strongly recommend not using the version to filter QUIC traffic. Is that not clear enough in the text?
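
As an illustration of the point above, here is a minimal sketch of how a device could read the long header without consulting a list of known versions. It assumes only the version-independent layout from RFC 8999 (header form bit, 32-bit version, then length-prefixed connection IDs). The names `LongHeader` and `parse_long_header` are made up for this example, and a datagram that parses this way is not guaranteed to be QUIC.

```python
import struct
from typing import NamedTuple, Optional


class LongHeader(NamedTuple):
    version: int
    dcid: bytes
    scid: bytes


def parse_long_header(datagram: bytes) -> Optional[LongHeader]:
    """Parse the version-independent long header fields (assumed per RFC 8999).

    Returns None for short headers and for truncated input. The version is
    reported but deliberately not compared against any allow-list.
    """
    # Minimum: first byte + 4-byte version + two 1-byte connection ID lengths.
    if len(datagram) < 7:
        return None
    # Most significant bit of the first byte is the header form (1 = long).
    if not datagram[0] & 0x80:
        return None
    version = struct.unpack_from("!I", datagram, 1)[0]
    pos = 5
    dcid_len = datagram[pos]
    pos += 1
    if len(datagram) < pos + dcid_len + 1:
        return None
    dcid = datagram[pos:pos + dcid_len]
    pos += dcid_len
    scid_len = datagram[pos]
    pos += 1
    if len(datagram) < pos + scid_len:
        return None
    scid = datagram[pos:pos + scid_len]
    return LongHeader(version, dcid, scid)
```

A classifier built this way keeps working when a new version number shows up, which is the property the recommendation is after.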

       In addition, due to the speed of evolution of the
       protocol, devices that attempt to distinguish QUIC traffic from non-
       QUIC traffic for purposes of network admission control should admit
       all QUIC traffic regardless of version.

    [acm] I was hoping to see a description of fallback to TCP (I see that fallback
    is mentioned briefly at the end of section 4.2., and later, fail over and
    failover. pick one...)

    How can Network Operators observe when a QUIC setup has failed, and the
    corresponding TCP fallback connection(s) succeeded?

[MK] There is no unified way in which fallback is implemented, or whether it is implemented at all. However, why do you think a network operator would need that information?

    Is there a reference available with this info, to save effort here?

[MK] As I said this is rather implementation specific, so I would say no.

    ...

    3.4.1.  Extracting Server Name Indication (SNI) Information

    ...

       Note that proprietary QUIC versions, that have been deployed before
       standardization, might not set the first bit in a QUIC long header
       packet to 1.  However, it is expected that these versions will
       gradually disappear over time.
    [acm]
    And some networks may prefer not to admit experimental traffic. The goal of the
    experiment may be problematic for the network operator and/or their
    subscribers. I think this is legitimate operator behavior, and worth a few more
    words in the draft.

[MK] To be honest, I don't understand this point. How would an operator even know whether an experiment would be problematic or not? QUIC is fully encrypted. Versioning is only one extension mechanism. So even if you see the same version number, the QUIC behind it could behave very differently depending on which extensions are used, and because of the encryption there is no way for the operator to know about this. Is this not clear in the document? Do we need to state this more clearly?

    ...

    3.8.1.  Measuring Initial RTT

    ...

       Handshake RTT can be measured by adding the client-to-observer and
       observer-to-server RTT components together.  This measurement
       necessarily includes any transport- and application-layer delay at
       both endpoints.
    [acm] suggest s/any/all/

[MK] Done!

    3.8.2.  Using the Spin Bit for Passive RTT Measurement

    ...

       Note that this measurement, as with passive RTT measurement for TCP,
       includes any transport protocol delay (e.g., delayed sending of
    [acm] suggest s/any/all/

[MK] Done!

    ...

       Since the spin bit logic at each endpoint considers only samples from
       packets that advance the largest packet number, signal generation
       itself is resistant to reordering.  However, reordering can cause
       problems at an observer by causing spurious edge detection and
       therefore inaccurate (i.e., lower) RTT estimates, if reordering
       occurs across a spin-bit flip in the stream.
    [acm] thanks for mentioning this!

    ...

       Raw RTT samples generated using these techniques can be processed in
       various ways to generate useful network performance metrics.  A
       simple linear smoothing or moving minimum filter can be applied to
       the stream of RTT samples to get a more stable estimate of
       application-experienced RTT.  RTT samples measured from the spin bit
       can also be used to generate RTT distribution information, including
       minimum RTT (which approximates network RTT over longer time windows)
       and RTT variance (which approximates jitter as seen by the
       application).
    [acm]   (let's avoid the clocky term "jitter", and clarify)
    Suggest: (which over-estimates one-way packet delay variance as seen by an
    application end-point).

[MK] Done!
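
For readers who want to see the processing described in the quoted text, the following is a rough sketch only. It assumes the observer has already extracted `(timestamp, spin_bit)` pairs for one direction of a single flow; `spin_rtt_samples` and `moving_min` are hypothetical helper names. It does nothing to reject the spurious edges that reordering can cause, so a real observer would need additional heuristics.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def spin_rtt_samples(packets: Iterable[Tuple[float, int]]) -> Iterator[float]:
    """Yield one RTT sample per spin-bit edge seen in one direction of a flow.

    `packets` is assumed to be in observation order. The time between two
    consecutive edges (changes in the spin bit value) is taken as one RTT.
    """
    last_spin = None
    last_edge = None
    for timestamp, spin in packets:
        if last_spin is not None and spin != last_spin:
            if last_edge is not None:
                yield timestamp - last_edge
            last_edge = timestamp
        last_spin = spin


def moving_min(samples: Iterable[float], window: int) -> List[float]:
    """Moving minimum filter over the most recent `window` samples."""
    recent = deque(maxlen=window)
    filtered = []
    for sample in samples:
        recent.append(sample)
        filtered.append(min(recent))
    return filtered
```

The moving minimum is one of the two filters the draft names; a linear smoothing filter could be substituted in the same place.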

    4.  Specific Network Management Tasks
    ...

    4.2.  Stateful Treatment of QUIC Traffic

       Stateful treatment of QUIC traffic (e.g., at a firewall or NAT
       middlebox) is possible through QUIC traffic and version
       identification (Section 3.1) and observation of the handshake for
       connection confirmation (Section 3.2).  The lack of any visible end-
       of-flow signal (Section 3.6) means that this state must be purged
       either through timers or through least-recently-used eviction,
       depending on application requirements.
    [acm] Comment: It suddenly struck me that this might be similar to the scenario
    that dkg frequently cited during QUIC development: His ISP would terminate idle
    TCP connections after many hours. See the citation of RFC5382 below. Don't
    expect QUIC connections to stay-up forever! The next Purge will occur in 3, 2,
    1, ...

[MK] QUIC has the connection ID mainly because time-outs for UDP are often short. So I think this is a known problem. Or is there anything you think we should add to this document?

       While QUIC has no clear network-visible end-of-flow signal and
       therefore does require timer-based state removal, the QUIC handshake
       indicates confirmation by both ends of a valid bidirectional
       transmission.  As soon as the handshake completed, timers should be
       set long enough to also allow for short idle time during a valid
       transmission.

       [RFC4787] requires a network state timeout that is not less than 2
       minutes for most UDP traffic.  However, in practice, a QUIC endpoint
       can experience lower timeouts, in the range of 30 to 60 seconds
       [QUIC-TIMEOUT].

       In contrast, [RFC5382] recommends a state timeout of more than 2
       hours for TCP, given that TCP is a connection-oriented protocol with
       well-defined closure semantics.  Even though QUIC has explicitly
       been designed to tolerate NAT rebindings, decreasing the NAT timeout
       is not recommended, as it may negatively impact application
       performance or incentivize endpoints to send very frequent keep-alive
       packets.

       The recommendation is therefore that, even when lower state timeouts
       are used for other UDP traffic, a state timeout of at least two
       minutes ought to be used for QUIC traffic.
    [acm]
    2 minutes, not hours. got it.
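
To make the state handling discussed above concrete, here is a minimal sketch of a flow table that purges state both by idle timer and by least-recently-used eviction. The class name `FlowTable`, its methods, and the table size are invented for this example; only the two-minute default comes from the quoted recommendation. It assumes timestamps passed in are non-decreasing.

```python
from collections import OrderedDict
from typing import Hashable

# Assumption for this sketch: the "at least two minutes" recommendation.
QUIC_IDLE_TIMEOUT_S = 120.0


class FlowTable:
    """Per-flow state with idle-timer expiry and LRU eviction."""

    def __init__(self, max_flows: int, idle_timeout: float = QUIC_IDLE_TIMEOUT_S):
        self.max_flows = max_flows
        self.idle_timeout = idle_timeout
        # Ordered from least to most recently seen flow.
        self._last_seen: "OrderedDict[Hashable, float]" = OrderedDict()

    def touch(self, flow: Hashable, now: float) -> None:
        """Record a packet for `flow`; evict the LRU flow if the table is full."""
        self._last_seen[flow] = now
        self._last_seen.move_to_end(flow)
        while len(self._last_seen) > self.max_flows:
            self._last_seen.popitem(last=False)

    def expire(self, now: float) -> None:
        """Drop every flow that has been idle for at least `idle_timeout`."""
        while self._last_seen:
            flow, seen = next(iter(self._last_seen.items()))
            if now - seen < self.idle_timeout:
                break  # all remaining flows were seen more recently
            self._last_seen.popitem(last=False)

    def __contains__(self, flow: Hashable) -> bool:
        return flow in self._last_seen
```

Because there is no visible end-of-flow signal, nothing in this table ever removes a flow early; entries leave only through `expire` or eviction.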

    ...

    4.5.  Filtering Behavior

       [RFC4787] describes possible packet filtering behaviors that relate
       to NATs but is often also used in other scenarios where packet
       filtering is desired.  Though the guidance there holds, a
       particularly unwise behavior admits a handful of UDP packets and then
       makes a decision as to whether or not to filter later packets in the same
       connection.  QUIC applications are encouraged to fail over to TCP if
    [acm]
    is "fail over" or "fallback" the preferred term?
    (using only one will help)

       early packets do not arrive at their destination
       [QUIC-APPLICABILITY], as QUIC is based on UDP and there are known
       blocks of UDP traffic (see Section 4.6).  Admitting a few packets
       allows the QUIC endpoint to determine that the path accepts QUIC.
       Sudden drops afterwards will result in slow and costly timeouts
       before abandoning the connection.

    4.6.  UDP Blocking, Throttling, and NAT Binding

    ...
       Further, if UDP traffic is desired to be throttled, it is recommended
       to block individual QUIC flows entirely rather than dropping packets
       indiscriminately.  When the handshake is blocked, QUIC-capable
       applications may fail over to TCP.  However, blocking a random
    [acm]
    is "fail over" or "fallback" the preferred term?
    (using only one will help)

       fraction of QUIC packets across 4-tuples will allow many QUIC
       handshakes to complete, preventing a TCP failover, but these
    [acm] ... or "failover" preferred?

       connections will suffer from severe packet loss (see also
       Section 4.5).  Therefore, UDP throttling should be realized by per-
       flow policing, as opposed to per-packet policing.  Note that this
       per-flow policing should be stateless to avoid problems with stateful
       treatment of QUIC flows (see Section 4.2), for example blocking a
       portion of the space of values of a hash function over the addresses
       and ports in the UDP datagram.  While QUIC endpoints are often able
       to survive address changes, e.g. by NAT rebindings, blocking a
       portion of the traffic based on 5-tuple hashing increases the risk of
       black-holing an active connection when the address changes.
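
The stateless per-flow policing described in the quoted paragraph could look roughly like the sketch below. It is an assumption-laden illustration: the function name `flow_is_blocked`, the use of SHA-256, and the `salt` parameter are choices made for this example and are not taken from the draft. The decision depends only on the addresses and ports, so every packet of a flow gets the same treatment without any per-flow state.

```python
import hashlib
import ipaddress


def flow_is_blocked(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                    block_fraction: float, salt: bytes = b"example-salt") -> bool:
    """Decide statelessly whether a flow falls into the blocked hash space.

    `block_fraction` is the share of the hash space to block (0.0 to 1.0).
    `salt` is a hypothetical per-device secret so the outcome is not
    predictable from the addresses and ports alone.
    """
    if not 0.0 <= block_fraction <= 1.0:
        raise ValueError("block_fraction must be between 0.0 and 1.0")
    key = b"".join([
        ipaddress.ip_address(src_ip).packed,
        src_port.to_bytes(2, "big"),
        ipaddress.ip_address(dst_ip).packed,
        dst_port.to_bytes(2, "big"),
    ])
    digest = hashlib.sha256(salt + key).digest()
    bucket = int.from_bytes(digest[:8], "big")
    # Integer comparison avoids rounding at the edges of the hash space.
    return bucket < int(block_fraction * 2**64)
```

As the quoted text warns, a flow whose address or port changes (for example after a NAT rebinding) hashes to a new bucket, so an admitted connection can suddenly land in the blocked portion.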
    ...

    4.8.  Quality of Service Handling and ECMP Routing

       It is expected that any QoS handling in the network, e.g. based on
       use of DiffServ Code Points (DSCPs) [RFC2475] as well as Equal-Cost
       Multi-Path (ECMP) routing, is applied on a per flow-basis (and not
       per-packet) and as such that all packets belonging to the same active
       QUIC connection get uniform treatment.
    [acm] Comment: so networks should continue their *extra* efforts for datagrams,
    like maintaining order, while the datagram streams take away as much info as
    they can. got it...

[MK] I don't think networks should put extra effort into reordering packets, especially as that usually causes delays. Actually QUIC, as well as TCP with certain extensions, can be quite robust to reordering, but that is implementation specific, and therefore you never know as a passive observer. The ask is rather to avoid reordering in the first place if possible, as you do today for TCP, e.g. by using the full 5-tuple for ECMP. So I think the ask is really to keep things running as they are right now, rather than to do anything special for QUIC. Again, do we need to make that message clearer?

[MK] Thanks again! Mirja


    Done.




-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call



