Re: [Last-Call] Opsdir last call review of draft-ietf-quic-manageability-14

Hi Al,

On 16 Mar 2022, at 20:23, MORTON JR., AL <acmorton@xxxxxxx> wrote:

Hi Mirja,

-----Original Message-----
From: Mirja Kuehlewind <mirja.kuehlewind@xxxxxxxxxxxx>
Sent: Wednesday, March 16, 2022 10:40 AM
To: MORTON JR., AL <acmorton@xxxxxxx>
Cc: last-call@xxxxxxxx; draft-ietf-quic-manageability.all@xxxxxxxx;
quic@xxxxxxxx; ops-dir@xxxxxxxx
Subject: Re: Opsdir last call review of draft-ietf-quic-manageability-14

Hi Al,

as you might have seen we merged the remaining PRs and submitted a new version
last week.
But unfortunately, I don't think we were able to address your comment below
fully.

Regarding use of version number I believe the text in the draft reflects the
group consensus, so we only made an editorial change to make it clearer.
[acm] 
Your draft and WG consensus discourage use of the version field for admission purposes.
This is a question of whether any WG should state a consensus that expresses a *policy* for network managers and operators in an IETF RFC. If the same intent were stated as a conclusion reached by the WG, it would be far more palatable, and I offered alternative text as an example.

I don’t think we’re stating a policy here, we’re stating a recommendation.

Note that QUIC v2 is mostly done; it’s a minimal change to the wire image meant to exercise the versioning mechanism. A network that admits only QUIC v1 (which, indeed, seems mostly reasonable from the standpoint of an operator used to the last few decades of the use and abuse of extensions in the Internet) will, at that point, reject ~half of QUIC traffic for ~no benefit to the operator or its users. The recommendation is meant to avoid that sort of silliness.

The reason behind this version agility is, in turn, to maintain the deployability of new versions. Networks are of course free to admit any traffic they want; the point of this language is to point out the mostly-negative tradeoff of doing so.
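
To make the tradeoff concrete, here is a minimal Python sketch (not taken from the draft) contrasting a version-allowlist admission check with one that relies only on the version-independent long header layout from RFC 8999: most significant bit of the first byte set, followed by a 32-bit Version field. The function names, the allowlist, and the value 0xabcd1234 are made up for illustration, and the sketch only looks at long-header packets, i.e. the start of a flow.

```python
import struct

QUIC_V1 = 0x00000001          # QUIC version 1 (RFC 9000)
MADE_UP_VERSION = 0xABCD1234  # hypothetical newer version, not a real assignment


def long_header_version(datagram: bytes):
    """Return the Version field if the datagram starts with a long header
    (RFC 8999: high bit of byte 0 set, then a 32-bit version), else None."""
    if len(datagram) < 5 or not (datagram[0] & 0x80):
        return None
    (version,) = struct.unpack("!I", datagram[1:5])
    return version


def admit_allowlist(datagram: bytes, allowed=frozenset({QUIC_V1})) -> bool:
    """Hypothetical policy: admit only versions the box was configured with."""
    version = long_header_version(datagram)
    return version is not None and version in allowed


def admit_invariants(datagram: bytes) -> bool:
    """Hypothetical policy: admit anything shaped like a QUIC long header,
    regardless of the version it carries."""
    return long_header_version(datagram) is not None


def fake_long_header(version: int) -> bytes:
    """Build a dummy long-header packet prefix for the demonstration."""
    return bytes([0xC0]) + struct.pack("!I", version) + bytes(16)
```

With these definitions, `fake_long_header(MADE_UP_VERSION)` passes `admit_invariants` but is dropped by `admit_allowlist`, which is the failure mode described above once a second version is widely deployed.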

So, let's consider this issue as needing further discussion in a wider venue.

I think we’re having that discussion on last-call@xxxxxxxx right now. :)

Regarding when the handshake fails, I'm not sure if it would be correct to say
anything more here. You can always just not see some of the packets on the
path, or the handshake could even change with a new version or an extension I
guess. Again I'm also not really sure what to do with that information either.
If you don't see any further packets flowing at any time, incl. right after
the handshake, either something went wrong or the transmission is just done.
It's really hard to make any assumption from the network here.
[acm] 
The case I cited was an operator that wants to support QUIC, and wants to identify when QUIC setup fails and how frequently failure occurs, to support analysis and troubleshooting and properly manage their network.

There seems to be a tacit assumption here that holds in the TCP case that does not necessarily hold in the QUIC case: that an operator can helpfully debug the operation and performance of a transport protocol within their network. One of the reasons this is a useful (indeed, essential) role of network operators in the TCP world is that there is often an unavoidable, unintentional, transport-dependent differential impact of an operator’s own network on different traffic flows, where the remedy is often only actionable by the operator itself.

I’d submit that the main reason this happens with protocols like TCP is that the TCP wire image is path-observable and path-mutable. Without this path-observability and path-mutability, the set of possible flow-dependent impacts is necessarily reduced, if not eliminated. Without operator-actionable problems on the network, the observability of internal protocol dynamics from non-cooperative third parties becomes less important.

In other words, the set of wire image features that can cause differential treatment in an operator's network is equal to the set of wire image features that are freely observable by that operator.

Cheers,

Brian

I also note the dependency on knowing the version number in your paragraph above (when attempting to understand the handshake), as a hint toward accomplishing this management goal (by relating the version to a published specification).

I think that a supporting operator (like the one above) is the most-likely reader of your memo, so it will help them to add a few sentences about non-Figure 1 handshakes. Even if the sentences are something like this (based on what you said above):

If the handshake in Figure 1 is truncated or missing packets, many actual outcomes are possible (and not necessarily handshake failure). The end-points may have switched to a different version and handshake, switched to a different path, implemented fallback, terminated the attempt as the end-points intended, or other outcome. 
-=-=-=-=-=-=-=-=-=-

Over time, observers will likely develop heuristics to mitigate these uncertainties and draw probable conclusions (like they did with TCP), but you don't need to add that aspect. Just indicate the possibilities and try to improve the manageability of QUIC.
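
A rough Python sketch of what such hedged passive observation could look like follows. It is an assumption-laden illustration, not a procedure from the draft: the long packet type codes are the QUIC version 1 ones from RFC 9000, so packets of any other version are reported as unknown, and the rule "Initials plus short-header packets in both directions" is only a hypothetical stand-in for the pattern in Figure 1. All names are invented here.

```python
QUIC_V1 = 0x00000001

# Long packet type codes as defined for QUIC version 1 (RFC 9000, Section 17.2).
# Other versions may assign these codes differently.
V1_LONG_TYPES = {0: "initial", 1: "0-rtt", 2: "handshake", 3: "retry"}

# Outcomes named in the proposed text; an on-path observer cannot tell
# them apart from the wire image alone.
POSSIBLE_OUTCOMES = (
    "different version and handshake",
    "different path",
    "fallback",
    "attempt ended as the end-points intended",
    "other",
)


def packet_kind(first_byte: int, version):
    """Classify one packet; 'unknown' when the version's spec is not known."""
    if not first_byte & 0x80:
        return "short"
    if version != QUIC_V1:
        return "unknown"
    return V1_LONG_TYPES[(first_byte >> 4) & 0x03]


def assess(packets):
    """packets: iterable of (direction, first_byte, version) tuples, with
    direction 'c2s' or 's2c' and version None for short-header packets.
    Returns a verdict plus the outcomes that remain possible."""
    seen = {(d, packet_kind(b, v)) for d, b, v in packets}
    initials = {("c2s", "initial"), ("s2c", "initial")} <= seen
    data = {("c2s", "short"), ("s2c", "short")} <= seen
    if initials and data:
        return "likely established", ()
    # An incomplete exchange is not reported as a failure.
    return "inconclusive", POSSIBLE_OUTCOMES
```

The point of returning "inconclusive" rather than "failed" is that a truncated exchange, on its own, does not distinguish between the listed outcomes.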

Al


Mirja



On 05.03.22, 17:02, "MORTON JR., AL" <acmorton@xxxxxxx> wrote:

Hi Mirja, thanks for your replies and PRs.
please see replies below, I clipped discussions we have closed.
Al

-----Original Message-----
From: Mirja Kuehlewind <mirja.kuehlewind@xxxxxxxxxxxx>
Sent: Tuesday, March 1, 2022 1:49 PM
To: MORTON JR., AL <acmorton@xxxxxxx>
Cc: last-call@xxxxxxxx; draft-ietf-quic-manageability.all@xxxxxxxx;
quic@xxxxxxxx; ops-dir@xxxxxxxx
Subject: Re: Opsdir last call review of draft-ietf-quic-manageability-14

Hi Al,

thanks again! See below!

On 27.02.22, 19:50, "MORTON JR., AL" <acmorton@xxxxxxx> wrote:

[snip]
...
[acm] I see that there was additional editing since you wrote last Monday,
so I made a comment and suggestions on GitHub.

[MK] Thx! Added your suggestions!

[snip]



2.8. Version Negotiation and Greasing

...
QUIC is expected to evolve rapidly, so new versions, both
experimental and IETF standard versions, will be deployed on the
Internet more often than with traditional Internet- and
transport-layer protocols. Using a particular version number to
recognize valid QUIC traffic is likely to persistently miss a
fraction of QUIC flows and completely fail in the near future, and
is therefore not recommended.
[acm] Where "valid traffic" is the focus, I agree, let it flow.
But the Operator's focus may instead be "admissible traffic", where
experimental traffic is not wanted or allowed. IOW, only traffic that is
understood to conform to <RFC list> shall pass, because "Active Attacks are
also Pervasive", to put a different spin on 7258. [acm] See also the
comment in 3.4.1.

[MK] This is not about experimentation.
[acm]
OK, let's just say unexpected traffic.

The expectation is that QUIC versions will change often, e.g. we already
have a draft for a new version adopted in the group and there might be
another RFC some time this year. So if you "manually" have to allow for new
versions in all your equipment that will delay deployment of new versions
(or even hinder them because there is always one box that doesn't get
updated). Therefore we strongly recommend to not use the version to filter
QUIC traffic. Is that not clear enough in the text?

In addition, due to the speed of evolution of the protocol, devices
that attempt to distinguish QUIC traffic from non-QUIC traffic for
purposes of network admission control should admit all QUIC traffic
regardless of version.
[acm]
I think it is clear, and at the same time, it is aspirational for many
networks.
This sentence informs, but then strays into policy.

Maybe this will work:
...devices that attempt to distinguish QUIC traffic from non-QUIC traffic
for purposes of network admission control should not rely on the version
field alone.

[MK] I think your proposal is not correct because the whole point is that
you really should not use the version field _at all_. I know that people will
still do that, but I think we should at least spell it out clearly here that
this is problematic and hinders evolution.
[acm]
Evolution is what happens when a succeeding RFC is approved.
Experimentation is the many months between approvals.

...devices that attempt to distinguish QUIC traffic from non-QUIC traffic
for purposes of network admission control
*** should admit all QUIC traffic regardless of version.***
The last phrase attempts to define operator policy.
Don't do that.
The version field exists. It's specified in a standard.
If you simply say,
"The version field will change in the future." no one will be surprised.

[MK] Okay I got your point about policy. However, this document is meant to
provide guidance/recommendations to operators. I also see now that this is in
the "background" part, which is meant to explain QUIC rather than give
recommendations. However, I think this is actually one of the essential
recommendations of the document, so I would like to still spell this out
clearly and as early/often as possible. I tried a slightly different wording
in a new PR on github. Is that any better?


https://github.com/quicwg/ops-drafts/pull/459/files

[acm]
Not yet. Maybe we can compose your message to operators *without* making
it sound like you are trying to set policy. I suggested text in the PR like
this:

Developers would prefer admission of all QUIC traffic regardless of
version in order to support continuous version-based evolution. However, all
parties understand the value of versions with a corresponding, fully-approved
standard.




[acm] I was hoping to see a description of fallback to TCP (I see that
fallback is mentioned briefly at the end of section 4.2., and later, fail
over and failover. pick one...)

How can Network Operators observe when a QUIC setup has failed, and the
corresponding TCP fallback connection(s) succeeded?

[MK] There is no unified way in which fallback is implemented, if it is
implemented at all. However, why do you think a network operator would need
that information?
[acm]
To affirm that their admission policy is working properly, for one reason.

[MK] However, there is really no guarantee that all QUIC will have a
fallback. Without further knowledge about what higher layer service the QUIC
transport carries, I don't think you can make any assumption about fallback.
If you want to support evolution, you need to support QUIC and not rely on
any potential fallbacks.
[acm]
I chose the example carefully: the operator wants to support QUIC, but has
reports that QUIC setup is failing and needs to make measurements to gather
symptoms & info. Experience will indicate the circumstances where QUIC setup
failure is accompanied by fallback, and other possibilities. Repeated
experiences become heuristics for passive observation.
No assumptions necessary.
Has QUIC setup failed if the exchanges in Figure 1 are incomplete?
I think there might be a yes or no answer...
If no, then the passive observation procedure will mostly be governed by
heuristics.

[MK] I think I lost the point now. If QUIC fails even if there is a
fallback, that's still not great because the original intention was obviously
to use QUIC. Is there anything we need to say in the draft that is missing?
[acm]
Without getting into fallback in any way,
Help the operator determine when a QUIC setup has failed by providing a
little more info.
It would be useful to know:
What QUIC messages would accompany a QUIC setup failure? (other than those
in Figure 1)
OR
A statement like:
If the exchange in Figure 1 is incomplete, then the QUIC setup has failed.
(IF that is true)





Is there a reference available with this info, to save effort here?

[MK] As I said this is rather implementation specific, so I would say no.

...

3.4.1. Extracting Server Name Indication (SNI) Information

...

Note that proprietary QUIC versions, that have been deployed before
standardization, might not set the first bit in a QUIC long header
packet to 1. However, it is expected that these versions will
gradually disappear over time.
[acm]
And some networks may prefer not to admit experimental traffic. The goal of
the experiment may be problematic for the network operator and/or their
subscribers. I think this is legitimate operator behavior, and worth a few
more words in the draft.

[MK] To be honest I don't understand this point. How would an operator even
know if an experiment would be problematic or not? QUIC is fully encrypted.
Versioning is only one extension mechanism. So basically even if you see the
same version number, the QUIC behind that could behave very differently
depending on which extensions are used and because of the encryption, there
is no chance for the operator to know about this. Is this not clear in the
document? Do we need to state this more clearly?
[acm]
First, let's say s/experimental/unexpected/ or s/experimental/proprietary/
Then, I'm responding to your reply more than the paragraph in the draft now:
Network operators are also end users, and often act on their subscribers'
behalf. Observations are not strictly limited to mid-points, where encryption
is present.
Harboring old notions of what operators cannot do will not sit well with
your audience...

So, (in the paragraph above) you've informed operators that some
proprietary QUIC versions remain in use as of this writing.
But traffic that doesn't conform *might* be considered nefarious. That's
all. It's a message for everyone involved.

[MK] I think the point is actually rather that we want to say here: if you
don't support these old versions that will not be a problem in the near
future.
[acm]
Ok, say that in the draft, please.

[MK] Okay started a PR on github. Is that more clear now?


https://github.com/quicwg/ops-drafts/pull/460/files


[acm]
I'm ok with this one, thanks.

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
