Re: [Last-Call] [Int-dir] Intdir telechat review of draft-ietf-masque-connect-ip-10


 



Hi Joe, hi all,

 

I would just quickly reply to the following part:

 

David: I've seen literature about nested TCP, which is both nested congestion control and nested loss recovery. In my understanding, the majority of the issues come from the two layers retransmitting the same data, not from the nested congestion controllers. 

 

Joe: The lower one slams the window down due to loss; the upper one should never really see loss at all (given it’s running over TCP), but every time a loss and retransmit occurs, the RTT measurements at the upper layer take a hit. So the bottom layer does what it can, but the upper layer gets into regimes where it thinks it can send more (RTT BW*delay) than it really can, which then causes process stalls at the upper layer.

 

Joe, this is not correct if QUIC datagrams are used, as datagrams are not retransmitted and thus losses will be exposed to the tunneled connection without delay, avoiding time-outs in the upper-layer congestion control. This is what David meant by nested loss recovery. This may also have implications for congestion control, but it’s probably less problematic.

 

Mirja

 

 

 

 

From: "touch@xxxxxxxxxxxxxx" <touch@xxxxxxxxxxxxxx>
Date: Wednesday, 19. April 2023 at 19:26
To: David Schinazi <dschinazi.ietf@xxxxxxxxx>
Cc: Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx>, "int-dir@xxxxxxxx" <int-dir@xxxxxxxx>, "draft-ietf-masque-connect-ip.all@xxxxxxxx" <draft-ietf-masque-connect-ip.all@xxxxxxxx>, "last-call@xxxxxxxx" <last-call@xxxxxxxx>, "masque@xxxxxxxx" <masque@xxxxxxxx>
Subject: Re: [Last-Call] [Int-dir] Intdir telechat review of draft-ietf-masque-connect-ip-10
Resent from: <alias-bounces@xxxxxxxx>
Resent to: <magnus.westerlund@xxxxxxxxxxxx>, <Zaheduzzaman.Sarker@xxxxxxxxxxxx>, <mirja.kuehlewind@xxxxxxxxxxxx>, <achernya@xxxxxxxxxx>, <ekinnear@xxxxxxxxx>, <tpauly@xxxxxxxxx>, <caw@xxxxxxxxxxxxxxx>, <dschinazi.ietf@xxxxxxxxx>, <martin.h.duke@xxxxxxxxx>
Resent date: Wednesday, 19. April 2023 at 19:25

 

Hi, David,

 

More below…

 

Joe

Dr. Joe Touch, temporal epistemologist



On Apr 19, 2023, at 9:46 AM, David Schinazi <dschinazi.ietf@xxxxxxxxx> wrote:

 

Thanks, more discussion inline.

David

 

On Tue, Apr 18, 2023 at 9:36 PM touch@xxxxxxxxxxxxxx <touch@xxxxxxxxxxxxxx> wrote:

On Apr 18, 2023, at 7:00 PM, touch@xxxxxxxxxxxxxx wrote:

Hi, David and Magnus,

Replies below, cutting to the remaining issues…
(Trying again - I got reports of mail failures)

Joe

Dr. Joe Touch, temporal epistemologist
www.strayalpha.com

> On Apr 18, 2023, at 5:26 PM, David Schinazi <dschinazi.ietf@xxxxxxxxx> wrote:
>
> Thanks Joe and Magnus for the replies.
> Some more responses inline.
> David
>
> On Tue, Apr 18, 2023 at 1:23 AM Magnus Westerlund <magnus.westerlund@xxxxxxxxxxxx> wrote:
>> Hi,
>> 
>> Please see inline. Prefix with “MW:”

...

>> It is missing the way in which these ingress/egress
>> components are viewed at their endpoints, e.g., to be useful as an IP tunnel,
>> these need to appear as attached to (possibly virtual) network interfaces,
>> i.e., to appear as a link, which allows them to then be used for local
>> processes (via sockets), packet forwarding, etc.
>> That's an implementation detail that doesn't belong in this document. Most
>> implementations will indeed use virtual TUN interfaces, but it's not a requirement.
>> There is a known implementation with transport protocols in userspace that doesn't
>> do what you describe.
>> Whether a TUN interface is used or some other method, there needs to be a method by which these applications (client, server) present something that accepts IP packets.
>> 
>> That aspect of how this is actually used is ignored and needs to be addressed. It does not need to be implementation specific, but it would not hurt to give an example like TUNs.
>> 
>> (The fact that other user-space IP systems ignore this issue is not rationale for this document also ignoring it)
>>  MW: I don’t see how any text can be other than informational, considering the discussion below. Are you asking for a general discussion of the boundaries between the routing and the link (the tunnel) that this construct results in, especially as it puts some traffic filtering rules in front of the encapsulation that affect the routing?
I’m asking for both the routing and endpoint behaviors to be described in relation to the tunnel.
>> https://github.com/ietf-wg-masque/draft-ietf-masque-connect-ip/issues/165
> I think I now understand what Joe was saying. We didn't make it clear enough that this document specifies a (virtual) link with routers attached to it. The IP proxying endpoints both act as routers that are connected by this virtual link. That's why we talk about having them send ICMP. Adding some text to clarify this should help. We'll cover this in issue 165.

If that’s what you’re defining, it is incorrect. It can’t be a router. If it were and traffic were to go from tunnel to real interface, it would have its IP TTL decremented twice, which is inconsistent with RFC1812.

This is a tunnel. It should not try to be a router or a host. It can’t issue ICMPs properly esp. because it can’t take into consideration the relation of this tunnel to other interfaces, the endpoint and its rules for ICMP (per RFC1122) or a router and its rules for ICMP (per RFC1812).

What you are describing is not a convenience. It’s *incorrect*. I’ve noted that in the text deleted between here and the next issue (see past emails for that detail).

 

Perhaps another way to present this is that connect-ip is a virtual link with a half router on each side, but that gets harder to reason about.

 

Yes, and it should ;-)



In practice, we do need some router functionality here to ensure that a packet that loops between virtual connect-ip links will have its TTL decremented to prevent infinite loops.

 

But you do not. You should not be doing any forwarding at all - tunnels are links, not routers or half-routers. Links don’t decrement the TTL. If they did, they’d give the wrong answer to traceroute, etc.



That can be implemented by sending packets through the kernel, but some implementations might want to handle that all in the connect-ip process to improve performance - and we need to make sure those implementers don't forget to decrement the TTL.

 

That’s up to the user but outside the scope of a tunnel. This tunnel should be something that can be attached to a TUN device, a user router, etc.

 

All devices and processes that relay packets *between* interfaces need to make sure they decrement and check the TTL - that’s already in RFC1812. It is not the job of the tunnel to make sure that happens.
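
The split argued for above can be illustrated with a minimal Python sketch. It assumes an IPv4 packet held in a bytes object; the function names (forward_between_interfaces, carry_over_tunnel) are hypothetical and are not taken from the draft or from any implementation. The forwarding function decrements and checks the TTL as RFC1812 requires of anything relaying between interfaces, while the tunnel function, acting as a link, leaves the packet untouched.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement checksum over the 16-bit words of an IPv4 header."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_between_interfaces(packet: bytes):
    """Hypothetical forwarder: relays a packet from one interface to another.

    Returns the packet with its TTL decremented and checksum recomputed,
    or None when the TTL would reach zero and the packet must be dropped
    (a real forwarder would also send ICMP Time Exceeded).
    """
    header_len = (packet[0] & 0x0F) * 4
    ttl = packet[8]
    if ttl <= 1:
        return None
    header = bytearray(packet[:header_len])
    header[8] = ttl - 1
    header[10:12] = b"\x00\x00"  # zero the checksum field before recomputing
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header)))
    return bytes(header) + packet[header_len:]


def carry_over_tunnel(packet: bytes) -> bytes:
    """Hypothetical tunnel: a link, so the inner packet is carried unchanged."""
    return packet
```

In this model a packet that loops between two such links still passes through a forwarder on every hop, so the TTL check that breaks the loop lives there and not in the tunnel.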

 

 

>> As other reviewers have noted, Sec 10 on nested congestion control is quite thin. The current statement is equivalent to “if you KNOW congestion is nested,
>> turn it off” – it should be the opposite, i.e., “turn congestion ON only if you
>> KNOW congestion is NOT nested”.
>> Fair enough. We're tweaking that section to be more permissive:
>> https://github.com/ietf-wg-masque/draft-ietf-masque-connect-ip/pull/162
>> It’s not about allowing congestion control to be disabled; that needs to be a SHOULD, with the caveat that when it is not, performance can suffer in ways that are difficult to predict.
>> MW: I agree, and we could actually say SHOULD in that text under those constraints rather than MAY. However, doing this disabling first of all requires support of the DATAGRAM extension and will require changes to the QUIC stack that might not be possible in all deployment scenarios. And it can’t be done in general as some usage of this tunnel specification might happen over other HTTP versions than HTTP/3 that uses TCP.
>>  https://github.com/ietf-wg-masque/draft-ietf-masque-connect-ip/issues/164
“It can’t be done” in some cases - that can be used as a rationale for implementers not following the SHOULD (i.e., the reason for an exception).

> I personally don't think a SHOULD is reasonable here. We don't have enough data to demonstrate that this advice is always sound. My personal experience is that nested congestion control loops work fine in practice. (Note that this is nested congestion control, not nested loss recovery.) Whether this should be done or not is very dependent on the deployment environment. Adding some text warning of the risks is always good, but a normative recommendation is a step too far.

This isn’t an issue for anecdotal discussion; it’s been proven in the literature. Nested control loops are never stable unless they’re specifically designed to be, and two instances of the same control with the same parameters (TCP on TCP, esp. if the same variant) are particularly bad.

I’m suggesting normative SHOULD with specific exceptions that can include places where it can’t be done.

 

I've seen literature about nested TCP, which is both nested congestion control and nested loss recovery. In my understanding, the majority of the issues come from the two layers retransmitting the same data, not from the nested congestion controllers.

 

The lower one slams the window down due to loss; the upper one should never really see loss at all (given it’s running over TCP), but every time a loss and retransmit occurs, the RTT measurements at the upper layer take a hit. So the bottom layer does what it can, but the upper layer gets into regimes where it thinks it can send more (RTT BW*delay) than it really can, which then causes process stalls at the upper layer.

 

I haven't seen literature about nested congestion control without nested loss recovery.

 

See above; it’s not nested loss recovery because that won’t happen. The issue is that the parameters that determine the combined flow/congestion control windows interact very badly.



If you squint hard enough, any IP router connected to heterogeneous links is a congestion controller because if the input link is getting more packets in than the output link can handle, the router will drop some of them. In that world view, almost every TCP connection on the Internet involves nested congestion controllers, and it's working quite well.

 

An IP router that changes paths a lot will impact TCP, yes. But the drops are also why we use RED-like (non-tail) drops and ECN; that doesn’t happen with TCP over TCP. They’re not equivalent.



 

>> Section 11.1 refers to fragmented packets; it should refer to them as not being
>> able to be “re-fragmented”; source-generated fragments are still fragmented and
>> can cross the tunnel subject to the tunnel MTU.
>> The use of "fragmented" in that section refers to QUIC datagram frames, which
>> cannot be fragmented - this isn't about IP fragmentation.
>> The section talks about whether IP packets can fit inside QUIC datagram frames.
>> 
>> Fragmentation of those packets can - and will - happen when those packets are generated on the host where the packets enter the tunnel, unless the host decides to force “don’t fragment” on those packets. That’s a decision that happens (could happen or should happen, depending on your viewpoint) before the packets ever get to the tunnel ingress.
>> 
>> On-path fragmentation of IPv4 packets relayed to the IP proxy happens (could happen or should happen, again depending on your viewpoint) before those packets ever get to the tunnel ingress.
>> 
>> Either of those can happen - even if QUIC datagrams sit inside IP packets with DF=1 (or IPv6) that are also not source fragmented.
>> 
>> MW: So I think this may need a bit of wording cleanup and also clarification of the conceptual model. If I understand Joe correctly here, a reasonable way of looking at this is that when a packet arrives at the router part with this tunnel as one of its interfaces, the router part will have some knowledge of the current tunnel MTU. The tunnel itself will not refragment the data, but it will support a particular MTU. Thus, the router part can actually fragment an IPv4 packet that doesn’t have the DF bit set and send it into the tunnel. The same applies to the discussion about ICMP generation: it will be the router part that generates the ICMP when the routing decision says to send the packet over the tunnel but the tunnel MTU is too small for the packet to fit.
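
The model Magnus describes can be sketched as a small decision function in Python. This is only an illustration under stated assumptions: the names (Action, egress_decision) and the idea of passing the tunnel MTU as a plain integer are hypothetical, not anything specified in the draft. The decision sits in whatever forwards packets toward the tunnel, and the tunnel itself only exposes an MTU.

```python
from enum import Enum


class Action(Enum):
    SEND = "send into tunnel unchanged"
    FRAGMENT = "fragment to fit the tunnel MTU, then send"
    DROP_AND_REPORT = "drop and send ICMP back to the source"


def egress_decision(packet_len: int, tunnel_mtu: int,
                    is_ipv6: bool, df_set: bool) -> Action:
    """Hypothetical forwarding-side check made before a packet enters the tunnel."""
    if packet_len <= tunnel_mtu:
        return Action.SEND
    if is_ipv6 or df_set:
        # IPv6 is never fragmented in transit (ICMPv6 Packet Too Big);
        # IPv4 with DF=1 gets ICMP Destination Unreachable, fragmentation needed.
        return Action.DROP_AND_REPORT
    # IPv4 with DF=0 may be fragmented by the forwarder, not by the tunnel.
    return Action.FRAGMENT
```

Source-generated fragments follow the same path: each fragment is an IP packet in its own right and is sent unchanged as long as it fits the tunnel MTU.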
And, FWIW, that’s another reason why the routing part belongs outside of the tunnel. It’s not just routing, it’s also endpoint (source) behavior. The tunnel has no business trying to replicate this behavior when it’s already part of the endpoint system (if it weren’t, then HTTP over X over IP wouldn’t be available as a tunnel mechanism).
>> https://github.com/ietf-wg-masque/draft-ietf-masque-connect-ip/issues/165
> I agree with Magnus, if we clarify the split between link and router then this becomes natural.

I agree with the split, but the router cannot be part of this mechanism.

 

In practice, implementers of connect-IP will often implement part of the router function, so it's useful to mention it so folks don't forget important parts. Phrasing it as not a part of the mechanism and instead as a part of the overall environment is reasonable.

 

I appreciate that this discussion talks a lot about not including implementation details - I’ll use that justification here.

 

Whether users commonly implement user-level routers or not, that doesn’t belong in the spec for a tunnel.

 

---

 

-- 
last-call mailing list
last-call@xxxxxxxx
https://www.ietf.org/mailman/listinfo/last-call
