Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?

Toerless Eckert <tte@xxxxxxxxx> · Sat, 7 Aug 2021 17:17:16 +0200

Thanks, Mark, Jeff,

but let me rephrase my core question, as it's not answered yet; maybe
I wasn't asking it crisply enough, or I didn't recognize the answer:

Do our RFCs say or imply anything about whether or not hosts can change
the IPv6 flow label field of a single flow during the lifetime
of that flow (MUST, SHOULD, MAY, ... MAY NOT, SHOULD NOT, MUST NOT)?

Cheers
    Toerless

On Sat, Aug 07, 2021 at 01:29:44PM +1000, Mark Smith wrote:
> On Sat, 7 Aug 2021 at 11:49, Toerless Eckert <tte@xxxxxxxxx> wrote:
> >
> > [bitching]
> > I apologize for attempting to respond to the original post topic instead
> > of derailing the thread into my pet side topic without changing the subject,
> > which seems to be expected behavior on ietf@xxxxxxxx.
> > [/bitching]
> >
> > Adding ipv6@xxxxxxxx as that seems to be the closest WG list for the topic.
> >
> > Brian reminded us that we have ample RFCs elaborating on the fact that you cannot
> > reasonably expect connections to an anycast address to keep working when they
> > persistently use the anycast address.
> >
> 
> Yes, even the first anycast RFC (RFC1546) recognised that.
> 
> > Christian pointed out how QUIC does the right thing. Great! Maybe we should
> > have an anycast support hall of fame and shame for protocols: does it or does it
> > not support single round-trip resolution of an anycast address to a unicast address?
> >
> 
> MPTCP and SCTP could support that too:
> 
> https://tools.ietf.org/id/draft-smith-6man-form-func-anycast-addresses-01.html#rfc.section.5.7.7
> 
> 
> > But back to what seems to be the root cause, which isn't anycast, but IPv6
> > flow label "abuse" ?!
> >
> > I specifically had not heard of this Linux "hack" to change the flow label
> > mid-connection after a TCP RTO to overcome a seemingly broken path, hoping that
> > the new flow label picks another, working path (most likely in a data center).
> >
> 
> The flow label is supposed to be a hint from hosts to the network as
> to what a flow is, so that the network can try to provide a better
> service to the hosts beyond the minimum that is provided by
> destination-address-based forwarding.
> 
> If flow labels that don't represent flows start making it harder for
> network operators to manage traffic flows, the answer is easy and will
> be inevitable. Network operators will ignore the flow label when doing
> ECMP or LAG.
> 
> My perspective is both as a network operator, and as somebody who
> helped with RFC6437 in the interests of getting those flow hints to be
> able to provide a better network service.
> 
> <snip>
> 
> > Ultimately, I don't have a lot of sympathy for the Linux behavior, even if it
> > was blessed by RFC6437, because I think good networks should fix broken paths
> > fast enough to make this hack unnecessary...
> >
> 
> Yes, and that is the job of the network and its operators.
> 
> If hosts were to be properly involved in that then they'd need to
> participate in IGPs and EGPs, and somehow attain knowledge of the
> traffic management policies that network operators apply to IGPs and
> EGPs.
> 
> Regards,
> Mark.
> 
> 
> 
> > Cheers
> >     Toerless
> >
> > On Tue, Aug 03, 2021 at 10:45:29PM +1200, Brian Carpenter wrote:
> > > The issue of anycast and unstable routes is hardly a new discovery; this
> > > Linux feature is not creating a new problem. I suggest reading RFC7094 and
> > > RFC4786 before continuing this conversation.
> > >
> > > I certainly wouldn't design a protocol that relied on stable transport
> > > connections to an anycast address.
> > >
> > > Regards,
> > >     Brian Carpenter
> > >     (via tiny screen & keyboard)
> > >
> > > On Tue, 3 Aug 2021, 22:10 Michael Tuexen, <michael.tuexen@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > > On 3. Aug 2021, at 11:44, Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hi all,
> > > > > I am writing to this alias because I do not know the proper one for
> > > > > this type of problem (OS/LINUX/BSD).
> > > > > The history of how Alexander Azimov (Yandex) found the problem is
> > > > > below.
> > > > >
> > > > > In short: if TCP loses connectivity for 200ms (or 1s in the SYN stage),
> > > > > then TCP changes the IPv6 flow label (for the active TCP session!) to
> > > > > push traffic to a different path.
> > > > > Current networks use ECMP extensively; if intermediate nodes use the
> > > > > flow label in the hash calculation, then there is a high probability
> > > > > that the path will change.
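To put a rough number on "high probability", here is a back-of-the-envelope estimate. It rests on two assumptions. First, a single ECMP stage has N equal-cost next hops. Second, the hash maps each flow label value to a next hop uniformly and independently; real router hashes only approximate this.

Under those assumptions, a freshly chosen label lands on the old next hop with probability 1/N. With k such stages in sequence, the whole path stays the same only if every stage repeats its choice:

```latex
P(\text{same next hop}) = \frac{1}{N}, \qquad
P(\text{next hop changes}) = 1 - \frac{1}{N}

P(\text{whole path unchanged}) = \prod_{i=1}^{k} \frac{1}{N_i}
```

For example, N = 4 gives a 75% chance that a single relabel moves the flow. Two stages of 4 give 1 - 1/16 = 93.75%. This ignores the negligible chance (2^-20) that the new 20-bit label equals the old one.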
> > > > > LINUX/BSD does not want to wait until the network fixes its problem.
> > > > As far as I know, Linux implements something like what you describe, but
> > > > I'm not aware of this behaviour being implemented in *BSD, at least not
> > > > in FreeBSD.
> > > > >
> > > > > If the final destination is an anycast address, then the same hash
> > > > > calculation would change the final destination too.
> > > > > The stateful session would be broken as a result (see the second part
> > > > > of Alexander’s presentation below).
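The following is a minimal sketch of the anycast failure mode described above, built on these assumptions:

- One anycast address is announced from three equal-cost sites. The names `site-A` to `site-C` and the `2001:db8::/32` addresses are hypothetical.
- The upstream router hashes the IPv6 3-tuple (source, destination, flow label).
- SHA-256 stands in for the router's real, vendor-specific hash.
- Only the site that received the SYN holds connection state, so a relabel that moves the flow elsewhere ends in a reset.

```python
import hashlib
from typing import Dict, Sequence

# Hypothetical topology: one anycast address announced from three sites
# that are equal-cost as seen from the client's upstream router.
ANYCAST_SITES = ["site-A", "site-B", "site-C"]


def ecmp_pick(src: str, dst: str, flow_label: int, choices: Sequence[str]) -> str:
    """Pick a next hop by hashing the IPv6 3-tuple (src, dst, flow label).

    SHA-256 is only a stand-in for a router's vendor-specific hash.
    """
    key = f"{src}|{dst}|{flow_label}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return choices[bucket % len(choices)]


def deliver(state_at: Dict[str, str], site: str, conn_id: str) -> str:
    """Only the site that received the SYN holds state for the connection."""
    return "ok" if state_at.get(site) == conn_id else "RST (no state)"


client, anycast = "2001:db8:1::10", "2001:db8:ffff::1"  # documentation prefix
syn_label = 0x12345
syn_site = ecmp_pick(client, anycast, syn_label, ANYCAST_SITES)
state_at = {syn_site: "conn-1"}  # the SYN created state at this site only

# After an RTO the client picks a new flow label. With an ideal hash and three
# sites, about two thirds of new labels land elsewhere; take the first that does.
new_label = next(
    fl for fl in range(1, 1 << 20)
    if ecmp_pick(client, anycast, fl, ANYCAST_SITES) != syn_site
)
new_site = ecmp_pick(client, anycast, new_label, ANYCAST_SITES)
print(syn_site, "->", new_site, ":", deliver(state_at, new_site, "conn-1"))
```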
> > > > >
> > > > > Since LINUX made RTO flow label recalculation the default (2016), IPv6
> > > > > Anycast has been broken.
> > > > > People will have one more reason not to migrate to IPv6. The flow label
> > > > > does not exist in IPv4, so the OS cannot break IPv4 Anycast in a similar
> > > > > way.
> > > > >
> > > > > Would anybody like to spend his/her karma to save IPv6 Anycast, OR
> > > > > should we let it die?
> > > > > It has been broken for 5 years already, and nobody has spotted it until
> > > > > now. Is it needed?
> > > > > (I have seen a few drafts heavily dependent on IPv6 anycast)
> > > > >
> > > > > What is the proper WG for such a problem?
> > > > At IETF 110 Alexander gave a presentation on this in TCPM and V6OPS. See
> > > > the Minutes and the corresponding slides at
> > > > https://datatracker.ietf.org/meeting/110/proceedings
> > > >
> > > > At least at the TCPM meeting, it was suggested that an ID would be written.
> > > >
> > > > However, the behaviour you are describing is implementation-specific to
> > > > Linux; it is not described or recommended by an RFC.
> > > >
> > > > Best regards
> > > > Michael
> > > > >
> > > > > I am concerned that Anycast has been killed; it is not an easily
> > > > > replaceable tool.
> > > > > Maybe somebody will propose something better, but if not,
> > > > > then LINUX should return to the 2015 behavior, when flow label change on
> > > > > RTO was a non-default configuration.
> > > > > Such LINUX behavior could be valuable in some restricted domains (see
> > > > > below) when the administrator is sure that Anycast is not possible on
> > > > > the traffic path.
> > > > >
> > > > > Eduard
> > > > > From: Vasilenko Eduard
> > > > > Sent: Tuesday, August 3, 2021 12:05 PM
> > > > > To: 'Jeff Tantsura' <jefftant.ietf@xxxxxxxxx>; Alexander Azimov <a.e.azimov@xxxxxxxxx>
> > > > > Cc: Alexander Azimov <mitradir@xxxxxxxxxxxxxx>; routing WG <rtgwg@xxxxxxxx>
> > > > > Subject: RE: Self-healing Networking with Flow Label
> > > > >
> > > > > Hi all,
> > > > > Not many people worldwide read this alias and understand
> > > > > that RTO could be leveraged to fight “silent drops” in the DC
> > > > > environment.
> > > > > It is a good use case to publish/document (with more details than
> > > > > were in the presentation).
> > > > > I hope that in the future OAM will be used for this purpose; it is
> > > > > better from an architectural point of view.
> > > > > Eduard
> > > > > From: Jeff Tantsura [mailto:jefftant.ietf@xxxxxxxxx]
> > > > > Sent: Tuesday, August 3, 2021 1:08 AM
> > > > > To: Alexander Azimov <a.e.azimov@xxxxxxxxx>
> > > > > Cc: Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx>; Alexander Azimov <mitradir@xxxxxxxxxxxxxx>; routing WG <rtgwg@xxxxxxxx>
> > > > > Subject: Re: Self-healing Networking with Flow Label
> > > > >
> > > > > Eduard,
> > > > >
> > > > > The idea of the draft to come is to explain what to do, when, and how.
> > > > > The goal is not to regulate (we really don’t) but to provide, similarly
> > > > > to RFC7938, a set of guidelines that the community can use to build
> > > > > better and more resilient networks.
> > > > >
> > > > > Cheers,
> > > > > Jeff
> > > > >
> > > > >
> > > > > On Aug 2, 2021, at 04:01, Alexander Azimov <a.e.azimov@xxxxxxxxx> wrote:
> > > > >
> > > > > Eduard,
> > > > >
> > > > > On Mon, 2 Aug 2021 at 13:45, Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx> wrote:
> > > > > The key point of this presentation is: “This behavior MUST be switched
> > > > > off by default”.
> > > > > It has been shown on slides 7-10 that flow label change on RTO is
> > > > > enabled by default for BSD and LINUX.
> > > > > It needs regulation. It needs a standard RFC, because otherwise it kills
> > > > > Anycast.
> > > > > As I'm partially responsible for the key points of the presentation, I
> > > > > can stress that they are a bit different:
> > > > >       • We have an opportunity for self-healing TCP on top of IPv6; it
> > > > >         should be preserved;
> > > > >       • The Linux defaults should be changed to a safe mode to prevent
> > > > >         session timeouts;
> > > > >       • The hash recalculation behavior should be documented.
> > > > > I'm not sure what you mean by the term 'regulation'.
> > > > >
> > > > > The story of how to use RTO to work around vendors’ “silent drop” bugs
> > > > > could be a good informational RFC.
> > > > > Maybe people developing iOAM would pay more attention to this use case.
> > > > >
> > > > > IMHO: these are 2 separate drafts.
> > > > > I'm not sure about that; we'll try to provide a -00 before the next IETF
> > > > > meeting. Let's see how it progresses.
> > > > >
> > > > > Eduard
> > > > > From: Alexander Azimov [mailto:mitradir@xxxxxxxxxxxxxx]
> > > > > Sent: Monday, August 2, 2021 1:20 PM
> > > > > To: Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx>; Jeff Tantsura <jefftant.ietf@xxxxxxxxx>
> > > > > Cc: routing WG <rtgwg@xxxxxxxx>
> > > > > Subject: Re: Self-healing Networking with Flow Label
> > > > >
> > > > > Eduard,
> > > > >
> > > > > Please see the quote from slide 28. My suggestion was:
> > > > >
> > > > > Client – sends SYN, Server – responds with SYN&ACK
> > > > >       • In case of SYN_RTO or RTO events, the Server SHOULD recalculate
> > > > >         its TCP socket hash, thus changing the Flow Label. This behavior
> > > > >         MAY be switched on by default;
> > > > >       • In case of SYN_RTO or RTO events, the Client MAY recalculate its
> > > > >         TCP socket hash, thus changing the Flow Label. This behavior MUST
> > > > >         be switched off by default;
> > > > > This looks like a safe default behavior that preserves part of the
> > > > > improvements while making work with stateful anycast services safe.
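The slide-28 proposal can be expressed as a small decision function. This is a hypothetical sketch: the names `Role`, `rehash_enabled` and `label_after_rto`, and the `admin_knob` override, are invented for illustration and are not an existing Linux API. Servers relabel on SYN_RTO/RTO by default; clients relabel only when an operator explicitly enables it.

The asymmetry is what makes this anycast-safe. The server's packets are addressed to the client's unicast address, so a new label can only move them onto another path toward the same host. It is the client's packets that are addressed to the anycast destination.

```python
import secrets
from enum import Enum
from typing import Optional

FLOW_LABEL_BITS = 20  # the IPv6 flow label field is 20 bits wide


class Role(Enum):
    CLIENT = "client"  # sent the SYN (may be talking to an anycast address)
    SERVER = "server"  # answered with SYN+ACK (replies go to a unicast client)


def rehash_enabled(role: Role, admin_knob: Optional[bool] = None) -> bool:
    """Default per the proposal: on for servers, off for clients.

    `admin_knob` models a hypothetical per-host override that an operator
    could set in a controlled environment; None means "use the default".
    """
    if admin_knob is not None:
        return admin_knob
    return role is Role.SERVER


def label_after_rto(role: Role, current: int, admin_knob: Optional[bool] = None) -> int:
    """Return the flow label to use after a SYN_RTO/RTO event."""
    if not rehash_enabled(role, admin_knob):
        return current  # keep the flow label stable for the whole flow
    while True:
        candidate = secrets.randbelow(1 << FLOW_LABEL_BITS)
        if candidate not in (0, current):  # 0 means "no flow label set"
            return candidate


print(hex(label_after_rto(Role.CLIENT, 0x12345)))  # unchanged by default
print(hex(label_after_rto(Role.SERVER, 0x12345)))  # a new random label
```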
> > > > >
> > > > > And yes, IMO it's OK to have a knob to enable it in a controlled
> > > > > environment. If you ask how to enable it in the presence of internal
> > > > > anycast services, there was also a suggestion in the slides: eBPF. It
> > > > > gives a good way to make this kind of separation.
> > > > >
> > > > > 02.08.2021, 11:48, "Vasilenko Eduard" <vasilenko.eduard@xxxxxxxxxx>:
> > > > > Hi Jeff,
> > > > > A situation where the Control Plane does not understand what the
> > > > > Forwarding plane is doing is a bug.
> > > > > Yes, RTO in TCP helps to find a work-around for this bug. And yes,
> > > > > Anycast is typically absent inside the DC, so it does not create a
> > > > > problem in the DC environment.
> > > > >
> > > > > But the same LINUX is used outside the DC. An RTO Flow Label change
> > > > > there would create even more problems if Anycast happened on the
> > > > > traffic path (which is not very predictable for the client).
> > > > > Do we need a separate LINUX distribution for the DC and a separate
> > > > > distribution for other environments?
> > > > > Or should we rely on the proper non-default configuration for different
> > > > > environments? (The admin must not forget to change it.)
> > > > > What if Anycast becomes needed in the DC?
> > > > >
> > > > > RTO flow label recalculation is mutually exclusive with Anycast on the
> > > > > traffic path.
> > > > > What is more valuable for the public?
> > > > >
> > > > > IMHO: It is better to fight this type of bug with iOAM than to cancel
> > > > > Anycast.
> > > > >
> > > > > IMHO: It is better to have Flow Label recalculation on RTO “off” by
> > > > > default.
> > > > > A DC admin is better qualified to activate it in a controlled
> > > > > environment than every client worldwide is to remember to disable it.
> > > > >
> > > > > Eduard
> > > > > From: Jeff Tantsura [mailto:jefftant.ietf@xxxxxxxxx]
> > > > > Sent: Monday, August 2, 2021 6:56 AM
> > > > > To: Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx>
> > > > > Cc: mitradir@xxxxxxxxxxxxxx; routing WG <rtgwg@xxxxxxxx>
> > > > > Subject: Re: Self-healing Networking with Flow Label
> > > > >
> > > > > Eduard,
> > > > >
> > > > > The issue is not present in the link/device failure case: if well
> > > > > implemented, fast rehash takes care of updating forwarding within a
> > > > > few ms. The problem is with “gray” failures, when the link in question
> > > > > is UP from the routing/forwarding perspective but drops traffic (mostly
> > > > > occasionally, and, when caused by a HW bug, only traffic with distinct
> > > > > flow attributes).
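The gray-failure case Jeff describes can be sketched the same way. All of the following are hypothetical assumptions:

- There are four equal-cost uplinks, hashed on (source, destination, flow label).
- One of them, `uplink-2`, stays UP for routing but silently drops everything hashed onto it.
- SHA-256 stands in for the real hash.

Each RTO picks a fresh random 20-bit label. This mimics, in spirit only, the RTO rehash discussed in this thread; it is not Linux's actual code.

```python
import hashlib
import random
from typing import Optional, Sequence, Tuple

UPLINKS = ["uplink-0", "uplink-1", "uplink-2", "uplink-3"]
GRAY_LINK = "uplink-2"  # hypothetical: UP for routing, yet silently drops traffic


def pick_uplink(src: str, dst: str, flow_label: int, links: Sequence[str]) -> str:
    """ECMP choice from a hash of (src, dst, flow label); SHA-256 is a stand-in."""
    key = f"{src}|{dst}|{flow_label}".encode()
    return links[int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(links)]


def send_with_rto_rehash(
    src: str, dst: str, first_label: int, max_rtos: int = 30
) -> Optional[Tuple[int, str, int]]:
    """Retransmit until the packet leaves via a healthy uplink.

    Returns (number of RTOs, uplink used, final label), or None if every
    attempt hit the gray link. Without a relabel, a flow hashed onto the
    gray link would keep timing out no matter how often it retransmits.
    """
    rng = random.Random(first_label)  # seeded only to make the sketch reproducible
    label = first_label
    for rtos in range(max_rtos + 1):
        link = pick_uplink(src, dst, label, UPLINKS)
        if link != GRAY_LINK:
            return rtos, link, label
        # Silent drop -> RTO fires -> pick a fresh non-zero 20-bit flow label.
        label = rng.randrange(1, 1 << 20)
    return None


src, dst = "2001:db8:a::1", "2001:db8:b::1"  # documentation prefix
# Find a label that the hash initially places on the gray link.
stuck = next(fl for fl in range(1, 1 << 20)
             if pick_uplink(src, dst, fl, UPLINKS) == GRAY_LINK)
print(send_with_rto_rehash(src, dst, stuck))
```

With one bad link out of four, each relabel escapes with probability of about 3/4 under an ideal hash. That is why a few RTOs usually suffice.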
> > > > >
> > > > > In many large DC fabrics, the majority of the traffic is east-west,
> > > > > between end-points that aren’t anycast. In such deployments, the
> > > > > solution solves issues rather elegantly and without any intervention
> > > > > from the operator.
> > > > > The issues/side effects are well understood and will be documented.
> > > > >
> > > > > The best way to receive RTGWG emails is well… to subscribe to RTGWG ;-)
> > > > >
> > > > > Cheers,
> > > > > Jeff
> > > > >
> > > > >
> > > > > On Aug 1, 2021, at 09:47, Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx> wrote:
> > > > >
> > > > > Hi Alexander,
> > > > >
> > > > > Have I understood your presentation correctly?
> > > > > The client SHOULD change the IPv6 flow label after a SYN RTO to have a
> > > > > chance to be moved to a working path inside the DC fabric (if the DC
> > > > > fabric uses the flow label for hash calculation).
> > > > > But at the same time,
> > > > > the client SHOULD NOT change the IPv6 flow label after a SYN RTO, to
> > > > > avoid being switched to a different TCP proxy engine.
> > > > >
> > > > > This looks like a deadlock, especially if both things should happen for
> > > > > the same traffic:
> > > > > it should reach the DC fabric
> > > > > and
> > > > > it should be hash load-balanced between different TCP proxy engines (or
> > > > > applications) inside the DC Fabric.
> > > > >
> > > > > I see one bad solution (“Disable Flow Label”):
> > > > > Routers up to the TCP proxy engine SHOULD be configured not to use the
> > > > > flow label (by the way, these are all routers on the Internet),
> > > > > TCP flow engines SHOULD be outside of the DC Fabric (CLOS), probably in
> > > > > front of it.
> > > > > Routers/Switches inside the DC Fabric SHOULD use flow labels.
> > > > >
> > > > > I see another bad solution (“Disable Anycast”):
> > > > > Disable anycast on routers in principle, use only stateful LB.
> > > > >
> > > > >
> > > > > It has been commented in the chat that Anycast is not possible in
> > > > > principle for stateful connections. That is too general a statement.
> > > > > Anycast is just not compatible with Flow Label. It is not a problem for
> > > > > IPv4 anycast, even if the connection is stateful (TCP), because the
> > > > > 5-tuple used for the hash does not change.
> > > > > Hence, IPv6 anycast died when Flow Label change for active TCP sessions
> > > > > was added to LINUX.
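Eduard's IPv4 argument comes down to which header fields feed the ECMP hash. The sketch below uses a hypothetical `Packet` type and documentation-prefix addresses. TCP alters nothing in the 5-tuple on an RTO. The IPv6 3-tuple, by contrast, contains the flow label, so only that key can shift after a relabel. The same reasoning covers IPv6 routers configured to leave the flow label out of the hash, as Mark expects operators may do.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    proto: int           # 6 = TCP
    sport: int
    dport: int
    flow_label: int = 0  # IPv6 only; IPv4 has no such field


def five_tuple_key(p: Packet) -> tuple:
    """Classic ECMP key (IPv4, or IPv6 routers that ignore the flow label)."""
    return (p.src, p.dst, p.proto, p.sport, p.dport)


def flow_label_key(p: Packet) -> tuple:
    """IPv6 3-tuple key: source, destination, flow label."""
    return (p.src, p.dst, p.flow_label)


before = Packet("2001:db8:1::10", "2001:db8:ffff::1", 6, 40000, 443, flow_label=0x12345)
after = replace(before, flow_label=0x6789A)  # an RTO rehash picked a new label

print(five_tuple_key(before) == five_tuple_key(after))  # prints True: same bucket
print(flow_label_key(before) == flow_label_key(after))  # prints False: may move
```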
> > > > >
> > > > > Among 3 things:
> > > > > -          Anycast
> > > > > -          Flow Label load balancing (basic Flow Label functionality)
> > > > > -          Flow Label change on the active session for the application
> > > > >            to be more active in new path search
> > > > > You have to choose which one to kill: all 3 cannot work together at the
> > > > > same time.
> > > > > I vote to disable Flow Label change in LINUX, and then wait until the
> > > > > network fixes itself.
> > > > > We have so many fancy TE tools nowadays. A broken link or a broken node
> > > > > could be excluded from routing within 50ms.
> > > > >
> > > > > PS: I am not subscribed to the RTGWG alias; please keep me in copy on
> > > > > this thread.
> > > > > Best Regards
> > > > > Eduard Vasilenko
> > > > > Senior Architect
> > > > > Europe Standardization & Industry Development Department
> > > > > Tel: +7(985) 910-1105, +7(916) 800-5506
> > > > >
> > > > > _______________________________________________
> > > > > rtgwg mailing list
> > > > > rtgwg@xxxxxxxx
> > > > > https://www.ietf.org/mailman/listinfo/rtgwg
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Alexander Azimov
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Alexander Azimov
> > > >
> > > >
> >
> > --
> > ---
> > tte@xxxxxxxxx
> >
> > --------------------------------------------------------------------
> > IETF IPv6 working group mailing list
> > ipv6@xxxxxxxx
> > Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
> > --------------------------------------------------------------------

-- 
---
tte@xxxxxxxxx