Hi Gyan,
I am not sure why Linux originally started refreshing the flow label on RTO in 2014-2016.
But the presentation at the last IETF that attracted my attention was about how to mitigate this type of vendor bug: loss of sync between the control plane and
the data plane, which leads to a prolonged “silent drop”.
As I stated in my first message, the problem is that this work-around has become the default for the whole Internet.
I agree that it should not happen by default after any number of RTOs.
But even after activation, it makes sense to give the IGP a chance to repair the problem.
IMHO: the RTO timer should be configurable, or at least 1s, to give OSPF a chance.
Otherwise rerouting would happen even in cases where the IGP would have fixed the problem very soon, not only for a “hung” PFE.
Eduard
From: Gyan Mishra [mailto:hayabusagsm@xxxxxxxxx]
Sent: Tuesday, August 10, 2021 4:09 PM
To: Vasilenko Eduard <vasilenko.eduard@xxxxxxxxxx>
Cc: 6man WG <ipv6@xxxxxxxx>; IETF discussion list <ietf@xxxxxxxx>; Phillip Hallam-Baker <phill@xxxxxxxxxxxxxxx>; Theodore Ts'o <tytso@xxxxxxx>; Töma Gavrichenkov <ximaera@xxxxxxxxx>
Subject: Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?
It is probably too strong to say “let’s kill the Linux hack altogether”.
IMHO: it should be completely switched off by default. But if some admin would like to use it, let
them activate it and use it.
Optimization across all OSI layers does not look like a good architectural decision, but if somebody wants to
do it, why not.
Gyan> Agreed
The original problem that was raised for this Linux feature (the original use case):
many vendors already have such bad microcode that, in a big DC environment, it is very common to have a broken PFE that the control plane is not aware of. The result is a “silent drop” that lasts until manual intervention.
Of course, it is better to monitor such a situation in a different way (iOAM, BFD), but if one
already has hundreds or thousands of switches, that is not a short-term proposition. A faster work-around is needed.
Gyan> I believe the original hashing problem was reported not in a DC environment but over the Internet? For the general Internet scenario, not rehashing at all as the default behavior is the best solution. For DC and other scenarios,
it is fine for Linux developers to change it as they see fit for their environment.
From: ietf [mailto:ietf-bounces@xxxxxxxx] On Behalf Of Gyan Mishra
Sent: Tuesday, August 10, 2021 3:10 PM
To: Töma Gavrichenkov <ximaera@xxxxxxxxx>
Cc: Theodore Ts'o <tytso@xxxxxxx>; Phillip Hallam-Baker <phill@xxxxxxxxxxxxxxx>; 6man WG <ipv6@xxxxxxxx>; IETF discussion list <ietf@xxxxxxxx>
Subject: Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?
Peace,
a patch that makes default less aggressive by restoring the original default behavior to recompute hash only after multiple RTOs.
Let's now talk about hacks, right?
A flow is basically a stream of similar data within one or more connections. This is an application layer concept. Architecturally, it may change on a connection if the data flow
within the connection changes.
E.g. we've established a connection to [youtube DNS A entry]:443, downloaded the hypertext, but now we're going to reuse the same established connection to stream video, so the
network should better treat that connection somehow differently now.
The flow label was never supposed to be a legitimate control over routing. It shouldn't change over one, two, or a hundred RTOs. It generally only changes when the flow becomes
different.
I believe this was so obvious to the authors of the original specification in 2003 that they even forgot to actually state it.
Gyan> Very good point. So let’s say you have an IPv4 or IPv6 TCP Anycast connection: you should stay on that proximity-routed flow for the whole duration, which can be long for a
long-lived TCP connection. But now, with the Linux hack, we shift immediately after the first RTO to try a different BGP Anycast path and hope for better results in case the first path was congested or having issues. This is definitely an application-based
network engineering hack by a Linux developer who had the best intentions of an application-network-aware, self-healing network. From a technical standpoint, as a TCP RST has already been received and we are re-establishing the connection, I do
not understand why this was such a bad thing. It is understandable that it’s aggressive, but the thought process does make sense. The Linux developer’s thought was that if you got an RTO, then more than likely that network path is bad, so let’s rehash to a different
path immediately. The downside I can see is that the first Anycast path from BGP path selection was the best, lowest-latency path, but now the application thinks it understands the network better than the network engineers and thinks it’s better to rehash to a
different path immediately. The MAJOR problem with that is that, as BGP Anycast is proximity based, you could end up going halfway around the world for the second-best path, and voila: from a “happy eyeballs” (not RFC 6555) user
perspective, TCP Anycast is now completely broken, thus the subject heading “IPv6 Anycast has been killed by Linux patch”.
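A minimal sketch of the rehash effect described above. It assumes a router that picks among equal-cost paths by hashing the source address, destination address and flow label. The SHA-256 hash, the two site names and the 2001:db8:: documentation addresses are all hypothetical stand-ins; real routers use vendor-specific hash functions, and this is not how Linux or any particular router implements it.

```python
import hashlib

# Hypothetical anycast sites reachable over equal-cost paths (illustrative names only).
SITES = ["site-nearby", "site-far-away"]

def ecmp_next_hop(src, dst, flow_label, next_hops):
    """Pick a next hop from a hash of (src, dst, 20-bit flow label).

    SHA-256 is a stand-in: real routers use vendor-specific hash functions.
    """
    key = f"{src}|{dst}|{flow_label & 0xFFFFF:05x}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(next_hops)
    return next_hops[index]

def sites_visited(src, dst, flow_labels, next_hops=SITES):
    """Return the set of sites reached by packets carrying the given flow labels."""
    return {ecmp_next_hop(src, dst, label, next_hops) for label in flow_labels}

SRC, DST = "2001:db8::1", "2001:db8:ffff::53"

# Stable flow label: every packet of the connection lands on the same site.
stable = sites_visited(SRC, DST, [0x12345] * 10)

# Flow label changed on every (simulated) RTO: packets can start landing on
# another site, which holds no TCP state for this connection.
rehashed = sites_visited(SRC, DST, range(64))
```

With a fixed label the connection stays "sticky" to one site; once the label changes, the hash is free to select a different anycast site even though the first path was still the best one from a BGP point of view.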
What Tom proposed is, of course, way better than how it works now. Especially the socket option — yay, Linux is finally going to implement the "MUST" in RFC3697#3! We harbour
the hope that other operating systems would do the same good thing.
Gyan> Given what I stated above, I would say let the network do the networking. CDN traffic makes up 90% plus of Internet traffic and is GEO load balanced worldwide, and we have the
IETF ALTO WG, which does application-based traffic optimization, with BGP-LS / PCEP CDN RSVP / SR aware network optimization based solutions that already exist today, so let’s kill the Linux hack altogether. As the Linux server is completely unaware of network
conditions, any rehash is a bad thing: it breaks TCP Anycast by sending you clear around the world when you should be “sticky”, staying on the optimized, proximity-based network path chosen by BGP Anycast best path selection and only shifting to an alternate BGP path
when that path is no longer available. Let routing do its routing!!
But the idea I'm trying to drive home is: fixing (temporary) network delivery issues via the control of a strictly application level feature is among the dirtiest of the hacks possible.
And it kind of amazes me how people call anycast a hack (while it's perfectly the behaviour natural to the Internet, a global self-healing internetwork, as designed in the 1970s) and
still consider this a legitimate behaviour.
Gyan> After reading the feedback from Toma, can we not rehash at all as the default in the Linux patch? See the MAJOR problem described above that is created when you rehash with BGP Anycast:
basically any rehash literally breaks IPv6 flow label based TCP Anycast CDN load balancing.