Hi, Jürgen,
Thanks a lot for your comments! Responses in-line.
On 9/9/20 19:05, Jürgen Schönwälder via Datatracker wrote:
[....]
Perhaps indicate a bit earlier what unacceptably long means, i.e. we
are talking about days and weeks.
This is a bit subjective. If I'm sitting at my computer doing e.g.
video-conferencing (i.e., anything interactive), probably anything over
a few minutes would be unacceptable. In a more general case, what's
acceptable is a function of how often the problem happens and whether
there's any ongoing interactive usage -- and that's still subjective.
The scenarios described read a bit
like somewhat rare events and hence it is useful for the reader to
have an idea what unacceptably long means in such events.
I'm wondering if adding something like:
" Any definition of what is considered 'acceptable' here would be
subjective, and would probably also depend on how often these
flash-renumbering events occur, whether the affected hosts are employing
any interactive applications, and other parameters. However, one rough
estimate would be that hosts should be able to deal with
flash-renumbering events with a similar timeliness with which they can
deal with failing default routers."
would help?
(BTW, I find
the scenario not described at the beginning where a router announces
SLAAC lifetimes that are not synchronized with obtained prefix
lifetimes operationally the more tricky problem since this can lead to
regular failures.)
Fair enough. How about adding this to the bulleted-list:
" o A router (e.g. Customer Edge router) may advertise autoconfiguration
prefixes corresponding to prefixes learned via DHCPv6-PD with constant
PIO lifetimes that are not synchronized with the DHCPv6-PD lease time
(as required in Section 6.3 of [RFC8415]). While this behavior violates
the aforementioned requirement from [RFC8415], it is not an unusual
behavior, particularly when e.g. DHCPv6-PD is implemented in a different
software module than the SLAAC router component."
?
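
To make the synchronized behavior concrete, here is a rough sketch (Python, purely illustrative) of how a CE router could derive the PIO lifetimes from the time remaining on the DHCPv6-PD lease instead of advertising constants. The function name and parameters are hypothetical; the default values are assumed to be the RFC 4861 defaults of 7 days / 30 days.

```python
import time

# Assumption: RFC 4861 defaults for AdvPreferredLifetime / AdvValidLifetime.
DEFAULT_PREFERRED = 604800   # 7 days, in seconds
DEFAULT_VALID = 2592000      # 30 days, in seconds


def pio_lifetimes(pd_preferred_expiry, pd_valid_expiry, now=None):
    """Return (preferred, valid) PIO lifetimes in seconds.

    Hypothetical helper: the two expiry arguments are absolute timestamps
    at which the delegated prefix stops being preferred / valid.
    """
    if now is None:
        now = time.time()
    # Time left on the delegated prefix; never negative.
    remaining_preferred = max(0, int(pd_preferred_expiry - now))
    remaining_valid = max(0, int(pd_valid_expiry - now))
    # Never advertise more than what is left on the delegation.
    valid = min(DEFAULT_VALID, remaining_valid)
    # The preferred lifetime must not exceed the valid lifetime.
    preferred = min(DEFAULT_PREFERRED, remaining_preferred, valid)
    return preferred, valid
```

A router that recomputes these values for each Router Advertisement would not advertise a prefix for longer than the delegation itself lasts.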
Section 2.2 seems to confuse soft-state (this is what a learned IPv6
prefix is for me) with certain protocol timers. There are many places
where protocols use soft-state and implementations use timers to purge
or refresh soft-state. That timers generally do not go off in normal
conditions is not really correct in this context, DHCP leases are
renewed when their lifetime expires, a normal operation.
Normally, you renew the lease before the lease expires.
IP address
mappings to Ethernet addresses expire when their lifetime timer goes
off.
This one is not necessarily the best example ;-) (while RFC1122
requires that, IIRC in many implementations the entry is refreshed when
referenced, and it only expires when not referenced/refreshed frequently
enough).
But I do see where you are going and I realize that the text is a bit
sloppy in this respect. How about tweaking the text as follows:
---- cut here ----
Many protocols, from different layers, normally employ timers for
fault isolation/recovery. The
general logic is as follows:
o A timer is set with a value such that, under normal conditions,
the timer does *not* go off.
o Whenever a fault condition arises, the timer goes off, and the
protocol can perform fault recovery.
For example, when implementing reliability mechanisms, a timer is
normally set when a packet is transmitted and, unless a response is
received before the timer goes off, a fault recovery action (such as
packet re-transmission) is triggered.
---- cut here ----
?
One might also look at this same issue as the timer implying a sensible
period of time where information should be refreshed, as you correctly
point out, though.
(I guess the only difference is that when looking at this from the
soft-state angle, you're mostly considering the case where information
changes, whereas when looking at this from the fault-recovery pov,
you're mostly thinking about failures, rather than updates).
Switches purge forwarding state regularly when forwarding entries
expire. Cached DNS name to IP resolutions expire. The only problem
here seems to be that a lifetime of 7 days / 30 days is a bit
ridiculous.
Agreed.
Is anyone shipping the RFC 4861 defaults?
Yes, unfortunately. Some implementations override the RFC4861 defaults.
Still, RFC4861 defaults are extremely common and widespread.
The few
implementations I have seen do use a bit more reasonable defaults. I
think this section should be rewritten to replace the "timer going off
is associated with a failure" text with a discussion of soft-state in
other protocols. (Section 2.2 is why I ticked 'has issues'.)
As a second alternative to what I've suggested above:
---- cut here ----
Many protocols, from different layers, normally employ timers for a
variety of purposes, such as in fault isolation/recovery mechanisms,
and in the maintenance of data structures that contain bindings of
some sort (e.g., the IPv6 Neighbor Cache [RFC4861]).
In the case of fault recovery/isolation, the general logic is as
follows:
o A timer is set with a value such that, under normal conditions,
the timer does *not* go off.
o Whenever a fault condition arises, the timer goes off, and the
protocol can perform fault recovery.
For example, when implementing reliability mechanisms, a timer is
normally set when a packet is transmitted and, unless a response is
received before the timer goes off, a fault recovery action (such as
packet re-transmission) is triggered.
On the other hand, when maintaining bindings in data structures,
timers are usually selected in a way that any bindings that become stale
are updated in a timely manner.
---- cut here ----
?
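
In case it helps to see the two uses of timers side by side, here is a minimal sketch (Python, with hypothetical class names, not taken from any implementation). The first class models a fault-recovery timer that fires only if no response arrives in time; the second models a binding table whose entries are dropped once they go stale. Time is passed in explicitly to keep the example deterministic.

```python
class RetransmitTimer:
    """Fault-recovery timer: under normal conditions it never fires."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None  # None means the timer is not running

    def start(self, now):
        # Armed when a packet is transmitted.
        self.deadline = now + self.timeout

    def response_received(self):
        # Normal case: the response arrives and the timer is cancelled.
        self.deadline = None

    def expired(self, now):
        # True only in the fault case, where recovery (e.g. a
        # retransmission) should be triggered.
        return self.deadline is not None and now >= self.deadline


class BindingCache:
    """Binding maintenance: each entry has a lifetime and is purged when stale."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self.entries = {}  # key -> (value, expiry timestamp)

    def refresh(self, key, value, now):
        # Creating or refreshing a binding restarts its lifetime.
        self.entries[key] = (value, now + self.lifetime)

    def lookup(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if now >= expiry:
            # Stale binding: expiry here is routine, not a fault.
            del self.entries[key]
            return None
        return value
```

The point of the contrast: in the first case expiry signals a failure, while in the second it is part of normal operation, so the lifetime bounds how long stale information can survive.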
Isn't a part of the solution (other than moving to less ridiculous
default) that SLAAC hosts experiencing connectivity problems should
try to validate the prefix that they have learned (and if the
validation fails move to a newly learned prefix)?
Yes, indeed. That's what we are pursuing in draft-ietf-6man-slaac-renum
(see Section 4 of this document, draft-ietf-v6ops-slaac-renum-03).
draft-ietf-v6ops-slaac-renum-03 contains the problem statement and
*operational* mitigations only.
Involving the hosts
in a resolution of the problem may be more robust than expecting that
something in the network takes care of invalidating stale soft-state.
I agree 100%. That is and has been, indeed, the motivation for pursuing
draft-ietf-6man-slaac-renum.
Thanks!
Regards,
--
Fernando Gont
SI6 Networks
e-mail: fgont@xxxxxxxxxxxxxxx
PGP Fingerprint: 6666 31C6 D484 63B2 8FB1 E3C4 AE25 0D55 1D4E 7492