RE: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08

"Black, David" <David.Black@xxxxxxxx> · Wed, 24 Jan 2018 17:40:22 +0000



That works for me, Thanks, --David


> -----Original Message-----
> From: Stewart Bryant [mailto:stewart.bryant@xxxxxxxxx]
> Sent: Wednesday, January 24, 2018 11:45 AM
> To: Black, David <david.black@xxxxxxx>; Stig Venaas <stig@xxxxxxxxxx>
> Cc: tsv-art@xxxxxxxx; ietf@xxxxxxxx; pim@xxxxxxxx;
> draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx
> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
> 
> The problem with complex processing under error conditions is that this
> is where all the software bugs hang out: such code is hard to test, and
> the bugs don't show up until you hit the problem it is trying to fix.
> 
> This is a case where you want the simplest possible process, like a small
> burst followed by your 60s interval, which seems unlikely to stress any
> sensibly designed implementation on a reasonably sized network.
> 
> - Stewart
> 
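One possible reading of "a small burst followed by your 60s interval", sketched in Python. The class name and the burst size of 3 are assumptions made for illustration; only the 60 s interval comes from the thread.

```python
class BurstThenPeriodic:
    """Allow a small burst of triggered messages, then wait for the next interval."""

    def __init__(self, burst=3, interval=60.0):
        self.burst = burst            # assumed burst size, not taken from the draft
        self.interval = interval      # 60 s sending interval mentioned in the thread
        self.window_start = None      # time at which the current window opened
        self.sent_in_window = 0       # messages already sent in this window

    def may_send(self, now):
        """Return True if a message may be originated at time `now` (seconds)."""
        # Open a new window if none exists or the current one has expired.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.sent_in_window = 0
        if self.sent_in_window < self.burst:
            self.sent_in_window += 1
            return True
        return False
```

With these assumed values, three messages go out immediately and anything further waits until the 60 s window rolls over. There is no adaptive state to get wrong, which is the point of the simpler approach.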
> 
> On 24/01/2018 16:30, Black, David wrote:
> > Hi Stig,
> >
> >> I agree with all you wrote and will update the document. However,
> >> there is one slight issue with the minimum time between origination of
> >> each message. When a new source is detected, we would like to
> >> originate a message ASAP so that receivers can start receiving the
> >> multicast without much delay. A 10s delay would be a rather long time
> >> if a source was detected right after the previous message was
> >> originated. I think some delay would be warranted though, in
> >> particular in a case where perhaps a router starts up and a large
> >> number of directly connected sources could be detected within a short
> >> time frame. I think an exponential back-off could make sense here.
> >> E.g., if it is just one new source, maybe trigger a message ASAP. If a
> >> new source is detected right after the previous one, wait a bit
> >> longer, which also allows for aggregation of multiple sources in one
> >> message if several are detected later. In extreme cases one could
> >> over time keep increasing the delay until the next update.
> >> We could maybe have a fixed minimum delay of 1s if that is sufficient,
> >> but that is probably too short in those extreme cases. Hence maybe an
> >> exponential back-off.
> > Exponential back-off sounds like a very good idea - I'd suggest adding
> > something starting from RFC 5059's back-off functionality.
> >
> >> I would appreciate some further guidance on what you think is
> >> reasonable here, and perhaps whether I can borrow something here from
> >> other protocols/drafts. Part of the experiment here might be to find
> >> out what minimum values, or how rapid back-off, is needed based on
> >> the size of the network, the number of sources, the types of links etc.
> > In addition to burst scenarios (e.g., router starts up, lots of new
> > sources detected quickly as a result), I strongly suggest thinking
> > about chaos scenarios where links and/or routers are coming and going
> > so rapidly that the source population is in a constant state of flux.
> > If things are really bad, the best thing to do may be to shut up and
> > hope that the chaos settles out, as not much useful will happen until
> > it does, and sending messages about observed changes risks making
> > things worse.  Again, exponential back-off makes sense, possibly quite
> > aggressive, e.g., back off from 10 seconds by a small factor a few
> > times, and if things still look bad, wait at least a minute or two
> > with further back-off from that longer time until things stabilize.
> > This needs more thought on how to adjust the back-off factor, as that
> > off-the-top-of-my-head example probably exhibits peculiar behavior in
> > scenarios that are just on the edge of tripping the long delay - some
> > thinking about what stability means and how to get there may help in
> > figuring out the relative merits and applicability of backing off
> > further vs. some kind of dramatic reset, analogous to TCP's congestion
> > window reset on timeout.
> >
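The two-stage back-off in the example above can be written down as a delay schedule. This is only a sketch in Python: the function name, the factor of 1.5, the three short steps, the 90 s long delay and the doubling in the second stage are assumed values chosen for illustration. Only the 10 s starting point and "at least a minute or two" come from the text.

```python
def backoff_delays(n, start=10.0, factor=1.5, short_steps=3,
                   long_start=90.0, long_factor=2.0):
    """Return the first `n` delays (seconds) of a two-stage exponential back-off."""
    delays = []
    for i in range(n):
        if i < short_steps:
            # Stage 1: grow from `start` by a small factor a few times.
            delays.append(start * factor ** i)
        else:
            # Stage 2: jump to a much longer delay and keep backing off from there.
            delays.append(long_start * long_factor ** (i - short_steps))
    return delays


print(backoff_delays(6))  # [10.0, 15.0, 22.5, 90.0, 180.0, 360.0]
```

The jump from 22.5 s to 90 s is the "edge of tripping the long delay" that the paragraph warns about. A network that sits right at that boundary would alternate between the two stages, which is why the reset policy needs thought.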
> > As this is intended to be an experimental RFC, I don't think a
> > completely worked-out solution is expected or required - a good
> > discussion of the problems and explanation of areas that need
> > investigation as part of the experiment ought to suffice, as suggested
> > in the last sentence quoted above.  I would add some initial
> > exponential back-off functionality as a starting point.
> >
> >> Also note that the general mechanism can be used for many types of
> >> information. How urgent it is to distribute depends on the
> >> information. Source discovery in particular is fairly urgent.
> > And that should be discussed, perhaps in Section 3 somewhere.
> >
> > Thanks, --David
> >
> >
> >> -----Original Message-----
> >> From: Stig Venaas [mailto:stig@xxxxxxxxxx]
> >> Sent: Tuesday, January 23, 2018 7:44 PM
> >> To: Black, David <david.black@xxxxxxx>
> >> Cc: tsv-art@xxxxxxxx; draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx;
> >> ietf@xxxxxxxx; pim@xxxxxxxx
> >> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
> >>
> >> Hi, thanks for the great comments.
> >>
> >> I agree with all you wrote and will update the document. However,
> >> there is one slight issue with the minimum time between origination of
> >> each message. When a new source is detected, we would like to
> >> originate a message ASAP so that receivers can start receiving the
> >> multicast without much delay. A 10s delay would be a rather long time
> >> if a source was detected right after the previous message was
> >> originated. I think some delay would be warranted though, in
> >> particular in a case where perhaps a router starts up and a large
> >> number of directly connected sources could be detected within a short
> >> time frame. I think an exponential back-off could make sense here.
> >> E.g., if it is just one new source, maybe trigger a message ASAP. If a
> >> new source is detected right after the previous one, wait a bit
> >> longer, which also allows for aggregation of multiple sources in one
> >> message if several are detected later. In extreme cases one could
> >> over time keep increasing the delay until the next update.
> >> We could maybe have a fixed minimum delay of 1s if that is sufficient,
> >> but that is probably too short in those extreme cases. Hence maybe an
> >> exponential back-off.
> >>
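A minimal sketch, in Python, of the triggered origination with exponential back-off and aggregation described above. The class name, the 1 s base hold-down, the 60 s cap, the doubling factor and the rule "reset once the gap since the last message is at least twice the hold-down" are all assumptions made for illustration; none of them is specified in the draft or in RFC 5059.

```python
class TriggeredOriginator:
    """Send the first new source at once; back off and aggregate when busy."""

    def __init__(self, base=1.0, cap=60.0):
        self.base = base          # assumed initial hold-down in seconds
        self.cap = cap            # assumed upper bound on the hold-down
        self.hold = base          # current hold-down after a message
        self.last_sent = None     # time of the previous message, if any
        self.pending = []         # sources waiting for the next message

    def source_detected(self, source, now):
        """Queue `source`; return the time at which the next message is due."""
        self.pending.append(source)
        if self.last_sent is None:
            return now            # nothing sent yet: originate immediately
        return max(now, self.last_sent + self.hold)

    def send(self, now):
        """Return all pending sources as one message and update the back-off."""
        if self.last_sent is not None:
            if now - self.last_sent < 2 * self.hold:
                # Still busy: double the hold-down, up to the cap.
                self.hold = min(self.hold * 2, self.cap)
            else:
                # Quiet period observed: fall back to the base hold-down.
                self.hold = self.base
        self.last_sent = now
        batch, self.pending = self.pending, []
        return batch
```

A lone source after a quiet period is announced without delay, while sources detected in quick succession are batched into fewer messages spaced 1 s, 2 s, 4 s, and so on apart.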
> >> I would appreciate some further guidance on what you think is
> >> reasonable here, and perhaps whether I can borrow something here from
> >> other protocols/drafts. Part of the experiment here might be to find
> >> out what minimum values, or how rapid back-off, is needed based on
> >> the size of the network, the number of sources, the types of links etc.
> >>
> >> Also note that the general mechanism can be used for many types of
> >> information. How urgent it is to distribute depends on the
> >> information. Source discovery in particular is fairly urgent.
> >>
> >> Stig
> >>
> >>
> >> On Tue, Jan 23, 2018 at 3:40 PM, David Black <david.black@xxxxxxxx> wrote:
> >>> Reviewer: David Black
> >>> Review result: Ready with Issues
> >>>
> >>> I've reviewed this document as part of TSV-ART's ongoing effort to
> >>> review key IETF documents. These comments were written primarily for
> >>> the transport area directors, but are copied to the document's
> >>> authors for their information and to allow them to address any issues
> >>> raised.  When done at the time of IETF Last Call, the authors should
> >>> consider this review together with any other last-call comments they
> >>> receive. Please always CC tsv-art@xxxxxxxx if you reply to or forward
> >>> this review.
> >>>
> >>> This draft describes an experimental PFM (PIM Flooding Mechanism) for
> >>> flooding PIM information among multicast routers that is a
> >>> generalized form of the RFC 5059 PIM BSR (BootStrap Router)
> >>> mechanism, and applies this mechanism to distribution of source group
> >>> mappings (PFM-SD).
> >>>
> >>> Early implementation experience with PFM-SD on low bandwidth radio
> >>> links (described in Section 2) suggests that the mechanism is able to
> >>> work better than PIM-SM without starving other traffic in the fashion
> >>> that PIM-DM may.  This is promising and (in this reviewer's opinion)
> >>> justifies experimentation at larger scale and in other network
> >>> environments.  In general, this is a well-written document and the
> >>> authors should be commended for including the "running code"
> >>> implementation experience report in Section 2.
> >>>
> >>> Flooding mechanisms are very useful, but the time periods that govern
> >>> sending of flooding messages are crucial to avoid excessive
> >>> consumption of network resources.  Section 5 of RFC 5059 has a solid
> >>> discussion of the time periods that apply to use of flooding by the
> >>> BSR mechanism.   The discussion in this draft is somewhat weaker,
> >>> raising a couple of minor issues:
> >>>
> >>> 1) For PFM-SD, Section 4.2 provides a reasonable discussion of time
> >>> periods that apply, but appears to be missing a minimum time period
> >>> between sending messages.   Section 5 of RFC 5059 recommends a
> >>> default of 10 seconds for that minimum time period by comparison to a
> >>> default PIM BSR sending interval of 60 seconds.  That 10 second
> >>> minimum default should be added to this draft, as the same default
> >>> sending interval of 60 seconds is used.
> >>>
> >>> 2) For future use of PFM for other purposes, Section 3.3 provides the
> >>> following guidance:
> >>>
> >>>     Each TLV definition will need to define when a triggered PFM
> >>>     message needs to be originated, and also whether to send periodic
> >>>     messages, and how frequent.
> >>>
> >>> That guidance is correct as far as it goes, but it's not particularly
> >>> helpful to future protocol designers.   Text should be added to at
> >>> least point to the examples in section 4.2 of this draft and/or part
> >>> of Section 5 of RFC 5059 to suggest the sorts of values that have
> >>> proven to be workable, and perhaps also strongly encourage (SHOULD
> >>> use) a default minimum time between messages of at least 10 seconds.
> >>>
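The timing rule requested in issues 1) and 2) can be sketched as follows, in Python. The function and constant names are hypothetical; the 10 s minimum and the 60 s periodic interval are the defaults quoted in the review. Treating triggered and periodic messages as sharing one "last sent" timestamp is a simplifying assumption.

```python
MIN_INTERVAL = 10.0        # default minimum gap between messages, per the review
PERIODIC_INTERVAL = 60.0   # default periodic sending interval, per the review


def next_origination(now, last_sent, triggered):
    """Return the earliest time (seconds) at which the next message may be sent."""
    if last_sent is None:
        return now                                   # nothing sent yet
    if triggered:
        # A triggered message must still respect the minimum gap.
        return max(now, last_sent + MIN_INTERVAL)
    # A periodic refresh waits for the full sending interval.
    return max(now, last_sent + PERIODIC_INTERVAL)
```

A trigger arriving 3 s after the previous message is held until the 10 s mark, while one arriving 25 s after it goes out immediately.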
> >>> Understanding this draft requires that the reader be familiar with
> >>> multicast and PIM, which is reasonable.  In addition, an
> >>> understanding of PIM BSR is also required, which is perhaps somewhat
> >>> less reasonable.  An example that this reviewer tripped over is that
> >>> Section 3 of this draft states that "Like BSR, messages are forwarded
> >>> hop by hop."  There is no further explanation or definition of
> >>> "forwarded hop by hop," making it necessary to consult RFC 5059 to
> >>> understand that term, e.g., this has nothing to do with IPv6
> >>> hop-by-hop options.  A sentence or two of explanation of this hop by
> >>> hop forwarding concept ought to be copied and adapted from RFC 5059,
> >>> and it would be good to check for other concepts that rely on RFC
> >>> 5059 for definitions.
> >>>
> >>>
>