Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08

Oops, you're right. I've made these changes and posted revision 11
now, so hopefully it is ready for publication. That seems to be the
only discuss.

Thanks,
Stig


On Fri, Jan 26, 2018 at 8:07 AM, Black, David <David.Black@xxxxxxxx> wrote:
> Hi Stig,
>
> This is looking good - the technical issue is resolved, as I agree with the approach in -10, thanks!
>
> There are a couple of editorial items that need attention:
>
> [1] New text in Section 3.3:
>
>    A router MUST NOT originate more than N messages per minute.  This
>    document does not mandate how this should be implemented, but some
>    possible ways could be having a minimal time between each message,
>    counting the number of messages originated and resetting the count
>    every minute, or using a leaky bucket algorithm.  One benefit of
>    using a leaky bucket algorithm is that it can handle bursts better.
>    The default value of N is 6.  The value MUST be configurable.
>    Depending on the network one may want to use a low value allowing new
>    information to be propagated, but with a large number of routers and
>    many updates, the total number of messages might become too large and
>    require too much processing.
>
> "Depending on the network one may want to use a low value allowing new information to be propagated,"
>
> That seems wrong, as a low value of N would hit the messages per minute limit sooner.
> Would "low" -> "larger" correctly capture the intent?  If so:
>
> OLD
>    Depending on the network one may want to use a low value allowing new
>    information to be propagated, but with a large number of routers and
>    many updates, the total number of messages might become too large and
>    require too much processing.
> NEW
>    Depending on the network, one may want to use a larger value of N to favor
>    propagation of new information, but with a large number of routers and
>    many updates, the total number of messages might become too large and
>    require too much processing.
>
> [2] The first paragraph in Section 4.2 specifies the time periods for GSH TLVs; text ought to be added there that refers to the new message timing requirements in Section 3.3  (text quoted in [1] above) to ensure that GSH implementers clearly understand that those message timing requirements apply to GSH.  One can infer this applicability from the structure of the document, but I would prefer to directly tell GSH implementers that this is required.
>
> Many thanks for the productive discussion.  Also, Mirja deserves the initial credit for asking that a closer look be taken at the flooding mechanism.
>
> Thanks, --David
>
>
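The Section 3.3 text quoted in [1] leaves the rate-limiting method open and names a leaky bucket as one option. Below is a minimal Python sketch of a leaky-bucket meter for originated messages, assuming the default of N = 6 per 60 seconds. The class and method names are hypothetical and do not come from the draft.

```python
class LeakyBucketLimiter:
    """Leaky-bucket meter for *originated* messages (forwarded ones are not counted).

    The bucket holds at most `capacity` units and drains continuously at
    capacity / period units per second. Each origination adds one unit and
    is refused if it would overflow the bucket.
    """

    def __init__(self, capacity: int = 6, period: float = 60.0) -> None:
        self.capacity = capacity              # N; meant to be configurable (default 6)
        self.leak_rate = capacity / period    # units drained per second
        self.level = 0.0                      # current bucket content
        self.last = 0.0                       # time of the last update, in seconds

    def _drain(self, now: float) -> None:
        # Remove whatever has leaked out since the last update.
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now

    def try_originate(self, now: float) -> bool:
        """Return True (and account for it) if a message may be originated now."""
        self._drain(now)
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False


# Usage: a burst of 6 goes out at t=0, the 7th is refused, and one more
# slot has leaked free by t=10 s (6 units per minute = 1 unit per 10 s).
limiter = LeakyBucketLimiter()
print([limiter.try_originate(0.0) for _ in range(7)])  # [True, True, True, True, True, True, False]
print(limiter.try_originate(10.0))                     # True
```

This shows the burst tolerance mentioned in the quoted text. There is a trade-off, though. A leaky bucket bounds the burst size and the long-run average rate, but it does not by itself guarantee "at most N in any 60-second window". An implementation that wants that strict reading would need a window-based counter instead.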
>> -----Original Message-----
>> From: Stig Venaas [mailto:stig@xxxxxxxxxx]
>> Sent: Thursday, January 25, 2018 6:31 PM
>> To: Black, David <david.black@xxxxxxx>
>> Cc: draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx; Stewart Bryant
>> <stewart.bryant@xxxxxxxxx>; ietf@xxxxxxxx; pim@xxxxxxxx; tsv-art@xxxxxxxx
>> Subject: Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>>
>> Hi
>>
>> I just posted version 10 which I think should resolve the issues
>> raised in the tsv-art review and the discuss that was raised. The
>> change is mainly to limit how often messages can be originated. It
>> specifies a default of max 6 messages per 60 seconds and 1 second
>> between each message. It also says that the limits must be
>> configurable. Note that I first posted version 9, noticed one small
>> issue and then posted version 10.
>>
>> It's embarrassing that we completely forgot to put such limits in the
>> draft, and I'm grateful for the review allowing us to fix it before
>> publication.
>>
>> Stig
>>
>>
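Version 10, as described above, combines a per-minute cap (default 6 messages per 60 seconds) with a minimum gap between originations (default 1 second), and both must be configurable. One way to enforce both at once is to keep a sliding-window log of origination times. This is a hedged sketch only. The class, method, and parameter names are illustrative assumptions, not text from the draft.

```python
from collections import deque


class OriginationLimiter:
    """At most `max_msgs` originations in any `window` seconds, and at least
    `min_gap` seconds between consecutive originations. Defaults mirror the
    values described for version 10; all of them are meant to be
    configurable. Forwarded messages should not be fed into this limiter."""

    def __init__(self, max_msgs: int = 6, window: float = 60.0,
                 min_gap: float = 1.0) -> None:
        self.max_msgs = max_msgs
        self.window = window
        self.min_gap = min_gap
        self.sent = deque()   # origination timestamps still inside the window

    def earliest_allowed(self, now: float) -> float:
        """Earliest time >= now at which the next origination is permitted."""
        # Drop originations that are a full window old or older.
        while self.sent and self.sent[0] <= now - self.window:
            self.sent.popleft()
        t = now
        if self.sent:
            t = max(t, self.sent[-1] + self.min_gap)   # enforce the minimum gap
        if len(self.sent) >= self.max_msgs:
            t = max(t, self.sent[0] + self.window)     # wait for the oldest to age out
        return t

    def originate(self, now: float) -> bool:
        """Record an origination at `now` if allowed; return whether it was."""
        if self.earliest_allowed(now) > now:
            return False
        self.sent.append(now)
        return True


lim = OriginationLimiter()
print([lim.originate(t) for t in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)])
# [True, False, True, True, True, True, True, False]
print(lim.earliest_allowed(6.0))  # 60.0
```

In this run, the attempt at 0.5 s fails the 1-second gap. The attempt at 6 s fails because six messages were already originated in the last 60 seconds. Unlike a leaky bucket, this enforces the per-minute limit strictly, at the cost of remembering up to N timestamps.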
>> On Wed, Jan 24, 2018 at 12:08 PM, Black, David <David.Black@xxxxxxxx>
>> wrote:
>> > One change - the value MUST be configurable.  While 6 is a plausible
>> > number, it results from our intelligent speculation.  If that number is
>> > wrong and causes damage in a frail network, that number has to be
>> > changeable as part of the experiment.  The Proposed Standard successor
>> > to this forthcoming Experimental RFC would be an appropriate context
>> > for a MUST vs. SHOULD discussion, IMHO.
>> >
>> > I also would specify a minimum time between packets, which also needs
>> > to be configurable.  That time doesn't have to be the 10 second value
>> > from RFC 5059, as this draft is doing something different, but a value
>> > is needed to prevent sending 6 packets back-to-back to a router that
>> > can currently handle the first 1 or 2 but will drop the rest because of
>> > everything else in the chaos that it's currently dealing with.
>> >
>> > Thanks, --David
>> >
>> >
>> >> -----Original Message-----
>> >> From: Tsv-art [mailto:tsv-art-bounces@xxxxxxxx] On Behalf Of Stig Venaas
>> >> Sent: Wednesday, January 24, 2018 1:33 PM
>> >> To: Black, David <david.black@xxxxxxx>
>> >> Cc: draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx; Stewart Bryant
>> >> <stewart.bryant@xxxxxxxxx>; ietf@xxxxxxxx; pim@xxxxxxxx; tsv-art@xxxxxxxx
>> >> Subject: Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >>
>> >> Hi
>> >>
>> >> I agree keeping it simple is good, but I have some concerns about
>> >> requiring a minimal fixed time like 10 seconds in BSR (RFC 5059)
>> >> between each message. I would prefer something like:
>> >>
>> >> A router MUST NOT originate more than N packets per minute; note that
>> >> this does not consider packets that are being forwarded by the router.
>> >> This document does not mandate how this should be implemented, but
>> >> some possible ways could be having a minimal time between each packet,
>> >> counting the number of packets originated and resetting the count
>> >> every minute, or using a leaky bucket algorithm. One benefit of using
>> >> a leaky bucket algorithm is that it can handle bursts better. The
>> >> default value of N is 6. The value SHOULD be configurable. Depending
>> >> on the network one may want to use a low value allowing new
>> >> information to be propagated, but with a large number of routers and
>> >> many updates, the total number of messages might become too large and
>> >> require too much processing. The PFM mechanism can be used to
>> >> distribute many different types of information. When defining new
>> >> types, it should be considered what changes, if any, warrant sending
>> >> a triggered message.
>> >>
>> >> For the GSH (source announcement) TLV, I'll make it clear that a
>> >> triggered message is useful when a new source is detected, but one
>> >> should not trigger a message due to a source expiring (becoming
>> >> inactive).
>> >>
>> >> Thoughts?
>> >>
>> >> Stig
>> >>
>> >>
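Stig's GSH plan above (trigger a message when a new source is detected, but not when a source expires) reduces to a set difference. Here is a minimal sketch. It assumes that active sources are tracked as a set of address strings, and that expired sources simply stop appearing in later periodic messages and age out at receivers. The function name is hypothetical.

```python
def triggered_sources(previous: set[str], current: set[str]) -> set[str]:
    """Return the sources that justify a triggered GSH message.

    Only newly detected sources count. Sources present in `previous` but
    missing from `current` (expired/inactive) deliberately trigger nothing;
    they are assumed to be dropped from later periodic messages instead.
    """
    return current - previous


before = {"192.0.2.1", "192.0.2.2"}
after = {"192.0.2.2", "192.0.2.3"}   # .1 expired, .3 is new
new = triggered_sources(before, after)
if new:
    print("trigger message for", sorted(new))   # trigger message for ['192.0.2.3']
```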
>> >> On Wed, Jan 24, 2018 at 9:40 AM, Black, David <David.Black@xxxxxxxx>
>> >> wrote:
>> >> > That works for me, Thanks, --David
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Stewart Bryant [mailto:stewart.bryant@xxxxxxxxx]
>> >> >> Sent: Wednesday, January 24, 2018 11:45 AM
>> >> >> To: Black, David <david.black@xxxxxxx>; Stig Venaas <stig@xxxxxxxxxx>
>> >> >> Cc: tsv-art@xxxxxxxx; ietf@xxxxxxxx; pim@xxxxxxxx; draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx
>> >> >> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >> >>
>> >> >> The problem with complex processing under error conditions is that that
>> >> >> is where all the software bugs hang out because they are hard to test
>> >> >> and don't show up until you have the problem they are trying to fix.
>> >> >>
>> >> >> This is a case where you want the simplest possible process like a small
>> >> >> burst followed by your 60s interval which seems unlikely to stress any
>> >> >> sensibly designed implementation on a reasonably sized network.
>> >> >>
>> >> >> - Stewart
>> >> >>
>> >> >>
>> >> >> On 24/01/2018 16:30, Black, David wrote:
>> >> >> > Hi Stig,
>> >> >> >
>> >> >> >> I agree with all you wrote and will update the document. However,
>> >> >> >> there is one slight issue with the minimum time between origination of
>> >> >> >> each message. When a new source is detected, we would like to
>> >> >> >> originate a message ASAP so that receivers can start receiving the
>> >> >> >> multicast without much delay. A 10s delay would be a rather long time
>> >> >> >> if a source was detected right after the previous message was
>> >> >> >> originated. I think some delay would be warranted though, in
>> >> >> >> particular in a case where perhaps a router starts up and a large
>> >> >> >> number of directly connected sources could be detected within a short
>> >> >> >> time frame. I think an exponential back-off could make sense here.
>> >> >> >> E.g., if it is just one new source, maybe trigger a message ASAP. If a
>> >> >> >> new source is detected right after the previous one, wait a bit
>> >> >> >> longer, which also allows for aggregation of multiple sources in one
>> >> >> >> message if several are detected later. In extreme cases one could
>> >> >> >> over time keep increasing the delay until the next update.
>> >> >> >> If sufficient, we could maybe have a fixed minimum delay of 1s or not,
>> >> >> >> but that is probably too short in those extreme cases. Hence maybe an
>> >> >> >> exponential back-off.
>> >> >> > Exponential back-off sounds like a very good idea - I'd suggest adding
>> >> >> > something starting from RFC 5059's back-off functionality.
>> >> >> >
>> >> >> >> I would appreciate some further guidance on what you think is reasonable
>> >> >> >> here, and perhaps whether I can borrow something here from other
>> >> >> >> protocols/drafts. Part of the experiment here might be to find out
>> >> >> >> what minimum values, or how rapid a back-off, is needed based on the
>> >> >> >> size of the network, the number of sources, the types of links, etc.
>> >> >> > In addition to burst scenarios (e.g., router starts up, lots of new
>> >> >> > sources detected quickly as a result), I strongly suggest thinking
>> >> >> > about chaos scenarios where links and/or routers are coming and going
>> >> >> > so rapidly that the source population is in a constant state of flux.
>> >> >> > If things are really bad, the best thing to do may be to shut up and
>> >> >> > hope that the chaos settles out, as not much useful will happen until
>> >> >> > it does, and sending messages about observed changes risks making
>> >> >> > things worse.  Again, exponential back-off makes sense, possibly quite
>> >> >> > aggressive, e.g., back-off from 10 seconds by a small factor a few
>> >> >> > times, and if things still look bad, wait at least a minute or two
>> >> >> > with further back-off from that longer time until things stabilize.
>> >> >> > This needs more thought on how to adjust the back-off factor, as that
>> >> >> > off-the-top-of-my-head example probably exhibits peculiar behavior in
>> >> >> > scenarios that are just on the edge of tripping the long delay.  Some
>> >> >> > thinking about what stability means and how to get there may help in
>> >> >> > figuring out the relative merits and applicability of backing off
>> >> >> > further vs. some kind of dramatic reset, analogous to TCP's
>> >> >> > congestion window reset on timeout.
>> >> >> >
>> >> >> > As this is intended to be an experimental RFC, I don’t think a
>> >> >> > completely worked-out solution is expected or required - a good
>> >> >> > discussion of the problems and an explanation of areas that need
>> >> >> > investigation as part of the experiment ought to suffice, as suggested
>> >> >> > in the last sentence quoted above.  I would add some initial
>> >> >> > exponential back-off functionality as a starting point.
>> >> >> >
>> >> >> >> Also note that the general mechanism can be used for many types of
>> >> >> >> information. It depends on the information how urgent it is to
>> >> >> >> distribute it. Source discovery in particular is fairly urgent.
>> >> >> > And that should be discussed, perhaps in Section 3 somewhere.
>> >> >> >
>> >> >> > Thanks, --David
>> >> >> >
>> >> >> >
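The exponential back-off discussed above works as follows: send the first triggered message at once, wait progressively longer when new sources keep arriving, batch the sources detected during the wait, and reset once things calm down. It might look roughly like the code below. This is only a sketch. The parameter names and default values (initial_delay, max_delay, quiet_period) are hypothetical. It is not RFC 5059's back-off, and it does not implement the two-stage "small factor, then a minute or more" scheme David outlines. It is meant to sit underneath a hard per-minute origination cap, not to replace one.

```python
from typing import Optional


class TriggeredBackoff:
    """Hypothetical back-off for triggered originations of new sources.

    - The first trigger after a quiet period is sent immediately.
    - After each message, a hold-down of `delay` seconds starts and `delay`
      doubles (capped at max_delay) while new sources keep arriving.
    - Sources detected during a hold-down are batched into the next message.
    - If no new source is seen for quiet_period seconds, `delay` resets.
    """

    def __init__(self, initial_delay: float = 1.0, max_delay: float = 64.0,
                 quiet_period: float = 120.0) -> None:
        self.initial_delay = initial_delay
        self.max_delay = max_delay
        self.quiet_period = quiet_period
        self.delay = initial_delay
        self.hold_until = float("-inf")    # no hold-down active yet
        self.last_trigger = float("-inf")  # time the last new source was seen
        self.pending = set()               # sources awaiting announcement

    def on_new_source(self, source: str, now: float) -> float:
        """Queue a newly detected source; return when it can be announced."""
        if now - self.last_trigger >= self.quiet_period:
            self.delay = self.initial_delay  # things were quiet: reset back-off
        self.last_trigger = now
        self.pending.add(source)
        return max(now, self.hold_until)

    def flush(self, now: float) -> Optional[list[str]]:
        """Originate one message with all pending sources, if the hold-down allows."""
        if not self.pending or now < self.hold_until:
            return None
        batch, self.pending = sorted(self.pending), set()
        self.hold_until = now + self.delay
        self.delay = min(self.delay * 2, self.max_delay)
        return batch


b = TriggeredBackoff()
b.on_new_source("192.0.2.1", 0.0)
print(b.flush(0.0))   # ['192.0.2.1'] goes out immediately
b.on_new_source("192.0.2.2", 0.2)
b.on_new_source("192.0.2.3", 0.5)
print(b.flush(0.5))   # None: still inside the 1 s hold-down
print(b.flush(1.0))   # ['192.0.2.2', '192.0.2.3'] batched into one message
```

While triggers keep arriving, the hold-down after each message doubles (1 s, 2 s, 4 s, ...) up to max_delay. A gap of quiet_period with no new sources returns the scheduler to immediate sending. How aggressive max_delay and quiet_period should be is exactly the kind of question left to the experiment.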
>> >> >> >> -----Original Message-----
>> >> >> >> From: Stig Venaas [mailto:stig@xxxxxxxxxx]
>> >> >> >> Sent: Tuesday, January 23, 2018 7:44 PM
>> >> >> >> To: Black, David <david.black@xxxxxxx>
>> >> >> >> Cc: tsv-art@xxxxxxxx; draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx;
>> >> >> >> ietf@xxxxxxxx; pim@xxxxxxxx
>> >> >> >> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >> >> >>
>> >> >> >> Hi, thanks for the great comments.
>> >> >> >>
>> >> >> >> I agree with all you wrote and will update the document. However,
>> >> >> >> there is one slight issue with the minimum time between origination of
>> >> >> >> each message. When a new source is detected, we would like to
>> >> >> >> originate a message ASAP so that receivers can start receiving the
>> >> >> >> multicast without much delay. A 10s delay would be a rather long time
>> >> >> >> if a source was detected right after the previous message was
>> >> >> >> originated. I think some delay would be warranted though, in
>> >> >> >> particular in a case where perhaps a router starts up and a large
>> >> >> >> number of directly connected sources could be detected within a short
>> >> >> >> time frame. I think an exponential back-off could make sense here.
>> >> >> >> E.g., if it is just one new source, maybe trigger a message ASAP. If a
>> >> >> >> new source is detected right after the previous one, wait a bit
>> >> >> >> longer, which also allows for aggregation of multiple sources in one
>> >> >> >> message if several are detected later. In extreme cases one could
>> >> >> >> over time keep increasing the delay until the next update.
>> >> >> >> If sufficient, we could maybe have a fixed minimum delay of 1s or not,
>> >> >> >> but that is probably too short in those extreme cases. Hence maybe an
>> >> >> >> exponential back-off.
>> >> >> >>
>> >> >> >> I would appreciate some further guidance on what you think is reasonable
>> >> >> >> here, and perhaps whether I can borrow something here from other
>> >> >> >> protocols/drafts. Part of the experiment here might be to find out
>> >> >> >> what minimum values, or how rapid a back-off, is needed based on the
>> >> >> >> size of the network, the number of sources, the types of links, etc.
>> >> >> >>
>> >> >> >> Also note that the general mechanism can be used for many types of
>> >> >> >> information. It depends on the information how urgent it is to
>> >> >> >> distribute it. Source discovery in particular is fairly urgent.
>> >> >> >>
>> >> >> >> Stig
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Jan 23, 2018 at 3:40 PM, David Black <david.black@xxxxxxxx> wrote:
>> >> >> >>> Reviewer: David Black
>> >> >> >>> Review result: Ready with Issues
>> >> >> >>>
>> >> >> >>> I've reviewed this document as part of TSV-ART's ongoing effort to
>> >> >> >>> review key IETF documents. These comments were written primarily for
>> >> >> >>> the transport area directors, but are copied to the document's
>> >> >> >>> authors for their information and to allow them to address any
>> >> >> >>> issues raised.  When done at the time of IETF Last Call, the authors
>> >> >> >>> should consider this review together with any other last-call
>> >> >> >>> comments they receive. Please always CC tsv-art@xxxxxxxx if you
>> >> >> >>> reply to or forward this review.
>> >> >> >>>
>> >> >> >>> This draft describes an experimental PIM Flooding Mechanism (PFM) for
>> >> >> >>> flooding PIM information among multicast routers that is a
>> >> >> >>> generalized form of the RFC 5059 PIM BSR (BootStrap Router)
>> >> >> >>> mechanism, and applies this mechanism to distribution of
>> >> >> >>> source-group mappings (PFM-SD).
>> >> >> >>>
>> >> >> >>> Early implementation experience with PFM-SD on low-bandwidth radio
>> >> >> >>> links (described in Section 2) suggests that the mechanism is able to
>> >> >> >>> work better than PIM-SM without starving other traffic in the fashion
>> >> >> >>> that PIM-DM may.  This is promising and (in this reviewer's opinion)
>> >> >> >>> justifies experimentation at larger scale and in other network
>> >> >> >>> environments.  In general, this is a well-written document and the
>> >> >> >>> authors should be commended for including the "running code"
>> >> >> >>> implementation experience report in Section 2.
>> >> >> >>>
>> >> >> >>> Flooding mechanisms are very useful, but the time periods that govern
>> >> >> >>> sending of flooding messages are crucial to avoid excessive
>> >> >> >>> consumption of network resources.  Section 5 of RFC 5059 has a solid
>> >> >> >>> discussion of the time periods that apply to use of flooding by the
>> >> >> >>> BSR mechanism.  The discussion in this draft is somewhat weaker,
>> >> >> >>> raising a couple of minor issues:
>> >> >> >>>
>> >> >> >>> 1) For PFM-SD, Section 4.2 provides a reasonable discussion of time
>> >> >> >>> periods that apply, but appears to be missing a minimum time period
>> >> >> >>> between sending messages.  Section 5 of RFC 5059 recommends a default
>> >> >> >>> of 10 seconds for that minimum time period by comparison to a default
>> >> >> >>> PIM BSR sending interval of 60 seconds.  That 10 second minimum
>> >> >> >>> default should be added to this draft, as the same default sending
>> >> >> >>> interval of 60 seconds is used.
>> >> >> >>>
>> >> >> >>> 2) For future use of PFM for other purposes, Section 3.3 provides the
>> >> >> >>> following guidance:
>> >> >> >>>
>> >> >> >>>     Each TLV definition will need to define when a triggered PFM
>> >> >> >>>     message needs to be originated, and also whether to send periodic
>> >> >> >>>     messages, and how frequent.
>> >> >> >>>
>> >> >> >>> That guidance is correct as far as it goes, but it's not particularly
>> >> >> >>> helpful to future protocol designers.  Text should be added to at
>> >> >> >>> least point to the examples in Section 4.2 of this draft and/or part
>> >> >> >>> of Section 5 of RFC 5059 to suggest the sorts of values that have
>> >> >> >>> proven to be workable, and perhaps also strongly encourage (SHOULD
>> >> >> >>> use) a default minimum time between messages of at least 10 seconds.
>> >> >> >>>
>> >> >> >>> Understanding this draft requires that the reader be familiar with
>> >> >> >>> multicast and PIM, which is reasonable.  In addition, an understanding
>> >> >> >>> of PIM BSR is also required, which is perhaps somewhat less
>> >> >> >>> reasonable.  An example that this reviewer tripped over is that
>> >> >> >>> Section 3 of this draft states that "Like BSR, messages are forwarded
>> >> >> >>> hop by hop."  There is no further explanation or definition of
>> >> >> >>> "forwarded hop by hop," making it necessary to consult RFC 5059 to
>> >> >> >>> understand that term, e.g., this has nothing to do with IPv6
>> >> >> >>> hop-by-hop options.  A sentence or two of explanation of this hop by
>> >> >> >>> hop forwarding concept ought to be copied and adapted from RFC 5059,
>> >> >> >>> and it would be good to check for other concepts that rely on RFC
>> >> >> >>> 5059 for definitions.
>> >> >> >>>
>> >> >> >>>
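The review notes that "forwarded hop by hop" is left undefined. The following is a rough illustration, modeled on the BSR behavior in RFC 5059 rather than on the draft's exact rules. Each router receives the message on a link and accepts it only if it came from its RPF neighbor toward the originator. It then processes the message and re-sends it to its other neighbors. The message is therefore re-sent at every hop rather than routed end to end. The sketch below simulates this on point-to-point links only, using breadth-first shortest paths as a stand-in for the unicast RPF lookup. The topology and function names are hypothetical.

```python
from collections import deque
from typing import Optional


def rpf_neighbors(graph: dict[str, list[str]], origin: str) -> dict[str, Optional[str]]:
    """Map each router to its next hop back toward `origin`.

    Breadth-first search gives shortest paths; this stands in for the
    unicast routing lookup a real router would use for its RPF check.
    """
    parent: dict[str, Optional[str]] = {origin: None}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return parent


def flood_hop_by_hop(graph: dict[str, list[str]], origin: str) -> dict[str, int]:
    """Simulate hop-by-hop flooding; return how many copies each router accepted."""
    rpf = rpf_neighbors(graph, origin)
    accepted = {router: 0 for router in graph}
    # Each entry is one link-local transmission: (sending router, receiving router).
    in_flight = deque((origin, nbr) for nbr in graph[origin])
    while in_flight:
        sender, router = in_flight.popleft()
        if rpf.get(router) != sender:
            continue  # not from the RPF neighbor toward the originator: drop it
        accepted[router] += 1
        # Process the message locally, then re-send it on every other link.
        in_flight.extend((router, nbr) for nbr in graph[router] if nbr != sender)
    return accepted


topology = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
print(flood_hop_by_hop(topology, "A"))  # {'A': 0, 'B': 1, 'C': 1, 'D': 1}
```

In the example, every router other than the originator accepts exactly one copy, and the redundant copies arriving over non-RPF links are dropped. On a shared LAN, the "other links" step would be per interface rather than per neighbor. That detail is omitted here.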
>> >> >>
>> >> >
>> >>