Oops, you're right. I've made these changes and posted revision 11 now, so hopefully it is ready for publication. That seems to be the only discuss.

Thanks,
Stig

On Fri, Jan 26, 2018 at 8:07 AM, Black, David <David.Black@xxxxxxxx> wrote:
> Hi Stig,
>
> This is looking good - the technical issue is resolved, as I agree with the approach in -10, thanks!
>
> There are a couple of editorial items that need attention:
>
> [1] New text in Section 3.3:
>
>    A router MUST NOT originate more than N messages per minute. This document does not mandate how this should be implemented, but some possible ways could be having a minimal time between each message, counting the number of messages originated and resetting the count every minute, or using a leaky bucket algorithm. One benefit of using a leaky bucket algorithm is that it can handle bursts better. The default value of N is 6. The value MUST be configurable. Depending on the network one may want to use a low value allowing new information to be propagated, but with a large number of routers and many updates, the total number of messages might become too large and require too much processing.
>
> "Depending on the network one may want to use a low value allowing new information to be propagated,"
>
> That seems wrong, as a low value of N would hit the messages per minute limit sooner.
> Would "low" -> "larger" correctly capture the intent? If so:
>
> OLD
>    Depending on the network one may want to use a low value allowing new information to be propagated, but with a large number of routers and many updates, the total number of messages might become too large and require too much processing.
> NEW
>    Depending on the network, one may want to use a larger value of N to favor propagation of new information, but with a large number of routers and many updates, the total number of messages might become too large and require too much processing.
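The Section 3.3 rule quoted in [1] leaves the implementation open and names a leaky bucket as one option. The Python sketch below shows one way that option could look, combined with the minimum gap between originations that revision 10 added (1 second by default, per the January 25 message further down). It is not taken from the draft: the class, method, and parameter names are hypothetical, the burst size of 3 is an arbitrary illustrative choice, and times are plain seconds supplied by the caller.

```python
from typing import Optional


class LeakyBucketLimiter:
    """Hypothetical limiter for PFM messages that a router originates itself.

    Leaky bucket used as a meter: every originated message adds one unit,
    and the bucket drains continuously at max_per_minute / 60 units per
    second. A message may be originated only if it still fits in the bucket
    and at least min_gap seconds have passed since the previous one.
    Only call this for originated messages; forwarded ones are not counted.
    """

    def __init__(self, max_per_minute: int = 6, burst: int = 3,
                 min_gap: float = 1.0) -> None:
        if max_per_minute < 1 or burst < 1 or min_gap < 0:
            raise ValueError("invalid limiter configuration")
        self.drain_rate = max_per_minute / 60.0  # units leaked per second
        self.capacity = float(burst)             # largest allowed burst
        self.min_gap = min_gap                   # seconds between originations
        self.level = 0.0
        self.last_update: Optional[float] = None
        self.last_sent: Optional[float] = None

    def _drain(self, now: float) -> None:
        # Leak whatever has drained since the previous call; never below empty.
        if self.last_update is not None:
            leaked = (now - self.last_update) * self.drain_rate
            self.level = max(0.0, self.level - leaked)
        self.last_update = now

    def try_originate(self, now: float) -> bool:
        """Return True, and charge the bucket, if a message may be sent at `now`."""
        self._drain(now)
        if self.last_sent is not None and now - self.last_sent < self.min_gap:
            return False  # too soon after the previous origination
        if self.level + 1.0 > self.capacity:
            return False  # bucket full: over the configured rate
        self.level += 1.0
        self.last_sent = now
        return True


if __name__ == "__main__":
    limiter = LeakyBucketLimiter(max_per_minute=6, burst=3, min_gap=1.0)
    # results: True, False, True, True, False, False, True
    for t in [0.0, 0.5, 1.0, 2.0, 3.0, 9.0, 11.0]:
        print(t, limiter.try_originate(t))
```

With a bucket depth of `burst`, the long-run rate is N per minute, but up to `burst + N` originations can land within a single 60-second span. If "per minute" is meant strictly as any 60-second window, one option is to pass `max_per_minute = N - burst` (with `burst` smaller than N), which caps every window at N at the cost of a lower sustained rate.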
>
> [2] The first paragraph in Section 4.2 specifies the time periods for GSH TLVs; text ought to be added there that refers to the new message timing requirements in Section 3.3 (text quoted in [1] above) to ensure that GSH implementers clearly understand that those message timing requirements apply to GSH. One can infer this applicability from the structure of the document, but I would prefer to directly tell GSH implementers that this is required.
>
> Many thanks for the productive discussion. Also, Mirja deserves the initial credit for asking that a closer look be taken at the flooding mechanism.
>
> Thanks, --David
>
>
>> -----Original Message-----
>> From: Stig Venaas [mailto:stig@xxxxxxxxxx]
>> Sent: Thursday, January 25, 2018 6:31 PM
>> To: Black, David <david.black@xxxxxxx>
>> Cc: draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx; Stewart Bryant <stewart.bryant@xxxxxxxxx>; ietf@xxxxxxxx; pim@xxxxxxxx; tsv-art@xxxxxxxx
>> Subject: Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>>
>> Hi
>>
>> I just posted version 10, which I think should resolve the issues raised in the tsv-art review and the discuss that was raised. The change is mainly to limit how often messages can be originated. It specifies a default of at most 6 messages per 60 seconds and at least 1 second between messages. It also says that the limits must be configurable. Note that I first posted version 9, noticed one small issue, and then posted version 10.
>>
>> It's embarrassing that we completely forgot to put such limits in the draft, and I'm grateful for the review allowing us to fix it before publication.
>>
>> Stig
>>
>>
>> On Wed, Jan 24, 2018 at 12:08 PM, Black, David <David.Black@xxxxxxxx> wrote:
>> > One change - the value MUST be configurable. While 6 is a plausible number, it results from our intelligent speculation.
>> > If that number is wrong and causes damage in a frail network, that number has to be changeable as part of the experiment. The Proposed Standard successor to this forthcoming Experimental RFC would be an appropriate context for a MUST vs. SHOULD discussion, IMHO.
>> >
>> > I also would specify a minimum time between packets, which also needs to be configurable. That time doesn't have to be the 10-second value from RFC 5059, as this draft is doing something different, but a value is needed to prevent sending 6 packets back-to-back to a router that can currently handle the first 1 or 2 but will drop the rest because of everything else in the chaos that it's currently dealing with.
>> >
>> > Thanks, --David
>> >
>> >
>> >> -----Original Message-----
>> >> From: Tsv-art [mailto:tsv-art-bounces@xxxxxxxx] On Behalf Of Stig Venaas
>> >> Sent: Wednesday, January 24, 2018 1:33 PM
>> >> To: Black, David <david.black@xxxxxxx>
>> >> Cc: draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx; Stewart Bryant <stewart.bryant@xxxxxxxxx>; ietf@xxxxxxxx; pim@xxxxxxxx; tsv-art@xxxxxxxx
>> >> Subject: Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >>
>> >> Hi
>> >>
>> >> I agree keeping it simple is good, but I have some concerns about requiring a minimal fixed time, like the 10 seconds in BSR (RFC 5059), between each message. I would prefer something like:
>> >>
>> >>    A router MUST NOT originate more than N packets per minute; note that this does not consider packets that are being forwarded by the router. This document does not mandate how this should be implemented, but some possible ways could be having a minimal time between each packet, counting the number of packets originated and resetting the count every minute, or using a leaky bucket algorithm. One benefit of using a leaky bucket algorithm is that it can handle bursts better. The default value of N is 6.
>> >>    The value SHOULD be configurable. Depending on the network one may want to use a low value allowing new information to be propagated, but with a large number of routers and many updates, the total number of messages might become too large and require too much processing. The PFM mechanism can be used to distribute many different types of information. When defining new types, it should be considered what changes, if any, warrant sending a triggered message.
>> >>
>> >> For the GSH (source announcement) TLV, I'll make it clear that a triggered message is useful when a new source is detected, but one should not trigger a message due to a source expiring (becoming inactive).
>> >>
>> >> Thoughts?
>> >>
>> >> Stig
>> >>
>> >>
>> >> On Wed, Jan 24, 2018 at 9:40 AM, Black, David <David.Black@xxxxxxxx> wrote:
>> >> > That works for me. Thanks, --David
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: Stewart Bryant [mailto:stewart.bryant@xxxxxxxxx]
>> >> >> Sent: Wednesday, January 24, 2018 11:45 AM
>> >> >> To: Black, David <david.black@xxxxxxx>; Stig Venaas <stig@xxxxxxxxxx>
>> >> >> Cc: tsv-art@xxxxxxxx; ietf@xxxxxxxx; pim@xxxxxxxx; draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx
>> >> >> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >> >>
>> >> >> The problem with complex processing under error conditions is that that is where all the software bugs hang out, because they are hard to test and don't show up until you have the problem they are trying to fix.
>> >> >>
>> >> >> This is a case where you want the simplest possible process, like a small burst followed by your 60s interval, which seems unlikely to stress any sensibly designed implementation on a reasonably sized network.
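Stewart's "small burst followed by your 60s interval" can be read as a policy for triggered messages that needs very little state. The Python sketch below is one hypothetical reading, not the draft's mechanism: `BurstThenHold`, `on_trigger`, `on_timer`, and the default burst of 2 are illustrative assumptions. The first trigger opens a 60-second window that allows at most `burst` messages; later triggers in that window are only remembered, and one catch-up message is sent once the window has expired. Periodic messages are not modeled.

```python
from typing import Optional


class BurstThenHold:
    """Hypothetical 'small burst, then the 60 s interval' policy for triggered messages.

    The first trigger opens a hold window of `interval` seconds. Up to `burst`
    messages may be originated inside that window; later triggers are only
    remembered (pending) and answered by a single catch-up message once the
    window has expired.
    """

    def __init__(self, burst: int = 2, interval: float = 60.0) -> None:
        self.burst = burst
        self.interval = interval
        self.window_start: Optional[float] = None
        self.sent_in_window = 0
        self.pending = False  # a trigger was suppressed and is still owed

    def _window_expired(self, now: float) -> bool:
        return self.window_start is None or now - self.window_start >= self.interval

    def on_trigger(self, now: float) -> bool:
        """Call when something changes (e.g. a new source). True means send now."""
        if self._window_expired(now):
            self.window_start = now
            self.sent_in_window = 0
        if self.sent_in_window < self.burst:
            self.sent_in_window += 1
            self.pending = False  # the message sent now carries current state
            return True
        self.pending = True  # suppressed; folded into the catch-up message
        return False

    def on_timer(self, now: float) -> bool:
        """Call periodically. True means send the owed catch-up message now."""
        if self.pending and self._window_expired(now):
            self.window_start = now
            self.sent_in_window = 1
            self.pending = False
            return True
        return False


if __name__ == "__main__":
    policy = BurstThenHold(burst=2, interval=60.0)
    print([policy.on_trigger(t) for t in (0.0, 1.0, 2.0)])  # [True, True, False]
    print(policy.on_timer(60.0))  # True: one catch-up message after the hold
```

Keeping a single `pending` flag, rather than queuing each suppressed trigger, is a deliberate simplification: whatever changed during the hold is carried in one catch-up message, which also gives the aggregation effect Stig describes for bursts of new sources.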
>> >> >>
>> >> >> - Stewart
>> >> >>
>> >> >>
>> >> >> On 24/01/2018 16:30, Black, David wrote:
>> >> >> > Hi Stig,
>> >> >> >
>> >> >> >> I agree with all you wrote and will update the document. However, there is one slight issue with the minimum time between origination of each message. When a new source is detected, we would like to originate a message ASAP so that receivers can start receiving the multicast without much delay. A 10s delay would be a rather long time if a source was detected right after the previous message was originated. I think some delay would be warranted though, in particular in a case where perhaps a router starts up and a large number of directly connected sources could be detected within a short time frame. I think an exponential back-off could make sense here. E.g., if it is just one new source, maybe trigger a message ASAP. If a new source is detected right after the previous one, wait a bit longer, which also allows for aggregation of multiple sources in one message if several are detected later. In extreme cases one could over time keep increasing the delay until the next update. If sufficient, we could maybe have a fixed minimum delay of 1s or not, but that is probably too short in those extreme cases. Hence maybe an exponential back-off.
>> >> >> > Exponential back-off sounds like a very good idea - I'd suggest adding something starting from RFC 5059's back-off functionality.
>> >> >> >
>> >> >> >> I would appreciate some further guidance on what you think is reasonable here, and perhaps whether I can borrow something here from other protocols/drafts.
>> >> >> >> Part of the experiment here might be to find out what minimum values, or how rapid back-off, is needed based on the size of the network, the number of sources, the types of links, etc.
>> >> >> > In addition to burst scenarios (e.g., router starts up, lots of new sources detected quickly as a result), I strongly suggest thinking about chaos scenarios where links and/or routers are coming and going so rapidly that the source population is in a constant state of flux. If things are really bad, the best thing to do may be to shut up and hope that the chaos settles out, as not much useful will happen until it does, and sending messages about observed changes risks making things worse. Again, exponential back-off makes sense, possibly quite aggressive, e.g., back-off from 10 seconds by a small factor a few times, and if things still look bad, wait at least a minute or two with further back-off from that longer time until things stabilize. This needs more thought on how to adjust the back-off factor, as that off-the-top-of-my-head example probably exhibits peculiar behavior in scenarios that are just on the edge of tripping the long delay - some thinking about what stability means and how to get there may help in figuring out the relative merits and applicability of backing off further vs. some kind of dramatic reset, analogous to TCP's congestion window reset on timeout.
>> >> >> >
>> >> >> > As this is intended to be an experimental RFC, I don't think a completely worked-out solution is expected or required - a good discussion of the problems and explanation of areas that need investigation as part of the experiment ought to suffice, as suggested in the last sentence quoted above.
>> >> >> > I would add some initial exponential back-off functionality as a starting point.
>> >> >> >
>> >> >> >> Also note that the general mechanism can be used for many types of information. How urgent it is to distribute it depends on the information. Source discovery in particular is fairly urgent.
>> >> >> > And that should be discussed, perhaps in Section 3 somewhere.
>> >> >> >
>> >> >> > Thanks, --David
>> >> >> >
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: Stig Venaas [mailto:stig@xxxxxxxxxx]
>> >> >> >> Sent: Tuesday, January 23, 2018 7:44 PM
>> >> >> >> To: Black, David <david.black@xxxxxxx>
>> >> >> >> Cc: tsv-art@xxxxxxxx; draft-ietf-pim-source-discovery-bsr.all@xxxxxxxx; ietf@xxxxxxxx; pim@xxxxxxxx
>> >> >> >> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >> >> >>
>> >> >> >> Hi, thanks for the great comments.
>> >> >> >>
>> >> >> >> I agree with all you wrote and will update the document. However, there is one slight issue with the minimum time between origination of each message. When a new source is detected, we would like to originate a message ASAP so that receivers can start receiving the multicast without much delay. A 10s delay would be a rather long time if a source was detected right after the previous message was originated. I think some delay would be warranted though, in particular in a case where perhaps a router starts up and a large number of directly connected sources could be detected within a short time frame. I think an exponential back-off could make sense here. E.g., if it is just one new source, maybe trigger a message ASAP.
>> >> >> >> If a new source is detected right after the previous one, wait a bit longer, which also allows for aggregation of multiple sources in one message if several are detected later. In extreme cases one could over time keep increasing the delay until the next update. If sufficient, we could maybe have a fixed minimum delay of 1s or not, but that is probably too short in those extreme cases. Hence maybe an exponential back-off.
>> >> >> >>
>> >> >> >> I would appreciate some further guidance on what you think is reasonable here, and perhaps whether I can borrow something here from other protocols/drafts. Part of the experiment here might be to find out what minimum values, or how rapid back-off, is needed based on the size of the network, the number of sources, the types of links, etc.
>> >> >> >>
>> >> >> >> Also note that the general mechanism can be used for many types of information. How urgent it is to distribute it depends on the information. Source discovery in particular is fairly urgent.
>> >> >> >>
>> >> >> >> Stig
>> >> >> >>
>> >> >> >>
>> >> >> >> On Tue, Jan 23, 2018 at 3:40 PM, David Black <david.black@xxxxxxxx> wrote:
>> >> >> >>> Reviewer: David Black
>> >> >> >>> Review result: Ready with Issues
>> >> >> >>>
>> >> >> >>> I've reviewed this document as part of TSV-ART's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors for their information and to allow them to address any issues raised. When done at the time of IETF Last Call, the authors should consider this review together with any other last-call comments they receive.
>> >> >> >>> Please always CC tsv-art@xxxxxxxx if you reply to or forward this review.
>> >> >> >>>
>> >> >> >>> This draft describes an experimental PIM Flooding Mechanism (PFM) for flooding PIM information among multicast routers. PFM is a generalized form of the RFC 5059 PIM BSR (BootStrap Router) mechanism, and the draft applies it to distribution of source group mappings (PFM-SD).
>> >> >> >>>
>> >> >> >>> Early implementation experience with PFM-SD on low-bandwidth radio links (described in Section 2) suggests that the mechanism is able to work better than PIM-SM without starving other traffic in the fashion that PIM-DM may. This is promising and (in this reviewer's opinion) justifies experimentation at larger scale and in other network environments. In general, this is a well-written document, and the authors should be commended for including the "running code" implementation experience report in Section 2.
>> >> >> >>>
>> >> >> >>> Flooding mechanisms are very useful, but the time periods that govern sending of flooding messages are crucial to avoid excessive consumption of network resources. Section 5 of RFC 5059 has a solid discussion of the time periods that apply to use of flooding by the BSR mechanism. The discussion in this draft is somewhat weaker, raising a couple of minor issues:
>> >> >> >>>
>> >> >> >>> 1) For PFM-SD, Section 4.2 provides a reasonable discussion of time periods that apply, but appears to be missing a minimum time period between sending messages.
>> >> >> >>> Section 5 of RFC 5059 recommends a default of 10 seconds for that minimum time period, compared with a default PIM BSR sending interval of 60 seconds. That 10-second minimum default should be added to this draft, as the same default sending interval of 60 seconds is used.
>> >> >> >>>
>> >> >> >>> 2) For future use of PFM for other purposes, Section 3.3 provides the following guidance:
>> >> >> >>>
>> >> >> >>>    Each TLV definition will need to define when a triggered PFM message needs to be originated, and also whether to send periodic messages, and how frequent.
>> >> >> >>>
>> >> >> >>> That guidance is correct as far as it goes, but it's not particularly helpful to future protocol designers. Text should be added to at least point to the examples in Section 4.2 of this draft and/or part of Section 5 of RFC 5059 to suggest the sorts of values that have proven to be workable, and perhaps also strongly encourage (SHOULD use) a default minimum time between messages of at least 10 seconds.
>> >> >> >>>
>> >> >> >>> Understanding this draft requires that the reader be familiar with multicast and PIM, which is reasonable. In addition, an understanding of PIM BSR is also required, which is perhaps somewhat less reasonable. An example that this reviewer tripped over is that Section 3 of this draft states that "Like BSR, messages are forwarded hop by hop."
>> >> >> >>> There is no further explanation or definition of "forwarded hop by hop," making it necessary to consult RFC 5059 to understand that term (e.g., that it has nothing to do with IPv6 hop-by-hop options). A sentence or two of explanation of this hop-by-hop forwarding concept ought to be copied and adapted from RFC 5059, and it would be good to check for other concepts that rely on RFC 5059 for definitions.