RE: Gen-art LC review: draft-mm-netconf-time-capability-05


 



Hi Robert,

Thanks again for the prompt responses.


>Well, those are just a subset of the things that could change in a command's
>context that would cause the command to be erroneous or even damaging if
>it were run, and you're not addressing the other security issues that come
>with very long scheduling (overflowing buffers, or having lots of time to
>schedule a massive number of commands to all try to happen at once). I
>suspect there are other things that pressured adding the "near future"
>restriction that haven't been captured well yet.

Well, the thing is that 15 seconds (or 'a few seconds' for that matter) is long enough to send thousands (or more) of scheduled RPCs, so I am not sure that sched-max-future mitigates the buffer overflow threat. Generally speaking, Section 3.6 discusses erroneous scenarios, not security threats.

I would suggest adding some text to the security considerations section that discusses the overflow attack you mentioned here. Would this address your concern?
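
As a rough illustration of the point above, here is a minimal sketch of a near-future check. It is a toy model only: the class, the function, and the max_pending cap are hypothetical names introduced for this example and are not defined in the draft. It shows that a time window by itself does not bound how many scheduled RPCs a server ends up holding.

```python
from datetime import datetime, timedelta, timezone


class ScheduledRpcQueue:
    """Toy model of a server's pending scheduled RPCs (hypothetical, not from the draft)."""

    def __init__(self, sched_max_future=timedelta(seconds=15), max_pending=None):
        self.sched_max_future = sched_max_future
        # Assumption: an optional cap on pending RPCs; the draft does not define one.
        self.max_pending = max_pending
        self.pending = []

    def accept(self, scheduled_time, now):
        # Near-future check: reject anything later than now + sched-max-future.
        # (Handling of past times is left out of this sketch.)
        if scheduled_time > now + self.sched_max_future:
            return False
        # Hypothetical mitigation: refuse new RPCs once the queue is full.
        if self.max_pending is not None and len(self.pending) >= self.max_pending:
            return False
        self.pending.append(scheduled_time)
        return True


def flood(queue, count, now):
    """Schedule `count` RPCs for the same instant 5 seconds ahead; return how many were accepted."""
    target = now + timedelta(seconds=5)
    return sum(queue.accept(target, now) for _ in range(count))
```

With the default 15-second window and no cap, every RPC in such a burst is accepted because each one individually falls inside the window; only something like the hypothetical max_pending limit bounds the queue size.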


>I think you're saying that in production deployments today, the 
>authorization policy is "the peer was able to send me a packet". Is that 
>wrong?

I can't comment on what is deployed in production today, although I am sure there are operators out there who can. RFC 6536, which defines a NETCONF access control model, is cited by 6 other RFCs, so I do not think access control has been overlooked by the community. Nevertheless, I believe that (much like RFC 6241) the access control specifics are not within the scope of the current draft.


Thanks,
Tal.


>-----Original Message-----
>From: Robert Sparks [mailto:rjsparks@xxxxxxxxxxx]
>Sent: Tuesday, August 04, 2015 9:01 PM
>To: Tal Mizrahi
>Cc: ietf@xxxxxxxx; General Area Review Team; draft-mm-netconf-time-capability.all@xxxxxxxx
>Subject: Re: Gen-art LC review: draft-mm-netconf-time-capability-05
>
>
>
>On 8/4/15 11:19 AM, Tal Mizrahi wrote:
>> Hi Robert,
>>
>> Thanks for the comments.
>>
>>
>>>> A typical example of using near-future scheduling is a coordinated
>>>> commit; a client needs to trigger a commit at n servers, so that the
>>>> n servers perform the commit as close as possible to simultaneously.
>>>> Without the time capability, the client sends a sequence of n commit
>>>> messages, and thus each server performs the commit at a different
>>>> time. By using the time capability, the client can send commit
>>>> messages that are scheduled to take place at time Ts, which is 5
>>>> seconds in the future, causing the servers to invoke the commit as close
>>>> as possible to time Ts.
>>> I'm interested in your response to Andy's point on this paragraph.
>> Okay, so here is Andy's point:
>>
>>>> You should pick a different example because the NETCONF
>>>> confirmed-commit procedure is designed to be loose-coupled.  The
>>>> default timeout is 10 minutes.
>>>> Since the client needs sessions open with all servers involved in
>>>> the network-wide commit, there is no advantage in staging the
>>>> <commit> operations 15 sec. in advance, to make sure the servers are
>>>> reachable.
>> And here is our response from 02-Aug-2015:
>>
>>> Right, confirmed-commit is loose-coupled. But the example quoted
>>> above (Example
>>> 1 in the draft) is not intended to replace the confirmed commit. The
>>> purpose in this example is different: the client wants the commit
>>> RPCs to be executed at the same time in all servers.
>>> The confirmed-commit serves a different purpose, which is to make
>>> sure that everyone either commits or rolls back. BTW, a confirmed
>>> commit can be sent with the scheduled-time element, allowing to enjoy
>>> the best of both worlds.
>>
>> Please let us know if you have further concerns about this point.
>>
>>
>>>> The default value of sched-max-future is defined to be 15 seconds.
>>>> This duration is long enough to allow the scheduled RPC to be sent
>>>> by the client, potentially to multiple servers, and in some cases to
>>>> send a cancellation message, as described in Section 3.2. On the
>>>> other hand, the 15 second duration yields a very low probability of a
>>>> reboot or a permission change.
>>> I'm not finding the explanation terribly persuasive, but it's at
>>> least _some_ explanation - thanks for that.  I'll leave it to the ADs
>>> and other reviewers in the field to see if it's sufficient for an
>>> experimental protocol.
>> (*) Please see comment (**) below.
>>
>>>> Note that we did not define a maximal value for sched-max-future,
>>>> since one of the goals was to define a generic tool that can be used
>>>> for various different environments. The draft clearly states the
>>>> intention of using near-future-scheduling, but the requirements and
>>>> constraints of different environments may require the
>>>> sched-max-future to have a different value, potentially higher than
>>>> 30 seconds. Hence, we prefer not to define a maximal value. Indeed, in
>>>> the draft 06 there is a more detailed discussion about the issues we are
>>>> trying to prevent by using near-future scheduling (Section 3.6).
>>> Without a maximal value, I think you need more of a discussion
>>> guiding the choice of sched-max-future. Otherwise, you are just
>>> waving your hands at not addressing the problems with far-future
>>> scheduling, and potentially well-meaning but uninformed people are
>>> going to go step in them anyway. There was a point to choosing the
>>> near-future limit.
>>> Enforce it or explain it with more vigor please.
>> (**) Your point is well taken. What we suggest, regarding this point and the
>> previous point (*), is that we add more text explaining the factors that
>> affect sched-max-future to Section 3.6.
>>
>> Here is the new text we suggest. Please let us know if this addresses your
>> comment:
>>
>>
>> The challenge in far future scheduling is that during the long period between
>> the time at which the RPC is sent and the time at which it is scheduled to be
>> executed the following erroneous events may occur:
>> - The server may restart.
>> - The client's authorization level may be changed.
>> - The client may restart and send a conflicting RPC.
>> - A different client may send a conflicting RPC.
>Well, those are just a subset of the things that could change in a command's
>context that would cause the command to be erroneous or even damaging if
>it were run, and you're not addressing the other security issues that come
>with very long scheduling (overflowing buffers, or having lots of time to
>schedule a massive number of commands to all try to happen at once). I
>suspect there are other things that pressured adding the "near future"
>restriction that haven't been captured well yet.
>>
>> In these cases if the server performs the scheduled operation it may
>> perform an action that is inconsistent with the current network policy, or
>> inconsistent with the currently active clients.
>>
>> Near future scheduling guarantees that external events such as the
>> examples above have a low probability of occurring during the
>> sched-max-future period, and even when they do, the period of inconsistency
>> is limited to sched-max-future, which is a short period of time.
>>
>> Hence, sched-max-future should be configured to a value that is high
>> enough to allow the client to:
>> 1. Send the scheduled RPC, potentially to multiple servers.
>> 2. Receive notifications or rpc-error messages from the server(s), or wait for
>>    a timeout and decide that if no response has arrived then something is wrong.
>> 3. If necessary, send a cancellation message, potentially to multiple servers.
>>
>> On the other hand, sched-max-future should be configured to a value that is
>> low enough to allow a low probability of the erroneous events above, typically
>> on the order of a few seconds. Note that even if sched-max-future is
>> configured to a low value, it is still possible (with a low probability) that an
>> erroneous event will occur. However, this short potentially hazardous period
>> is not significantly worse than in conventional (unscheduled) RPCs, as even a
>> conventional RPC may in some cases be executed a few seconds after it was
>> sent by the client.
>>
>> The default value of sched-max-future is defined to be 15 seconds. This
>> duration is long enough to allow the scheduled RPC to be sent by the client,
>> potentially to multiple servers, and in some cases to send a cancellation
>> message, as described in Section 3.2. On the other hand, the 15 second
>> duration yields a very low probability of a reboot or a permission change.
>I still think, especially while this is at experimental, you should scope this with
>an absolute max. But I'm just one reviewer. Work it out with your AD.
>
>>
>>
>>>> This YANG module defines the <cancel-schedule> RPC. This RPC may
>>>> be considered sensitive or vulnerable in some network environments.
>>>> Since the value of the <schedule-id> is known to all the clients that are
>>>> subscribed to notifications from the server, the <cancel-schedule> RPC
>>>> may be used maliciously to attack servers by canceling their pending
>>>> RPCs.
>>>> This attack is addressed in two layers: (i) security at the transport layer,
>>>> limiting the attack only to clients that have successfully initiated a secure
>>>> session with the server, and (ii) the authorization level required to cancel
>>>> an RPC should be the same as the level required to schedule it.
>>> To help me along, point me to the specifics of what you use to set and
>>> verify such an authorization level?
>> Indeed, there is a need for an authorization scheme, which is able to set and
>> verify the authorization level.
>> NETCONF (RFC 6241) does not explicitly define an authorization scheme, and
>> it is probably not within the scope of the current draft to define such a
>> scheme either.
>> Quoting RFC 6241:
>>
>>     This document does not specify an authorization scheme, as such a
>>     scheme will likely be tied to a meta-data model or a data model.
>>     Implementors SHOULD provide a comprehensive authorization scheme
>>     with NETCONF.
>>     ...
>>     Different environments may well allow different rights prior to and
>>     then after authentication.  Thus, an authorization model is not
>>     specified in this document.  When an operation is not properly
>>     authorized, a simple "access denied" is sufficient.
>I think you're saying that in production deployments today, the
>authorization policy is "the peer was able to send me a packet". Is that
>wrong?
>>
>>
>>
>> Please let us know if you have further comments or concerns about any of
>> the issues above.
>>
>> Thanks,
>> Tal.




