Re: [dhcwg] TSVDIR review of draft-ietf-dhc-dhcpv4-bulk-leasequery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Joe,

I've edited the the email trail to try to concentrate on
the remaining open issues.  I've also removed the two
issues that were included in the email I sent you an hour
or so ago, since I expect that they will be covered in
continuations of that thread.

To summarize, there are 3 issues remaining that are
not included below:

1. The NAGLE issue you raised, which has its own thread.

2. The two issues:

	a) Interleaving

	b) Possible security vulnerability if data
	is available that wasn't already in the data
	path.

3. The ? issues remaining, below.  I think that
the number could be 0, but probably not.

Regards -- Kim



On Feb 17, 2012, at 5:47 PM, Joe Touch wrote:

> Hi, Kim,
> 
> First, thank you for your detailed response to my quite lengthy review.
> 
> Some further clarifications and confirmations appear below.
> 
> Joe
> 
> On 2/17/2012 12:09 PM, Kim Kinnear wrote:
>> 
>> Joe,
>> 
>> Thank you for your review.
>> 
>> My responses are indented, below...
>> 
>> On Feb 13, 2012, at 5:00 PM, Joe Touch wrote:
>> 
>>> Hi, all,
>>> 
>> > I've reviewed this document as part of the transport area
>>> directorate's ongoing effort to review key IETF documents. These
>>> comments were written primarily for the transport area directors,
>>> but are copied to the document's authors for their information and
>>> to allow them to address any issues raised. The authors should
>>> consider this review together with any other last-call comments
>>> they receive. Please always CC tsv-dir@xxxxxxxx if you reply to or
>>> forward this review.
>>> 
>>> This request was received Feb. 2, 2012.
>>> 
>>> This document describes an extension to DHCPv4 for the bulk query
>>> andtransfer of current lease status over TCP.
>>> 
>>> The following transport issues were noted, and are significant:
> 
>>> CONNECTION MANAGEMENT- The document describes the use of TCP
>>> connections for bulk transfer, but needs to be more specific about which
>>> side (relay client or server) closes the connection, under what
>>> circumstances, and with what mechanism (e.g., graceful CLOSE vs. ABORT,
>>> as per RFC 793)
>> 
>> 	Here is what we have now, 7.8 is for clients, 8.5 is for
>> 	servers:
>> 
>>> 7.8.  Closing Connections  [Client Section]
>>> 
>>>    The requestor SHOULD close the connection after the
>>>    DHCPLEASEQUERYDONE message is received for the last outstanding query
>>>    that it has sent, if it has no more queries to send.
> 
> Under what conditions is it useful to not close at this time, and if so, when/why is it eventually closed?
> 
> In this case it's also useful to explain what "no more queries to send" means - are clients encouraged to keep the connection open in case there are more to send, or do you expect that the client knows when it opens the connection whether it has more to send - and if so, shouldn't it have already sent them?

	These are not uninteresting questions, but they are over
	the line from protocol specification and into
	implementation in my view.  I can certainly imagine that
	the requestor has multiple queries to send but has chosen
	to send them serially instead of in parallel (on the same
	connection or on different connections).  That would be
	way that the requestor might know that it had more
	queries to send.  It might have a queue of them.

	I will add this to the document:

	"If a requestor doesn't know if it has more queries to
	send or not, then it SHOULD close the connection."


> 
>>> 
>>> 8.5.  Closing Connections   [Server Section]
>>> 
>>>    The DHCPv4 server SHOULD start a timer for BULK_LQ_DATA_TIMEOUT
>>>    seconds for a particular connection after it sends a
>>>    DHCPLEASEQUERYDONE message over that connection and if there is no
>>>    current query outstanding for that connection.  It should clear this
> 
> Is this a SHOULD?

	Yes, I'll change it.
> 
> And should the timer be cleared or restarted?

	I was thinking of them as the same thing, but restarted
	is probably better language.  I'll change it.
> 
> Under what conditions is it useful to not close at this time, and if so, when/why is it closed?

	All of these SHOULDs derive from the fundamental SHOULD
	about starting a timer.  You don't *have* to have a timer
	to implement this protocol.  You SHOULD use a timer, but
	you don't *have* to.  I can imagine other ways to dealing
	with closing dead connections which don't involve timers.
	No, I really don't want to put them into the document.
	But the initial SHOULD says that if you don't use a
	timer, you can still say that you implement this
	capability.
> 
>>>    timer if a query arrives over that connection.  If the timer expires,
>>>    the DHCPv4 server should close the connection.
> 
> Is this a SHOULD, as well?

	Yes, thanks, I'll change it.
> 
> Again, under what conditions is it useful to not close at this time, and if so, when/why is it closed?

	Again, this is a SHOULD because you don't even have to
	have a timer.  But even if you have a timer, you don't
	have to close it when the timer goes off.  You might want
	to be "smarter" about something, or use some external
	information not represented here to make what you
	consider a better decision.  The SHOULD allows you to do
	that without causing you to wonder if you really support
	the document.

>>>    The server MUST close its end of the TCP connection if it encounters
>>>    an error sending data on the connection.  The server MUST close its
>>>    end of the TCP connection if it finds that it has to abort an in-
>>>    process request.  A server aborting an in-process request SHOULD
>>>    attempt to signal that to its requestors by using the QueryTerminated
>>>    status code in the status-code option in a DHCPLEASEQUERYDONE
>>>    message, including a message string indicating details of the reason
>>>    for the abort.   If the server detects that the requesting end of the
>>>    connection has been closed, the server MUST close its end of the
>>>    connection.
> 
> OK. I found it a bit confusing to have this in separate places, but I viewed this from the transport perspective. This is fine.
> 
>> 	All connections are closed with a graceful CLOSE, and not
>> 	with an ABORT.  I'll add that to the document.
>> 
>> 	I'm at a loss to know how to be more specific about which
>> 	side of the connection closes the connection.  There are
>> 	rules for the client in 7.8, and rules for the server in
>> 	8.5.
> 
> Somewhere you should state that the protocol exchange typically involves the client closing the connection, and that the server closes only under exceptions.

	Ok.  I'll do that.
> 
>>> sec 7.3 indicates some conditions for terminating connections; this
>>> section should indicate which side performs this, and by what
>>> method(CLOSE, ABORT)
>> 
>> 	Section 7.3 discusses in two places where the "requestor" may
>> 	drop the connection.  The term "requestor" is used to
>> 	indicate which "side" performs the actions.  All of
>> 	Section 7 is about "Requestor Behavior", and the specific
>> 	details of section 7.3 in all cases indicate that the
>> 	instructions for dealing with the connection concern the
>> 	"requestor" explicitly.
>> 
>> 	I'll enhance the explanation to include that CLOSE should
>> 	be used.
>> 
>> 	If there is a specific set of instructions that are
>> 	unclear, I don't see them, so it would help me if you
>> 	could give me more detailed guidance than "Section 7.3".
>> 	Thanks.
> 
> It's probably sufficient to replace "requestor" with "client" to be consistent.

	Requestor is the entity sending in the DHCPv4 Bulk Leasequery.
	Client is the entity that has a DHCPv4 address and whose
	information shows up in the data part of the DHCPv4 Bulk
	Leasequery requests.  Client would not be consistent.

> However, this section then supercedes the description in 7.8 and 8.5 above - i.e., you are handling exceptions on the client side as well.

	I will enhance section 7.8 to indicate that requestors
	have to handle exception conditions and their intended
	actions.
> 
> I.e., 7.8 needs to include the exception cases listed in 7.3, or they should just be moved there.

	Yes.
> 
>>>> this sec also allows connections to stay open after all pending
>>>> transactions are complete (MAY); the rationale for this should be
>>>> given, or the connection MUST be closed.
>> 
>> 	I will add some words which clarify the reasons why you
>> 	might want to leave the connection open after all pending
>> 	transactions are complete.
> 
> FWIW, again this contradicts 7.8 which says the client SHOULD close after the response is received if it has no more queries to send.

	The rationale is that you might know from information not
	part of this document that you had more queries to send
	soon, even though you don't have them now.
> 
> This says "MAY" leave open. That's not the complement to SHOULD close.

	We don't really care if you keep it open or not.  Really.
	If you think that you will be happier, keep it open.  If
	you think it is simpler to close it (as I do), then close
	it.  Different people (different authors even!) have different
	ideas about how they would structure the code for this.  That's
	ok.  The point about all of this open/closed/multiple-operations
	stuff it to leave it up to the implementor.  Period.  We are
	not trying to tell them exactly how to implement this.   We
	are trying to focus on the protocol aspects, and leave understood.


>>> This section should more carefully explain behavior when a
>>> connection is dropped and the reason - e.g., timeout, abort,
>>> close.
>> 	
>> 	The application employing Bulk Leasequery doesn't
>> 	care about *why* the connection went away.  It
>> 	doesn't have to know if it was abort, close, or
>> 	timeout.  I will say that in this section.
> 
> Hmm. I would think that this information would be passed up to the application.

	It might, but we don't care.  Do you think we should?
	If the connection is gone, it is gone.  I don't think
	I care which syscall caused it.
> 
>>> 
>>> There were some other issues noted in this document. These
>>> comments appear below, and although not focused on transport
>>> issues, they represent significant issues that IMO should be
>>> addressed as well.
>>> 
>>> NOTE - regarding some terminology issues, I did not survey current
>>> DHCP RFCs for consistency, but IMO these terms should be corrected
>>> even if they are then inconsistent with existing specs.
>>> Joe
>>> 
>>> -----------
>>> 
>>> Major non-transport issues:
>>> 
>>> 
> 
>>> 
>>> - "VPN" is used throughout to refer to "private" addresses (RFC 1918); a VPN is not just private addressing.
>> 
>> 	Hmm.  Actually, VPN in this document does not refer to RFC 1918
>> 	private addressing.  If this is important to you, could you
>> 	be more specific as to your concern?
> 
> OK. I don't understand how it's being used, but if it's consistent with the rest of DHCP, that's fine.
> 
> However, I got thrown off by your use of 10.x.x.x.
> 
> There are appropriate addresses to use as examples in RFCs - see RFC 5737, and please change this example accordingly.

	I will replace the reference to 10.x.x.x with a reference
	to RFC 1918 addresses.  Which is the point that I was trying
	to make, not about some example address.
> 
> 
>>> 
> 
>>> - start-time-of-state is expressed as an offset from base-time;
>>> this appears to be the only time indicated as an offset rather than
>>> as an absolute. That inconsistency invites implementation errors;
>>> IMO, this should be absolute too.
>> 
>> 	There are two kinds of time options here, input options
>> 	and output options.  The input options are all absolute
>> 	times.  The output options are all offset times from the
>> 	base time (which by design is the only absolute time in
>> 	the output packets).
> 
> That'd be very useful to explain this way. Is this typical for DHCP? It's quite confusing IMO.

	DHCP times are traditionally in duration from "now", to not have
	to deal with clock skew issues (other than ongoing drift/skew).

	We introduced the base-time to work with that, and we kept
	the query times in absolute time.  I'll add words to explain
	that.
> 
> >       ...There are several input options
>> 	defined in this document.  The start-time-of-state is the
>> 	only output option defined in this draft.  It is defined
>> 	as a offset to the base time in large part because the
>> 	*other* output options that are used in the DHCP packet
>> 	are *also* offsets to the base time.  These options are
>> 	not new, but are options that already exist in other
>> 	RFC's.  So, while it looks like the state-time-of-state
>> 	is the odd man out at first glance, it actually is an
>> 	offset time because the *other* options with which it
>> 	will appear are also offset times.
>> 
>>> 
>>> - option lengths: throughout, the doc refers to option lengths as
>>> being "an octet"; they are *indicated* in one octet. Some are constant
>>> (e.g., DCHP-STATE), some allow the contents of the octet to vary. Again,
>>> this is an *unsigned integer* octet.
>> 
>> 	Yes, the size of an option is stored in
>> 	one octet.  So, yes, the length is *indicated*
>> 	by the value in that one octet.
>> 
>> 	But the length may be 1 octet, it may be
>> 	5 octets, it may vary between 8 and 10 octets.
>> 
>> 	I don't understand your concern.  Are you	
>> 	saying that the length should not be
>> 	stated to be "1 octet" or "5 octets"?
> 
> The length should be "indicated in 1 octet". I'm just suggesting a change to the way it's described. When you say "the length is an octet", that is ambiguous and can mean that the length value is 1, not just that this value is stored in one octet.

	Ok, I'll fix that.
> 
>> 	I will fix the integer to be specifically
>> 	unsigned (as previously indicated).
>>> 
>>> - some of the information provided (in DHCP-STATE) goes beyond
>>> that of DHCP. It would be useful to motivate the need for this
>>> information in a bulk query, and why it is not similarly available
>>> for nodes using UDP (e.g., as an extension to DHCPv4, not just to
>>> the bulk transfer command). again, absence of state information
>>> should not indicate state. State should always be expressed
>>> explicitly. these states are further meaningless without a state
>>> diagram explaining them. if such a diagram is not possible (as
>>> noted at end of sec 6.2.7), then the states are irrelevant and the
>>> option should not be included.
>> 
>> 	The state transition diagram can certainly be drawn, but
>> 	as it is fully connected, such a picture conveys no
>> 	useful information.
> 
> It's sufficient to indicate this then.
> 
> >	... The transitions are not themselves
>> 	useful in any case, as the the Bulk Leasequery is a
>> 	snapshot of information at one instant of time.  There is
>> 	every reason to assume that as the state moves from one
>> 	value to another that an external entity would not see
>> 	all (or even many) of these transitions.  This is a
>> 	snapshot of the DHCP state, and could be nothing else in
>> 	the environment of Bulk Leasequery.
>> 
>> 	The reason why the state transition graph is fully
>> 	connected is due to the common practice in DHCPv4 of
>> 	having two servers running in a failover relationship.
>> 	There are then two state machines for the lease state,
>> 	and at times one "leads" and the other "follows", and at
>> 	other times the other "leads" and its partner "follows".
>> 	So, examined individually, the state machines for each
>> 	individual failover partner is fully connected because at
>> 	any time it may receive an update from its partner that
>> 	it decides to accept that will simply "set" the state to
>> 	the value supplied by the partner.
>> 
>> 	All of this is orthogonal to the Bulk Leasequery protocol
>> 	-- except that we want the user of the Bulk Leasequery
>> 	protocol to be able to make sense out of information
>> 	supplied by each of the failover partners.  Or any
>> 	partners running any increased availability solution.
>> 
>> 	So, that is the reason why the dhcp-state option exists
>> 	at all -- to allow the user of Bulk Leasequery to make
>> 	sense out of the results of sending a Bulk Leasequery to
>> 	two different (but cooperating) DHCPv4 servers.
>> 
>> 	The DHC WG felt that this information was worthwhile to
>> 	have in this document, and I agree (or we wouldn't have
>> 	put it in).
> 
> Using the failover case as a justification would be helpful.

	Failover isn't an IETF standard, and so we try to step
	carefully around it possible existence.  I don't think
	I could get this document through back through the DHC WG
	if I mentioned failover, so I don't think this is going
	to work.
>> 
> 
>>> - sec 7.4 states that the clock skew of zero is desired; this
>>> assumes E2E delays under 1sec. An explanation of why this is
>>> desired should be given, as well as the consequences of it not.
>> 
>> 	I have given specific instructions on how to handle non-zero
>> 	clock skews.  There are no "consequences".  You code it
>> 	so that it works with non-zero clock skew.  Period.
>> 
>> 	It is desired because it is less confusing to humans.
> 
> I would argue that this then irrelevant, and there should be no statement about a skew of zero being preferred, since it could drive implementations that way unnecessarily.

	I believe that the other reason we like zero skew is 
	that it tends to expose fewer bugs in people's code,
	so keeping the admonishment that it is a better 
	answer seems to add some value.
> 
>>> Other non-transport issues:
>>> 
>>> - The document includes definitions and references to irrelevant
>>> deployment and implementation issues, notably DSLAMs, concentrators, and
>>> access concentrators. These should not appear formally; they should be
>>> used only to usefully illustrate currently intended uses.
>> 
>> 	These appear in the text in order to give
>> 	the otherwise abstract design goals some concrete
>> 	examples.   We already removed a lot of the design
>> 	goals, are you suggesting that they all be removed?
> 
> Actually, yes, except as examples in the intro/motivation. Elsewhere they are IMO distracting.

	DSLAM, concentrator, and access concentrator appear only
	in Section 3, Design Goals -- which is precisely the intro/motivation
	for this document.  They don't appear (at least to my text search
	engine) elsewhere.  So I think we have done what you want.
> 
> 
>>> - sec 3 should refer to relays, and returning the entire set or
>>> individual bindings; there is no reason to explain the goals in terms of
>>> access concentrators.
>> 
>> 	I don't know about this.  I could certainly change it,
>> 	but is this really important to you?  There was quite a
>> 	bit of negotiation in the WG about these examples.  I
>> 	don't want to restart that discussion and have it all
>> 	over again.  Does this really matter that much to you?
> 
> The examples should focus on client/server/relay terminology. If other example terms are used, they should be parenthetical examples:
> 
> 	relay (e.g., at a concentrator)
> 
> The point is that this is relevant without the concentrator. A concentrator is just one current example of where a relay exists, not the only one.

	True, and I will say that.   There was too much discussion
	of these examples to remove them, but while they are
	important to the process of getting through the DHC WG
	review, they are not of significant importance technically.

	So, I will add a sentence discussing the fact that a 
	concentrator is just an example.
> 
>>> - sec 3.2 appears to provide contradictory advice - caching is required, but should be avoided? it would be useful to resolve this inconsistency.
>> 
>> 	Use of Bulk Leasequery is designed to help you avoid
>> 	caching.
> 
> OK - it'd be useful to be clear on that in sec 3.2.

	I will say that in Section 3.2.
> 
>>> - sec 3.3 refers to 'fast path', but this term doesn't make sense in this context because fast-path is a forwarding issue. it would be useful to explain what you mean, and pick a different term
>> 
>> 	Yes, it is a forwarding issue, and many devices
>> 	will not forward unless they know that they are forwarding
>> 	to a device that has received a valid DHCPv4 address.
>> 	The point (well, one point) of Bulk Leasequery is to
>> 	allow these forwarders to gather data so that they
>> 	can make the correct decision in the fast path.
> 
> Per-IP validation of that sort of permission isn't a fast-path decision. Fast-path involves forwarding and policy decisions (which are not typically /32 decisions).
> 
> It would be useful to say this information is useful so such routers can make the appropriate decision, but the path (fast or slow) is irrelevant IMO.

        Yes, fast-path is a forwarding issue, and I believe
	the intent of this section was to mention that the
	information returned by Leasequery is useful when
	deciding to forward a packet -- and that Bulk Leasequery
	gives you the information you need to decide to forward
	or not in advance of seeing the packet (when it is too
	late to deal with it in the fast path).

	I am not convinced that this should be changed.


	Regards -- Kim

> 
>> ------------
>> 
>> 	Regards -- Kim
>> 
>> 
> _______________________________________________
> dhcwg mailing list
> dhcwg@xxxxxxxx
> https://www.ietf.org/mailman/listinfo/dhcwg

_______________________________________________
Ietf mailing list
Ietf@xxxxxxxx
https://www.ietf.org/mailman/listinfo/ietf


[Index of Archives]     [IETF Annoucements]     [IETF]     [IP Storage]     [Yosemite News]     [Linux SCTP]     [Linux Newbies]     [Fedora Users]