Re: Review of draft-ietf-pce-monitoring-04.txt

JP Vasseur <jvasseur@xxxxxxxxx> · Tue, 15 Dec 2009 16:50:02 +0100

Dear Matt,

On Apr 28, 2009, at 5:44 AM, Matt Mathis wrote:

I've reviewed draft-ietf-pce-monitoring-04.txt as part of the  
transport area directorate's ongoing effort to review key IETF  
documents. These comments were written primarily for the transport  
area directors, but are copied to the document's authors for their  
information and to allow them to address any issues raised. The  
authors should consider this review together with any other last- 
call comments they receive. Please always CC tsv-dir@xxxxxxxx if you  
reply to or forward this review.

draft-ietf-pce-monitoring-04.txt describes procedures and extensions  
to the Path Computation Element Protocol (PCEP) for monitoring the  
state of the path computation chain for troubleshooting and  
performance monitoring purposes.

It is designed specifically to carry information about PCE liveness,  
processing time and congestion.

However this draft does not define any of these metrics.

As a transport person, I have several comments about the congestion  
metric.

First it wasn't clear from the document if "congestion" was  
referring to the PCE itself or the corresponding LSPs.  For clarity  
of discussion, I will assume LSP congestion.  Even if that is not  
correct, my comments are general and there are equivalent problems  
for PCE case.

This is, in fact, the wrong assumption. The congestion metric refers
to the congestion of the PCE itself.

We will add a clarification of this point to the top of section 4.4 as  
follows:

Note that "congestion" as indicated by this object refers to the
processing state of the PCE and its ability to handle new PCEP
requests.

Second, there is not a universal definition of congestion.  The  
relevant feature of congestion is that it perturbs transit flows, by  
causing some sort of back-pressure.  This back-pressure generally  
comes in the form of raised RTT and/or increased loss probability,  
which reduces the data rate for elastic flows.  In the operational  
Internet normal values for these parameters can span many orders of  
magnitude.  For example on research and education backbones, loss  
probabilities as high as 1E-6 would be considered massively  
congested.  In other parts of the world loss probabilities as low as  
1E-2 might be considered extremely good.  There is not a standard  
way to determine when the load is high enough to affect service or  
when the users would perceive the network as "congested".

Your discussion certainly applies to traffic congestion, but is not
applicable in this case.

PCE congestion is much easier to quantify since the measurements are
restricted to a single server. Congestion state is reported by a PCE
as a simple state, and an expected duration.

Here is the new text added to the document:

"A PCE is congested when it has a backlog of PCEP requests such that
it cannot immediately start to process a new request thus leading to
waiting times. The congestion duration is quantified as being the
(estimated) time until the PCE expects to be able to immediately
process a new PCEP request."

Without a definition of what congested means the metric is useless  
for such things as choosing alternative paths.  One implementation's  
uncongested state might be lower performance than another  
implementation's congested state.

This should be clear from the definition above.

Even if you are thinking in terms of admission control (where the  
back-pressure is to reject calls), your success probability might be  
higher on a very congested heavily multiplexed path than on another  
path where a single user is using most of the capacity, but not  
quite filling the link.

No, we are not thinking in terms of admission control. PCEP requests
are queued, not rejected. Thus knowledge of congestion is very
important to a PCC so as to potentially select another PCE.
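
As an illustration of how a PCC might use this information, here is a
minimal sketch. The PceStatus record, its field names and the
select_pce function are hypothetical and are not defined by the
draft; the sketch only assumes that each PCE reports a congestion
state and an expected duration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PceStatus:
    """Hypothetical view of what a PCC has learned about one PCE."""
    name: str
    congested: bool
    expected_duration_s: float  # only meaningful when congested is True


def select_pce(statuses: Iterable[PceStatus]) -> Optional[PceStatus]:
    """Pick the PCE with the shortest expected wait, or None if there are none.

    An uncongested PCE counts as a wait of zero, so it is always preferred
    over a congested one. Ties go to the first candidate seen.
    """
    def expected_wait(status: PceStatus) -> float:
        return status.expected_duration_s if status.congested else 0.0

    return min(statuses, key=expected_wait, default=None)
```

Since the reported duration is only an estimate, a real PCC would
presumably combine it with other criteria when choosing a PCE.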

Although my examples are somewhat contrived, my point still stands:  
without a definition of "congested" there is no value to sharing a  
congestion indication.   I can't imagine any global definition of  
congestion that would work, and suspect that you need to add a  
mechanism to define a local, organization/topology specific  
definition of congestion.

The issue here is probably that the definition of congestion was so
"obvious" to the people working on this that the concerns you raise
did not occur to them. Hopefully, the addition of the definition
set out above will clarify this.

Third, the only parameter carried by the congestion object is  
"expected congestion duration", as though the network can anticipate  
when the congestion will subside. It can't.  It may be that this  
parameter would be better identified by something like "recommended  
polling interval", e.g. "please don't ask again for x seconds."

The details of a PCE implementation are not in scope. A PCE is in no
position to give advice to a PCC on this, but it can judge the
existing queue size and the current arrival rate of new requests.

It should be clear that "expected congestion duration" is an
indication, not a guarantee: congestion might clear sooner, or might
persist longer.
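
To make this concrete, here is a minimal sketch of one way a PCE
could turn queue size and arrival rate into such an estimate. The
simple drain-time model, the function name and its parameters are
assumptions made for illustration; the draft does not specify how
the estimate is computed.

```python
from typing import Optional


def estimate_congestion_duration(
    backlog: int,
    service_rate: float,
    arrival_rate: float,
) -> Optional[float]:
    """Estimate seconds until the backlog of PCEP requests is drained.

    backlog      -- number of requests currently queued (assumed known)
    service_rate -- requests the PCE can process per second (assumed known)
    arrival_rate -- new requests arriving per second (assumed measured)

    Returns 0.0 when there is no backlog, and None when the queue is not
    shrinking, in which case no finite estimate can be given.
    """
    if backlog <= 0:
        return 0.0
    drain_rate = service_rate - arrival_rate
    if drain_rate <= 0:
        return None
    return backlog / drain_rate
```

With a backlog of 50 requests, a service rate of 10 requests per
second and an arrival rate of 5 per second, this model gives 10
seconds. The value is only as good as the assumption that both rates
stay constant, which is why it can only be an indication.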

In a similar vein neither processing time nor liveness is  
sufficiently well defined.

Section 4.3 seems to be perfectly clear on processing time.
RFC 4655 describes liveness.

Although this is perhaps a nit, the IANA directions are structured  
in a way that forces somebody else to rewrite your text, possibly  
introducing errors, and preventing full review in last call. E.g.  
where you have "The MONITORING Object-Class is to be assigned by  
IANA (recommended value=19)" It would be better to say "The  
MONITORING Object-Class is XX [Value to be provided by IANA,  
recommended value=1]"  The point is to clearly distinguish between 3  
classes of text:

- Stuff that IANA adjusts in a clearly specified way while the
  document is at the RFC editor.

- Instructions to the IANA that should be removed while at the RFC
  editor, generally about the above.

- Instructions to the IANA that should be preserved in the final RFC
  (Registry creation, etc), which might include some details in the
  previous two categories.

It should be clear to everyone (especially the reviewers) how the  
IANA text is expected to appear in the final RFC, even when it  
can't match the ID.

We have already had discussions with IANA on the content of this
section, and will reach agreement with them. Our main requirement
has been to show exactly the text that we want included in the
registry.

This draft has serious issues, described in the review, and needs  
some rethinking.

Thanks for your comments.

JP.

Thanks,
--MM--
-------------------------------------------
Matt Mathis     http://staff.psc.edu/mathis
Work:412.268.3319    Home/Cell:412.654.7529
-------------------------------------------