Dear Matt,
On Apr 28, 2009, at 5:44 AM, Matt Mathis wrote:
I've reviewed draft-ietf-pce-monitoring-04.txt as part of the
transport area directorate's ongoing effort to review key IETF
documents. These comments were written primarily for the transport
area directors, but are copied to the document's authors for their
information and to allow them to address any issues raised. The
authors should consider this review together with any other last-
call comments they receive. Please always CC tsv-dir@xxxxxxxx if you
reply to or forward this review.
draft-ietf-pce-monitoring-04.txt describes procedures and extensions
to the Path Computation Element Protocol (PCEP) for monitoring the
state of the path computation chain for troubleshooting and
performance monitoring purposes.
It is designed specifically to carry information about PCE liveness,
processing time and congestion.
However this draft does not define any of these metrics.
As a transport person, I have several comments about the congestion
metric.
First it wasn't clear from the document if "congestion" was
referring to the PCE itself or the corresponding LSPs. For clarity
of discussion, I will assume LSP congestion. Even if that is not
correct, my comments are general and there are equivalent problems
for the PCE case.
This is, in fact, the wrong assumption. The congestion metric refers
to the congestion of the PCE itself.
We will add a clarification of this point to the top of section 4.4 as
follows:
Note that "congestion" as indicated by this object refers to the
processing state of the PCE and its ability to handle new PCEP
requests.
Second, there is not a universal definition of congestion. The
relevant feature of congestion is that it perturbs transit flows, by
causing some sort of back-pressure. This back-pressure generally
comes in the form of raised RTT and/or increased loss probability,
which reduces the data rate for elastic flows. In the operational
Internet normal values for these parameters can span many orders of
magnitude. For example on research and education backbones, loss
probabilities as high as 1E-6 would be considered massively
congested. In other parts of the world loss probabilities as low as
1E-2 might be considered extremely good. There is not a standard
way to determine when the load is high enough to affect service or
when the users would perceive the network as "congested".
Your discussion certainly applies to traffic congestion, but is not
applicable in this case.
PCE congestion is much easier to quantify since the measurements are
restricted to a single server. Congestion state is reported by a PCE
as a simple state, and an expected duration.
Here is the new text added to the document:
"A PCE is congested when it has a backlog of PCEP requests such that
it cannot immediately start to process a new request thus leading to
waiting times. The congestion duration is quantified as being the
(estimated) time until the PCE expects to be able to immediately
process a new PCEP request."
Without a definition of what congested means the metric is useless
for such things as choosing alternative paths. One implementation's
uncongested state might be lower performance than another
implementation's congested state.
This should be clear from the definition above.
Even if you are thinking in terms of admission control (where the
back-pressure is to reject calls), your success probability might be
higher on a very congested, heavily multiplexed path than on another
path where a single user is using most of the capacity, but not
quite filling the link.
No, we are not thinking in terms of admission control. PCEP requests
are queued, not rejected. Thus knowledge of congestion is very
important to a PCC so as to potentially select another PCE.
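To illustrate how a PCC might use this, here is a minimal Python sketch.
It is only an illustration: the draft does not specify a selection
policy, and the names PceStatus and select_pce are hypothetical. It
assumes the PCC has already learned, for each candidate PCE, the
congestion state and the expected congestion duration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PceStatus:
    """Congestion state reported by one PCE (hypothetical container)."""
    name: str                         # illustrative label, not a protocol field
    congested: bool                   # reported congestion state
    expected_duration_s: float = 0.0  # expected congestion duration, in seconds


def select_pce(candidates: Sequence[PceStatus]) -> Optional[PceStatus]:
    """Pick the PCE expected to start processing a new request soonest.

    An uncongested PCE counts as a zero wait; among congested PCEs the one
    with the shortest expected congestion duration is preferred.
    """
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda p: p.expected_duration_s if p.congested else 0.0,
    )


pces = [
    PceStatus("pce-a", congested=True, expected_duration_s=12.0),
    PceStatus("pce-b", congested=False),
    PceStatus("pce-c", congested=True, expected_duration_s=3.0),
]
print(select_pce(pces).name)
```

A real PCC would probably weigh other factors as well (for example the
PCE's capabilities or its processing time), so this is one possible
policy rather than a recommendation.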
Although my examples are somewhat contrived, my point still stands:
without a definition of "congested" there is no value to sharing a
congestion indication. I can't imagine any global definition of
congestion that would work, and suspect that you need to add a
mechanism to define a local, organization/topology specific
definition of congestion.
The issue here is probably that the definition of congestion was so
"obvious" to the people working on this that the concerns you raise
did not occur to them. Hopefully, the addition of the definition
set out above will clarify this.
Third, the only parameter carried by the congestion object is
"expected congestion duration", as though the network can anticipate
when the congestion will subside. It can't. It may be that this
parameter would be better identified by something like "recommended
polling interval", e.g. "please don't ask again for x seconds."
The details of a PCE implementation are not in scope. A PCE is in no
position to give advice to a PCC on this, but it can judge the
existing queue size and the current arrival rate of new requests.
It should be clear that "expected congestion duration" is an
indication, not a guarantee. Congestion might clear sooner, or might
persist longer.
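As a rough sketch of how such an estimate could be formed from the
queue size and the arrival rate, consider the Python fragment below. The
draft does not mandate any particular method; the fluid approximation
used here, the function name, and the service_rate parameter (requests
the PCE completes per second) are all assumptions made for illustration.

```python
import math


def expected_congestion_duration(
    backlog: int, arrival_rate: float, service_rate: float
) -> float:
    """Estimate the seconds until the PCE can start a new request at once.

    Fluid approximation (an assumption, not taken from the draft): the
    backlog drains at service_rate - arrival_rate requests per second.
    """
    if backlog < 0 or arrival_rate < 0 or service_rate <= 0:
        raise ValueError("backlog and rates must be non-negative, service_rate > 0")
    if backlog == 0:
        return 0.0       # no backlog: the PCE is not congested
    drain_rate = service_rate - arrival_rate
    if drain_rate <= 0:
        return math.inf  # requests arrive at least as fast as they are served
    return backlog / drain_rate


# 30 queued requests, 5 arriving per second, 8 served per second
print(expected_congestion_duration(30, 5.0, 8.0))
```

Because the arrival rate can change at any moment, the value is only an
estimate, which is why it is reported as an indication.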
In a similar vein neither processing time nor liveness is
sufficiently well defined.
Section 4.3 seems to be perfectly clear on processing time.
RFC 4655 describes liveness.
Although this is perhaps a nit, the IANA directions are structured
in a way that forces somebody else to rewrite your text, possibly
introducing errors, and preventing full review in last call. E.g.
where you have "The MONITORING Object-Class is to be assigned by
IANA (recommended value=19)" It would be better to say "The
MONITORING Object-Class is XX [Value to be provided by IANA,
recommended value=19]" The point is to clearly distinguish between 3
classes of text:
- Stuff that IANA adjusts in a clearly specified way while the
  document is at the RFC editor.
- Instructions to the IANA that should be removed while at the RFC
  editor, generally about the above.
- Instructions to the IANA that should be preserved in the final RFC
  (Registry creation, etc.), which might include some details in the
  previous two categories.
It should be clear to everyone (especially the reviewers) how the
IANA text is expected to appear in the final RFC, even when it
can't match the ID.
We have already had discussions with IANA on the content of this
section, and will reach agreement with them. Our main requirement
has been to show exactly the text that we want included in the
registry.
This draft has serious issues, described in the review, and needs
some rethinking.
Thanks for your comments.
JP.
Thanks,
--MM--
-------------------------------------------
Matt Mathis http://staff.psc.edu/mathis
Work:412.268.3319 Home/Cell:412.654.7529
-------------------------------------------
_______________________________________________
Ietf mailing list
Ietf@xxxxxxxx
https://www.ietf.org/mailman/listinfo/ietf