[PMOL] RE: draft-ietf-pmol-sip-perf-metrics-02

"Daryl Malas" <D.Malas@xxxxxxxxxxxxx> · Tue, 10 Feb 2009 11:22:53 -0700

Gerald,

Thank you for the review and detailed feedback.  I made changes in the
draft per all of your suggestions; however, I modified some of them.
See comments in-line...

Regards,

Daryl

----------------
Daryl Malas
CableLabs
(o) +1 303 661 3302
(f) +1 303 661 9199
mailto:d.malas@xxxxxxxxxxxxx

> -----Original Message-----
> From: Gerald Q. Maguire Jr. [mailto:maguire@xxxxxx] 
> Sent: Tuesday, November 11, 2008 9:02 AM
> To: Daryl Malas
> Subject: draft-ietf-pmol-sip-perf-metrics-02
> 
> http://tools.ietf.org/html/draft-ietf-pmol-sip-perf-metrics-02
> 
> When measurement results will be correlated with other results or
>    information using time-of-day stamps, then the time clock that
>    supplies T1 SHOULD be synchronized to a primary time source, to
>    minimize the time offset.
> should be:
> 
> When measurement results will be correlated with other results or
>    information using time-of-day stamps, then the time clock that
>    supplies T1 SHOULD be synchronized to a primary time source, to
>    minimize the error in the time offset.
> 
> This change is important as otherwise you have used "time 
> offset" in a different meaning than you defined it in the 
> prior paragraph.

After considering your suggestion, I modified the paragraph to read:

When measurement results will be correlated with other results or
    information using time-of-day stamps, then the time clock that
    supplies T1 SHOULD be synchronized to a primary time source, to
    minimize the error in the time offset. The time offset MUST be
    reported with each measurement.

> 
> --
>  
> The accuracy of the T4-T1 interval is also critical to maintain and
>    report. The relevant definition from [12] is "skew": the difference
>    between time offsets at T1 and T4 is the error for the measurement
>    interval associated with the clock's skew.
> should be:
> 
> The accuracy of the T4-T1 interval is also critical to maintain and
>    report. The difference in errors
>    between the time offsets at T1 and T4 is associated
>    with the clock's skew [12].

I used the above paragraph suggestion.

> 
> Skew is according to the definition you cite in [12] related 
> to the difference in clock frequency between a true clock and 
> the local clock.
> ("A clock's "skew" at a particular moment is the frequency difference
>    (first derivative of its offset with respect to true time) between
>    the clock and true time.[12])
> 
> More properly the statement might be:
> The accuracy of the measurement of the T4-T1 interval is critical and
>    should be reported. The difference in errors
>    between the time offsets at T1 and T4 is associated
>    with the clock's skew [12] and the clock's "drift".
> 
> Even more formally this is because the (local) clock's 
> "offset" at T1 and the (local) clock's "offset" at T4 are not 
> the same, due to the (local) clock's skew and drift (the 1st 
> and 2nd derivatives of the clock's frequency).
> 
> Here I have used "clock's offset" as defined in [12].
> 
> ---
> 
> The clock error SHOULD
>    be constrained to less than +/- 1 ms, implying 1 part per 1000
>    frequency accuracy for a 1 second interval.
> should be:
> The clock error SHOULD
>    be constrained to less than +/- 1 ms. This implies a
>    frequency stability of better than 1 part per 1000
>    for a 1 second interval. This implies greater stability is
>    required as the length of the T4-T1 interval increases,
>    in order to constrain the error to be less than +/- 1 ms.
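
The scaling described above can be checked with a little arithmetic. The following is only a sketch, assuming a constant fractional frequency error over the interval (no drift); the function name and the sample intervals are illustrative and do not come from the draft.

```python
def max_fractional_frequency_error(interval_s: float,
                                   max_error_s: float = 1e-3) -> float:
    """Largest constant fractional frequency error (skew) that keeps the
    clock error accumulated over interval_s within +/- max_error_s.

    Assumption: the skew is constant over the interval, so the
    accumulated error is simply skew * interval.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return max_error_s / interval_s


# Longer T4-T1 intervals need a proportionally more stable clock.
for interval in (1.0, 10.0, 60.0):
    bound = max_fractional_frequency_error(interval)
    print(f"{interval:5.0f} s interval -> 1 part per {1 / bound:,.0f}")
```

Under that assumption, a 1 second interval tolerates 1 part per 1000, while a 10 second interval already needs 1 part per 10,000.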
> 
> ---
> The following statement seems to imply that reading the time 
> from a clock requires interrupt processing, and this need not 
> be the case.
> 
> The physical operation of reading time from a clock may be
>    constrained by the delay to service the interrupt. Therefore, the
>    accuracy of the time stamp read at T1 or T4 always includes the
>    interrupt delay, and this source of error SHOULD be known and
>    included in the error assessment.
>  
> It would be better to say:
> The physical operation of reading time from a clock may be
>    constrained by the delay to service the interrupt. 
> Therefore, if the
>    accuracy of the time stamp read at T1 or T4 includes
>    interrupt delay, then this source of error SHOULD be known and
>    included in the error assessment.
> 
> ---
> 
> There is also some confusion when you introduce the statement:
>  
> 2. If a free-running clock is used to make the time interval
>       measurement, then value of T1 reported SHOULD be derived from a
>       different clock that meets the time of day accuracy requirements
> 
> Since if you are measuring the T1 to T4 interval using such a 
> clock, then there need not be a time of day measurement for 
> T4, but rather an estimate of T4 based upon the measured 
> intervals (measured using the free running clock) described 
> above. Thus it is quite common today to measure a short time 
> interval using the CPU's internal counter driven by the CPU 
> clock (often as a RTC), this time interval can often be much 
> higher resolution and have a much higher stability than the 
> time of day clock.
> Additionally this clock generally is accessed by a read 
> register operation and not an interrupt. (Note that you still 
> have to state the relationship between this clock and the 
> time of day clock in order to specify when the measurement 
> occurred - the time of day of T1.)
> 

I modified the paragraph to read:

 If a free-running clock is used to make the time interval measurement,
then the time of day reported with the measurement (which is normally
timestamp T1) SHOULD be derived from a different clock that meets the
time of day accuracy requirements...
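
As a rough illustration of the two-clock arrangement, the sketch below takes the interval from a free-running counter and the reported time of day from the system wall clock. It assumes Python's standard time module stands in for both clocks; the IntervalMeasurement type and the measure function are hypothetical names, not part of the draft.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class IntervalMeasurement:
    t1_time_of_day: float  # seconds since the Unix epoch, from the wall clock
    interval_ms: float     # T4 - T1, from the free-running clock


def measure(operation: Callable[[], None]) -> IntervalMeasurement:
    """Time operation() with a free-running clock and label the result
    with a time of day taken from a different (wall) clock."""
    t1_wall = time.time()             # time of day reported with the measurement
    t1_free = time.perf_counter_ns()  # free-running clock, arbitrary epoch
    operation()
    t4_free = time.perf_counter_ns()
    return IntervalMeasurement(t1_wall, (t4_free - t1_free) / 1e6)


result = measure(lambda: time.sleep(0.01))
print(f"T1 = {result.t1_time_of_day:.3f}, interval = {result.interval_ms:.3f} ms")
```

The free-running counter is never compared against the wall clock here; only the difference of its two readings is used, so its arbitrary epoch does not matter.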

> ---
> 
> I do not think that the following statement is correct - from 
> a statistical data analysis point of view:
>  
> In regards to all of the metrics, the output values are directly
>    related to the accuracy and the equivalent level of granularity of
>    the input values.
> 
> The word "directly" is not strictly true. Perhaps the 
> following might be better:
> 
> In regards to all of the metrics, the accuracy and 
> granularity of the output values are
>    related to the accuracy and granularity of
>    the input values.
> 
> ---
> Registration Request Delay is utilized to detect failures or
>    impairments causing delays in responding to a UAC REGISTER request.
>    RRD SHALL be measured and reported only for successful REGISTER
>    requests, and Ineffective Registration Attempts (Section 4.2) SHALL
>    be reported for failures.  This metric is measured at the UAC.  The
>    output value of this metric is numerical and SHOULD be adjusted to
>    indicate milliseconds.  The following represents the calculation for
>    this metric:
> should be:
> 
> Registration Request Delay is a measurement of the delay in
>    responding to a UAC REGISTER request.
>    RRD SHALL be measured and reported only for successful REGISTER
>    requests, while Ineffective Registration Attempts (Section 4.2) SHALL
>    be reported for failures.  This metric is measured at the UAC.  The
>    output value of this metric is numerical and SHOULD be
>    stated in units of milliseconds.
>    The following represents the calculation for this metric:
> 
> The changes are necessary since:
> 1. RRD does not provide any information in the case of failures!
> 2. The measured value is stated in milliseconds - since you 
> have previously
>    stated that the clock error should be less than +/- 1 
> millisecond. Of course from
>    a statistical point of view - the measured value can only 
> be considered to be in units
>    of 2 milliseconds - since if you want to have 1 
> millisecond accuracy, then the clock
>    has to be accurate to better than +/- 0.5 milliseconds 
> (since the measurement is an
>    interval between two clock values).
> 
> Note that throughout the text you should be stating that it 
> is not an adjustment to milliseconds, but rather that this is 
> simply the units used for this measurement.
> 
> ---
> In a successful registration attempt, RRD is defined as the time
>    interval from the moment the initial REGISTER message 
> containing the
>    necessary information is passed by the originating UAC to the
>    intended registrar until the 200 OK is received indicating the
>    registration attempt has completed successfully.  This dialog
>    includes an expected authentication challenge prior to 
> receiving the
>    200 OK as describe in the following registration flow examples.
> 
> In a successful registration attempt, RRD is defined as the time
>    interval from the first bit of the initial REGISTER message being
>    transmitted by the originating UAC to the
>    intended registrar until the last bit of the 200 OK response
>    indicating the registration attempt has completed successfully
>    has been received.  This dialog
>    includes any expected authentication challenge prior to receiving the
>    200 OK as described in the following registration flow examples.
> 
> I think that the above changes are necessary because of the 
> way you defined T1 and T4; and the fact that a challenge 
> might not occur (or even need to occur) - thus the interval 
> only includes the challenge and response if they are expected.

Taking into consideration your suggestion, I have modified the paragraph
to read:

In a successful registration attempt, RRD is defined as the time
   interval from when the first bit of the initial REGISTER message
   containing the necessary information is passed by the originating
   UAC to the intended registrar until the last bit of the 200 OK is
   received, indicating the registration attempt has completed
   successfully.  This dialog includes an expected authentication
   challenge prior to receiving the 200 OK, as described in the
   following registration flow examples.

I also updated all other metrics with a result of time to align with
this.
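
To make the revised definition concrete, here is a minimal sketch of the RRD calculation. It assumes T1 is the timestamp of the first bit of the initial REGISTER and T4 the timestamp of the last bit of the final response, both in seconds from the same clock; the function name and the sample values are made up for illustration.

```python
from typing import Optional


def rrd_ms(t1_s: float, t4_s: float, final_status: int) -> Optional[float]:
    """Registration Request Delay in milliseconds.

    t1_s: first bit of the initial REGISTER passed by the UAC (seconds).
    t4_s: last bit of the final response received by the UAC (seconds).
    Returns None unless the final response is a 200 OK, since RRD is
    reported only for successful registrations.
    """
    if final_status != 200:
        return None
    # Any authentication challenge exchanged between T1 and T4 is
    # included in the interval automatically.
    return (t4_s - t1_s) * 1000.0


print(rrd_ms(10.000, 10.250, 200))  # successful registration
print(rrd_ms(10.000, 42.000, 503))  # failure: no RRD, counted elsewhere
```

Returning None for failures keeps them out of any RRD average, so they can be counted under Ineffective Registration Attempts instead.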

> ---
> 
> 
>  Ineffective registration attempts are utilized to detect failures or
>    impairments causing an inability for a registrar to receive or
>    respond to a UAC REGISTER request.  This metric is measured at the
>    UAC.  The output value of this metric is numerical and SHOULD be
>    adjusted to indicate a percentage of registration attempts.
> should be:
>  Ineffective registration attempts are utilized to monitor 
> registration failures,
>    i.e. the inability for a registrar to receive or
>    respond to a UAC REGISTER request.  This metric is measured at the
>    UAC.  The output value of this metric is numerical and SHOULD be
>    reported as a percentage of registration attempts.
> ---
> You say:
> IRA may be
>    used to detect problems in downstream signaling functions, 
> which may
>    be impairing the REGISTER message from reaching the intended
>    registrar; or, it may indicate a registrar has become 
> overloaded and
>    is unable to respond to the request.
> 
> However, I would think of the first problem being upstream 
> signaling - affecting the REGISTER message from reaching the 
> registrar, while downstream signaling problems would be 
> reflected in the registrar's response not being able to reach the UAC.
> 

This was not worded well in the last revision and was noted to create
confusion during the last working group session, so I have modified the
paragraph to read:

Ineffective registration attempts are utilized to detect failures or
   impairments causing an inability for a registrar to receive a UAC
   REGISTER request.  This metric is measured at the UAC.  The output
   value of this metric is numerical and SHOULD be reported as a
   percentage of registration attempts.

> ---
> 
> There should also be a statement with regard to the first 
> figure on page 8 as to whether the Total number of REGISTER Requests 
> increases by 3 or whether these 3 attempts at transmission are 
> part of a single registration attempt.

I agree this is confusing.  I have added the following paragraph under
the signaling flow example:

In the previous message flow the UAC retries a REGISTER request multiple
     times before the timer, indicating the failure, expires.  Only the
     first REGISTER request MUST be used for input to the calculation and
     an IRA.  Subsequent REGISTER retries are identified by the same
     Call-ID and MUST be ignored for purposes of metric calculation.
     This ensures an accurate representation of the metric output.
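
The retry rule above can be sketched as follows. This is an illustration only: it assumes the percentage is the number of ineffective attempts divided by the total number of attempts, times 100, and that every retry of an attempt carries the same Call-ID as the first REGISTER. The function name and the Call-ID values are hypothetical.

```python
from typing import Iterable, Set


def ira_percent(register_call_ids: Iterable[str],
                failed_call_ids: Set[str]) -> float:
    """Ineffective Registration Attempts as a percentage of attempts.

    register_call_ids: Call-ID of every REGISTER sent, retries included.
    failed_call_ids:   Call-IDs of attempts whose failure timer expired.
    """
    # Retries share a Call-ID, so a set counts each attempt once.
    attempts = set(register_call_ids)
    if not attempts:
        raise ValueError("no REGISTER attempts observed")
    ineffective = attempts & set(failed_call_ids)
    return 100.0 * len(ineffective) / len(attempts)


# One attempt retried three times and then timed out, plus three others.
sent = ["a@ua", "a@ua", "a@ua", "b@ua", "c@ua", "d@ua"]
print(ira_percent(sent, {"a@ua"}))  # 25.0
```

Counting the three transmissions of the first attempt separately would instead give 3 of 6, or 50 percent, which is the distortion the added paragraph is meant to prevent.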

> 
> ---
> 
> 
>       Session Request Delay (SRD) is not a metric, but rather a set of
>       metrics. This is the case because you do not combine successful
>       and failed responses in the same result. Thus any result has to be
>       reported also stating which type of SRD it is.

The following sentence has been added to the SRD introduction paragraph:

The output value of this metric
   MUST indicate whether the output is for successful or failed session
   requests and SHOULD be stated in units of seconds.

> 
> ---
> 
> The output value of this metric is
>    numerical and SHOULD be adjusted to indicate seconds.
> should be:
> 
> The output value of this metric is
>    numerical and SHOULD be reported in units of seconds. 
> 
> ---
> 
> Regards,
> G. Q. "Chip" Maguire Jr.
> 
> 
> 