Re: [PATCH 3/4] libmultipath: path latency: simplify getprio()

Martin Wilck <mwilck@xxxxxxxx> · Mon, 20 Nov 2017 09:46:46 +0100

Hello Guan,

> On 2017/11/18 8:11, Martin Wilck wrote:
> > The log standard deviation can be calculated much more simply
> > by realizing
> > 
> >    sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
> > 
> 
> I derive the equation:
>  sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2}
>                           = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} +
> sum_n{avg(x)^2}
>                           = sum_n{x_i^2} - 2*avg(x)*avg(x) +
> n*avg(x)^2
>                           =  sum_n{x_i^2} + (n-2)*avg(x)^2

No, that's wrong:

    avg(x) = (1/n) * sum_n(x_i)
=>  sum_n(x_i) = n * avg(x)

Thus the 2nd term in the line before the last in your derivation
is not "- 2*avg(x)*avg(x)", but "- 2*n*avg(x)*avg(x)", and the end
result becomes sum_n(x_i^2) - n*avg(x)^2.
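
For anyone who wants to check the identity numerically, here is a minimal
sketch (not code from the patch; the function names sq_dev_two_pass and
sq_dev_one_pass are made up for illustration). It computes the sum of
squared deviations both ways:

```c
#include <assert.h>
#include <stddef.h>

/* Two-pass form: sum_n (x_i - avg(x))^2 */
double sq_dev_two_pass(const double *x, size_t n)
{
	double sum = 0.0, acc = 0.0, avg;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += x[i];
	avg = sum / n;
	for (i = 0; i < n; i++)
		acc += (x[i] - avg) * (x[i] - avg);
	return acc;
}

/* One-pass form: sum_n x_i^2 - n * avg(x)^2 */
double sq_dev_one_pass(const double *x, size_t n)
{
	double sum = 0.0, sum_sq = 0.0, avg;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		sum += x[i];
		sum_sq += x[i] * x[i];
	}
	avg = sum / n;
	return sum_sq - n * avg * avg;
}
```

For x = {1, 2, 4, 8}, avg(x) = 3.75 and sum_n x_i^2 = 85, so both forms
give 85 - 4 * 3.75^2 = 28.75, whereas the "(n-2)" variant would give
85 + 2 * 3.75^2 = 113.125.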

> 
> > Also, use timespecsub rather than the custom timeval_to_usec,
> > and avoid taking log(0).
> > 
> 
> Great.
> 
> 
> > +	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f
> > prio=%d\n",
> 
> latency avg -> latency geometric avg ? Because in most cases,
> avg means arithmetic avg , but in this case, it means geometric avg.

Yes, I meant the geometric average. I don't think we should bother the
user with these subtleties. Well, maybe it would feel better if we
used "geometric mean" rather than "avg" in the log message, although that
might in turn confuse some people who don't know the term ... I really
don't care much.

> > +		  pp->dev, exp(lg_avglatency * lg_base),
> > +		  exp(standard_deviation * lg_base), rc);
> 
> How can you get the uncertainty of Log-normal distribution
> is the exp(standard_deviation * lg_base) ?

The "width" of the normal distribution is measured in terms of the
standard deviation, sigma. In your patch, you correctly accounted for
the confidence level of the 2-sigma interval
(https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule).

Here, we're assuming a log-normal distribution for the latency (it's a
practical assumption, not a statistical assertion - in reality the
latency probably rather follows an exponential or Poisson distribution
but we don't need to go into that detail). That means we're assuming
that log(latency) can be described by a normal distribution with a
certain standard deviation sigma around the log of the geometric mean.
Again, sigma is the "width" of that normal distribution. Thus with ~68%
probability, the log of the latency is in the 1-sigma interval
around the average. Translating that back into "real" latency, with 68%
likelihood it will be in the interval [(1/F) * gm, F*gm], where gm is
the geometric mean and F=exp(sigma). Therefore, F (which is
exp(standard_deviation * lg_base)) can be used as an estimate of the
"uncertainty factor" for the latency.

Agreed?

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
