Hi Martin,

Thanks for your clarification. I agree with you now in this thread.

Regards,
Guan

On 2017/11/20 10:10, Martin wrote:
>
> Hello Guan,
>
>> On 2017/11/18 8:11, Martin Wilck wrote:
>>> The log standard deviation can be calculated much more simply
>>> by realizing
>>>
>>> sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
>>>
>>
>> I derive the equation:
>> sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2}
>>                          = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} + sum_n{avg(x)^2}
>>                          = sum_n{x_i^2} - 2*avg(x)*avg(x) + n*avg(x)^2
>>                          = sum_n{x_i^2} + (n-2)*avg(x)^2
>
> No, that's wrong:
>
> avg(x) = (1/n) * sum_n(x_i)
> => sum_n(x_i) = n * avg(x)
>
> Thus the 2nd term in the line before the last in your derivation
> is not "- 2*avg(x)*avg(x)", but "- 2*n*avg(x)*avg(x)", and the end
> result becomes sum_n(x_i^2) - n*avg(x)^2.
>
>>
>>> Also, use timespecsub rather than the custom timeval_to_usec,
>>> and avoid taking log(0).
>>>
>>
>> Great.
>>
>>
>>> +	pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n",
>>
>> latency avg -> latency geometric avg? In most cases, avg means the
>> arithmetic average, but in this case it means the geometric average.
>
> Yes, I meant the geometric average. I don't think we should bother the
> user with these subtleties. Well, maybe it would be better to use
> "geometric mean" rather than "avg" in the log message, although that
> might again irritate some people who don't know the term ... I really
> don't care much.
>
>>> +		pp->dev, exp(lg_avglatency * lg_base),
>>> +		exp(standard_deviation * lg_base), rc);
>>
>> How did you derive that the uncertainty of the log-normal distribution
>> is exp(standard_deviation * lg_base)?
>
> The "width" of the normal distribution is measured in terms of the
> standard deviation, sigma. In your patch, you correctly accounted for
> the confidence levels of the 2*sigma environment
> (https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule).
>
> Here, we're assuming a log-normal distribution for the latency (it's a
> practical assumption, not a statistical assertion; in reality the
> latency more likely follows an exponential or Poisson distribution,
> but we don't need to go into that detail). That means we're assuming
> that log(latency) can be described by a normal distribution with a
> certain standard deviation sigma around the log of the geometric mean.
> Again, sigma is the "width" of that normal distribution. Thus with ~68%
> probability, the log of the latency is in the 1-sigma interval
> around the average. Translating that back into "real" latency, with 68%
> likelihood it will be in the interval [(1/F) * gm, F*gm], where gm is
> the geometric mean and F=exp(sigma). Therefore, F (which is
> exp(standard_deviation * lg_base)) can be used as an estimate of the
> "uncertainty factor" for the latency.
>
> Agreed?
>
> Regards
> Martin
>

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel