Hello Guan,

> On 2017/11/18 8:11, Martin Wilck wrote:
> > The log standard deviation can be calculated much more simply
> > by realizing
> >
> > sum_n (x_i - avg(x))^2 == sum_n x_i^2 - n * avg(x)^2
> >
>
> I derive the equation:
> sum_n {(x_i - avg(x))^2} = sum_n{x_i^2 -2*x_i*avg(x) + avg(x)^2}
>                          = sum_n{x_i^2} - 2*avg(x)*sum_n{x_i} + sum_n{avg(x)^2}
>                          = sum_n{x_i^2} - 2*avg(x)*avg(x) + n*avg(x)^2
>                          = sum_n{x_i^2} + (n-2)*avg(x)^2

No, that's wrong:

avg(x) = (1/n) * sum_n(x_i)  =>  sum_n(x_i) = n * avg(x)

Thus the 2nd term in the line before the last in your derivation is
not "- 2*avg(x)*avg(x)", but "- 2*n*avg(x)*avg(x)". Combined with the
third term, n*avg(x)^2, the end result becomes
sum_n(x_i^2) - n*avg(x)^2.

> > Also, use timespecsub rather than the custom timeval_to_usec,
> > and avoid taking log(0).
> >
>
> Great.
>
> > +		pp_pl_log(3, "%s: latency avg=%.2e uncertainty=%.1f prio=%d\n",
>
> latency avg -> latency geometric avg ? Because in most cases,
> avg means arithmetic avg, but in this case, it means geometric avg.

Yes, I meant the geometric average. I don't think we should bother the
user with these subtleties. Well, maybe it would feel better if we used
"geometric mean" rather than "avg" in the log message, although that
might again irritate some people who don't know the term ... I really
don't care much.

> > +			pp->dev, exp(lg_avglatency * lg_base),
> > +			exp(standard_deviation * lg_base), rc);
>
> How can you get the uncertainty of Log-normal distribution
> is the exp(standard_deviation * lg_base) ?

The "width" of the normal distribution is measured in terms of the
standard deviation, sigma. In your patch, you correctly accounted for
the confidence levels of the 2*sigma environment
(https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule).
For the latency, we're assuming a log-normal distribution (it's a
practical assumption, not a statistical assertion; in reality the
latency more likely follows an exponential or Poisson distribution, but
we don't need to go into that detail). That means we're assuming that
log(latency) can be described by a normal distribution with a certain
standard deviation sigma around the log of the geometric mean. Again,
sigma is the "width" of that normal distribution. Thus, with ~68%
probability, the log of the latency lies within the 1-sigma interval
around that mean. Translating that back into "real" latency, with 68%
likelihood it will be in the interval [(1/F) * gm, F*gm], where gm is
the geometric mean and F=exp(sigma). Therefore, F (which is
exp(standard_deviation * lg_base)) can be used as an estimate of the
"uncertainty factor" for the latency.

Agreed?

Regards
Martin

--
Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel