On Sat, Apr 13, 2013 at 11:44:54AM -0700, Linus Torvalds wrote:
> PS. This is the "Make sure 'total' fits in 32 bits first" version. Not
> really tested, but it's just changing the order of operations a bit.
>
> 	/* We know one of the values has a bit set in the high 32 bits */
> 	for (;;) {
> 		/* Make sure "rtime" is the bigger of stime/rtime */
> 		if (stime > rtime) {
> 			u64 tmp = rtime; rtime = stime; stime = tmp;
> 		}
>
> 		/* Make sure 'total' fits in 32 bits */
> 		if (total >> 32)
> 			goto drop_precision;
>
> 		/* Does rtime (and thus stime) fit in 32 bits? */
> 		if (!(rtime >> 32))
> 			break;
>
> 		/* Can we just balance rtime/stime rather than dropping bits? */
> 		if (stime >> 31)
> 			goto drop_precision;
>
> 		/* We can grow stime and shrink rtime and try to make them both fit */
> 		stime <<= 1;
> 		rtime >>= 1;
> 		continue;
>
> drop_precision:
> 		/* We drop from rtime, it has more bits than stime */
> 		rtime >>= 1;
> 		total >>= 1;
> 	}

This version also passes the 0.1% relative error check in my tests.
Decreasing the error threshold to 0.02% made it fail. I didn't check
other thresholds or measure how frequently the 0.02% check fails on
each version; I assume this one is better :-)

So regarding relative error this algorithm is fine: there is no
multiplication overflow, which is what made the scaled numbers bogus
before.

Then I looked at this algorithm in the context of how it is used ...

The raw stime/rtime/total values increase at jiffies resolution, and so
does scaled_stime as long as we do not drop precision. For bigger
numbers, since we do drop precision, scaled_stime grows in chunks. How
big the chunk is depends on how big the numbers are overall and on the
stime/total ratio. For example: with stime = 0.5*total and 128 threads,
after 1 year of CPU execution the chunk will be 1024 jiffies (I sketch
a back-of-envelope check of this number at the end of this mail).

We use the scaled stime value this way:

	if (total)
		stime = scale_stime(stime, rtime, total);
	else
		stime = rtime;

	/*
	 * If the tick based count grows faster than the scheduler one,
	 * the result of the scaling may go backward.
	 * Let's enforce monotonicity.
	 */
	prev->stime = max(prev->stime, stime);
	prev->utime = max(prev->utime, rtime - prev->stime);

	*ut = prev->utime;
	*st = prev->stime;

Since rtime increases but scaled stime does not, the stime growth gets
accounted as prev->utime. Then, after a chunk worth of jiffies, scaled
stime jumps and gets accounted in prev->stime on top of that. As a
result we account more CPU time than the process actually used. This
error accumulates depending on how frequently cputime_adjust() is
called for the process (see the simulation sketched at the end of this
mail).

As a solution we could just stop accounting when prev->stime +
prev->utime is already bigger than rtime. For example:

 	rtime = nsecs_to_cputime(curr->sum_exec_runtime);
 
+	/*
+	 * Update userspace visible utime/stime values only if actual execution
+	 * time is bigger than what we already exported. Note that it can
+	 * happen that we provided bigger values due to scaling inaccuracy
+	 * on big numbers.
+	 */
+	if (prev->stime + prev->utime >= rtime)
+		goto out;
+
 	if (total)
 		stime = scale_stime(stime, rtime, total);
 	else
@@ -573,6 +581,7 @@ static void cputime_adjust(struct task_cputime *curr,
 	prev->stime = max(prev->stime, stime);
 	prev->utime = max(prev->utime, rtime - prev->stime);
 
+out:
 	*ut = prev->utime;
 	*st = prev->stime;
 }

This should help with erroneously accounting more CPU time than the
process actually used. The disadvantage is that userspace will see CPU
time increase in chunks, but I think this is better than seeing values
much bigger than the correct ones (and for 99.9% of use cases it does
not matter).
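As promised above, a back-of-envelope check of the 1024 jiffies chunk
figure. This is only a sketch under my assumptions (HZ=1000, each of
the 128 threads has run for a year, so the counters sit around 2^42):
the loop has to shift 'total' down until it fits in 32 bits, and rtime
loses at least that many low bits along the way, so scaled stime can
only move in roughly 2^shift steps:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		/* assumption: HZ=1000, 128 threads, 1 year of CPU time each */
		uint64_t total = 128ULL * 365 * 24 * 3600 * 1000;
		int shift = 0;

		/* bits that must be dropped before 'total' fits in 32 bits */
		while ((total >> shift) >> 32)
			shift++;

		/* prints: shift=10 chunk=1024 jiffies */
		printf("shift=%d chunk=%llu jiffies\n", shift, 1ULL << shift);
		return 0;
	}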
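And here is a rough userspace simulation of the over-accounting itself,
again only a sketch under invented assumptions (same HZ=1000 numbers, a
constant 50/50 stime/utime split, tick counters advancing in lockstep
with sum_exec_runtime; the kernel uses u64/u32 and div_u64() rather
than plain uint64_t arithmetic). It reimplements the loop quoted above,
applies the current monotonicity logic on every jiffy, and tracks how
far prev->stime + prev->utime gets ahead of rtime; right after each
chunk-sized jump of the scaled stime, the exported sum overshoots rtime
by a sizable fraction of a chunk:

	#include <stdio.h>
	#include <stdint.h>

	/* userspace copy of the loop quoted above (u64 -> uint64_t) */
	static uint64_t scale_stime(uint64_t stime, uint64_t rtime,
				    uint64_t total)
	{
		for (;;) {
			/* Make sure "rtime" is the bigger of stime/rtime */
			if (stime > rtime) {
				uint64_t tmp = rtime; rtime = stime; stime = tmp;
			}
			/* Make sure 'total' fits in 32 bits */
			if (total >> 32)
				goto drop_precision;
			/* Does rtime (and thus stime) fit in 32 bits? */
			if (!(rtime >> 32))
				break;
			/* Can we balance rtime/stime rather than drop bits? */
			if (stime >> 31)
				goto drop_precision;
			/* Grow stime, shrink rtime, try to make both fit */
			stime <<= 1;
			rtime >>= 1;
			continue;
	drop_precision:
			/* We drop from rtime, it has more bits than stime */
			rtime >>= 1;
			total >>= 1;
		}
		/* everything fits in 32 bits here, so this cannot overflow */
		return stime * rtime / total;
	}

	int main(void)
	{
		/* start around 2^42 jiffies so precision gets dropped */
		uint64_t base = 128ULL * 365 * 24 * 3600 * 1000;
		uint64_t prev_utime = 0, prev_stime = 0, max_over = 0;
		uint64_t i;

		for (i = 0; i < 8192; i++) {
			uint64_t rtime = base + i;	/* sum_exec_runtime */
			uint64_t total = rtime;		/* stime + utime ticks */
			uint64_t stime = total / 2;	/* 50/50 split */
			uint64_t scaled = scale_stime(stime, rtime, total);

			/* the monotonicity enforcement from cputime_adjust() */
			if (scaled > prev_stime)
				prev_stime = scaled;
			if (rtime - prev_stime > prev_utime)
				prev_utime = rtime - prev_stime;

			if (prev_stime + prev_utime > rtime + max_over)
				max_over = prev_stime + prev_utime - rtime;
		}
		/* how much more CPU time we exported than the task got */
		printf("max overshoot: %llu jiffies\n",
		       (unsigned long long)max_over);
		return 0;
	}

With the proposed prev->stime + prev->utime >= rtime check, prev->*
stays frozen while the exported sum is ahead of rtime, so at least the
error cannot keep growing with every cputime_adjust() call.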
Stanislaw