On Tue, Jul 17, 2018 at 12:08:36PM +0800, Xunlei Pang wrote:
> The trace data corresponds to the last sample period:
>
> trace entry 1:
>          cat-20755 [022] d...  1370.106496: cputime_adjust: task
> tick-based utime 362560000000 stime 2551000000, scheduler rtime 333060702626
>          cat-20755 [022] d...  1370.106497: cputime_adjust: result:
> old utime 330729718142 stime 2306983867, new utime 330733635372 stime 2327067254
>
> trace entry 2:
>          cat-20773 [005] d...  1371.109825: cputime_adjust: task
> tick-based utime 362567000000 stime 3547000000, scheduler rtime 334063718912
>          cat-20773 [005] d...  1371.109826: cputime_adjust: result:
> old utime 330733635372 stime 2327067254, new utime 330827229702 stime 3236489210
>
> 1) expected behaviour
> Let's compare the last two trace entries (all the data below is in ns):
> task tick-based utime: 362560000000 -> 362567000000, increased 7000000
> task tick-based stime: 2551000000   -> 3547000000,   increased 996000000
> scheduler rtime:       333060702626 -> 334063718912, increased 1003016286
>
> The application actually runs at almost 100% sys at the moment; we can
> use the increase of the task tick-based utime and stime to double check:
> 996000000/(7000000+996000000) > 99% sys
>
> 2) the current cputime_adjust() inaccurate result
> But with the current cputime_adjust(), we get the following adjusted
> utime and stime increases in this sample period:
> adjusted utime: 330733635372 -> 330827229702, increased 93594330
> adjusted stime: 2327067254   -> 3236489210,   increased 909421956
>
> so 909421956/(93594330+909421956) = 91% sys, as the shell script shows above.
>
> 3) root cause
> The root cause of the issue is that the current cputime_adjust() always
> passes the whole times to scale_stime() to split the whole utime and
> stime. In this patch, we instead pass the deltas accumulated within the
> user's sample period, as computed in 1), to scale_stime(), and add the
> results to the previously saved adjusted utime and stime, which
> guarantees an accurate usr/sys split within the user's sample period.

But why is this a problem? Since it's sample-based there's really
nothing much you can guarantee.

What if your test program were to run in userspace for 50% of the time
but were so constructed as to always be in kernel space when the tick
happens? Then you would 'expect' it to be 50% user and 50% sys, but
you're also not getting that.

This stuff cannot be perfect, and the current code provides 'sensible'
numbers over the long run for most programs. Why muck with it?
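[For context, the discrepancy in the report can be reproduced from the trace
numbers alone. The sketch below is a minimal user-space program, not the
kernel's scale_stime() (which has different overflow handling on 32-bit); it
only mimics the proportional split stime = rtime * stime / (utime + stime),
first on the whole accumulated times (current behaviour) and then on the
per-sample-period deltas (the patch's proposal).]

/* cputime_split_demo.c - reproduce the 91% vs 99% sys split from the trace.
 * Build: gcc -O2 cputime_split_demo.c -o cputime_split_demo
 */
#include <stdio.h>
#include <stdint.h>

/* Proportional split in the spirit of scale_stime():
 * stime = rtime * stime / (utime + stime).
 * A 128-bit intermediate avoids overflow for ns-scale values.
 */
static uint64_t split_stime(uint64_t stime, uint64_t rtime, uint64_t total)
{
	return (uint64_t)(((unsigned __int128)stime * rtime) / total);
}

int main(void)
{
	/* Tick-based utime/stime and scheduler rtime from the two trace entries (ns). */
	uint64_t utime1 = 362560000000ULL, stime1 = 2551000000ULL, rtime1 = 333060702626ULL;
	uint64_t utime2 = 362567000000ULL, stime2 = 3547000000ULL, rtime2 = 334063718912ULL;

	/* Current behaviour: split the whole accumulated times every time. */
	uint64_t s1 = split_stime(stime1, rtime1, utime1 + stime1);
	uint64_t s2 = split_stime(stime2, rtime2, utime2 + stime2);
	uint64_t u1 = rtime1 - s1, u2 = rtime2 - s2;
	printf("whole-value split: sys share of period = %.0f%%\n",
	       100.0 * (s2 - s1) / ((s2 - s1) + (u2 - u1)));

	/* Proposed behaviour: split only the deltas of this sample period. */
	uint64_t du_tick = utime2 - utime1, ds_tick = stime2 - stime1;
	uint64_t dr = rtime2 - rtime1;
	uint64_t ds = split_stime(ds_tick, dr, du_tick + ds_tick);
	printf("delta-based split: sys share of period = %.0f%%\n",
	       100.0 * ds / dr);
	return 0;
}

[This prints roughly 91% for the whole-value split and 99% for the
delta-based split, matching the figures in the report; the intermediate
values also match the "new utime/stime" pairs in the trace entries.]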