On Fri, 2017-02-24 at 19:01 +0200, Jarkko Sakkinen wrote: > On Thu, Feb 23, 2017 at 06:46:18PM -0500, Mimi Zohar wrote: > > Commit 500462a9de65 "timers: Switch to a non-cascading wheel" replaced > > the 'classic' timer wheel, which aimed for near 'exact' expiry of the > > timers. Their analysis was that the vast majority of timeout timers > > are used as safeguards, not as real timers, and are cancelled or > > rearmed before expiration. The only exception noted to this were > > networking timers with a small expiry time. > > > > Not included in the analysis was the TPM polling timer, which resulted > > in a longer normal delay and, every so often, a very long delay. The > > non-cascading wheel delay is based on CONFIG_HZ. For a description of > > the different rings and their delays, refer to the comments in > > kernel/time/timer.c. > > > > Below are the delays given for rings 0 - 2, which explains the longer > > "normal" delays and the very, long delays as seen on systems with > > CONFIG_HZ 250. > > > > * HZ 1000 steps > > * Level Offset Granularity Range > > * 0 0 1 ms 0 ms - 63 ms > > * 1 64 8 ms 64 ms - 511 ms > > * 2 128 64 ms 512 ms - 4095 ms (512ms - ~4s) > > > > * HZ 250 > > * Level Offset Granularity Range > > * 0 0 4 ms 0 ms - 255 ms > > * 1 64 32 ms 256 ms - 2047 ms (256ms - ~2s) > > * 2 128 256 ms 2048 ms - 16383 ms (~2s - ~16s) > > > > Below is a comparison of extending the TPM with 1000 measurements, > > using msleep() vs. usleep_delay() when configured for 1000 hz vs. 250 > > hz, before and after commit 500462a9de65. > > > > linux-4.7 | msleep() usleep_range() > > 1000 hz: 0m44.628s | 1m34.497s 29.243s > > 250 hz: 1m28.510s | 4m49.269s 32.386s > > > > linux-4.7 | min-max (msleep) min-max (usleep_range) > > 1000 hz: 0:017 - 2:760s | 0:015 - 3:967s 0:014 - 0:418s > > 250 hz: 0:028 - 1:954s | 0:040 - 4:096s 0:016 - 0:816s > > > > This patch replaces the msleep() with usleep_range() calls in the > > i2c nuvoton driver with a consistent max range value. > > > > Signed-of-by: Mimi Zohar <zohar@xxxxxxxxxxxxxxxxxx> > > Reviewed-by: Nayna Jain <nayna@xxxxxxxxxxxxxxxxxx> > > So why doesn't it go to level 0 with msleep()? I quickly skimmed > through __mod_timer() and for me it looked like that level 0 would be > calculated (when it is eventually called starting from msleep()). > What did I miss? I've just added some printk's in kernel/time/timer.c. It looks like it is level 0. The delay seems to be caused by schedule() in schedule_timeout(). setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); __mod_timer(&timer, expire, false, false); schedule(); <=== del_singleshot_timer_sync(&timer); /* Remove the timer from the object tracker */ destroy_timer_on_stack(&timer); printks output: 124.901002] calc_wheel_index: level 0 timer: c000003fab32b150 expires 4294923520 new expires 4294923520 now 4294923518 [ 124.901003] __mod_timer: exit timer c000003fab32b1a0 now 4294923518 < call to schedule() > [ 128.607463] schedule_timeout: before destroy timer: c000003fab32b150 expires 4294923520 now 4294924439 <=== notice that the "now" time is way beyond the expires time. Mimi