Dear Sirs,
Thanks for the suggestion, and sorry that I didn't notice the upstream
code already hooks into clocksource_register_hz() in csrc-r4k.c
(we're using the r4000 clock source).
I'm afraid this still doesn't fix my case. Going through
clocksource_register_hz()->__clocksource_register_scale()->__clocksource_updatefreq_scale(),
I got a calculated maxsec = (0xffffffff - (0xffffffff>>5))/250000500 =
16 (assuming mips_hpt_frequency = 250000500).
With this maxsec, I got a mult of 0xffffde72, which is still too big.
Hrmm. Yong Zang is right to suggest clocksource_register_hz(), as the
intention of that code is to try to avoid these sorts of issues.
What is the corresponding shift value you're getting for the value
above?
Could you annotate clocks_calc_mult_shift() a little bit to see where
things might be going wrong?
Let me give some real-world data.
On one machine with a 500MHz clock, the calculated freq = 500084016, and
clocks_calc_mult_shift() gives
mult = 4294245725
shift = 30
But in the 1785th call to update_wall_time(), the error-correction
algorithm brings mult to 4293964632.
In the next update_wall_time(), ntp_error is 0x301c93b7927c, which leads
to an adj of 20, and then mult overflows (it is a 32-bit value, so the
sum wraps modulo 2^32):
mult = 4293964632 + (1<<20) = 45912
With this mult, anyone calling timekeeping_get_ns() or anything else that
uses mult gets a wildly wrong notion of time, so some sleeps will
(almost) never return, i.e. the machine virtually hangs.
We are not absolutely sure that the error source is normal, but in any
case the code can overflow, and when it does, it causes a hang.
In this case, timekeeping_bigadjust(), with its lookahead, should be able
to keep adj at a maximum of around 20 for any error. So if mult were
chosen with shift = 29, it would be 4294245725/2 and could not overflow.
In short, choosing a mult close to 2^32 is dangerous. But I don't know
the best way to avoid it in general, because I don't know how big the
error and the adj can get on different systems.
Regards
Yours
Fuxin Zhang
thanks
-john