On Wed, 15 Sept 2021 at 10:57, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > On Tue, Sep 14 2021 at 20:00, Dmitry Vyukov wrote: > > On Tue, 14 Sept 2021 at 16:58, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > >> Now what happens when the mac80211 callback rearms the timer so it > >> expires immediately again: > >> > >> hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer), > >> ns_to_ktime(bcn_int * NSEC_PER_USEC)); > >> > >> bcn is a user space controlled value. Now lets assume that bcn_int is <=1, > >> which would certainly cause the loop in hrtimer_run_queues() to keeping > >> looping forever. > >> > >> That should be easy to verify by implementing a simple test which > >> reschedules a hrtimer from the callback with a expiry time close to now. > >> > >> Not today as I'm about to head home to fire up the pizza oven. > > > > This question definitely shouldn't take priority over the pizza. But I > > think I saw this "rearm a timer with a user-controlled value without > > any checks" pattern lots of times and hangs are inherently harder to > > localize and reproduce. So I wonder if it makes sense to add a debug > > config that would catch such cases right when the timer is set up > > (issue a WARNING)? > > Yes and no. It's hard to differentiate between a valid short expiry > rearm and something which is caused by unchecked values. I have some > ideas but all of them are expensive and therefore probably debug > only. Which is actually better than nothing :) > > > However, for automated testing there is the usual question of > > balancing between false positives and false negatives. The check > > should not produce false positives, but at the same time it should > > catch [almost] all actual stalls so that they don't manifest as > > duplicate stall reports. > > Right. The problem could be even there with checked values: > > start_timer(1ms) > timer_expires() > callback() > forward_timer(timer, now, period(1ms)); > > which might be perfectly fine with a production kernel as it leaves > enough time to make overall progress. > > Now with a full debug kernel with all bells and whistels that callback > might just run into this situation: > > start_timer(1ms) T0 > timer_expires() T1 > callback() > do_stuff() > forward_timer(timer, TNOW, period(1ms)); > > > T1 - T0 = 1.001ms > TNOW - T1 = 0.998 ms > > So the forward will just rearm it to T0 + 2ms which means it expires in > 1us. > > > If I understand it correctly the timer is not actually set up as > > periodic, but rather each callback invocation arms it again. Setting > > up a timer for 1 ns _once_ (or few times) is probably fine (right?), > > so the check needs to be somewhat more elaborate and detect "infinite" > > rearming. > > Yes. > > That made me actually look at that mac80211_hwsim callback again. > > hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer), > ns_to_ktime(bcn_int * NSEC_PER_USEC)); > > So what this does is really wrong because it tries to schedule the timer > on the theoretical periodic timeline. Which goes really south once the > timer is late or the callback execution took longer than the > period. Hypervisors scheduling out a VCPU at the wrong place will do > that for you nicely. Nice! You mentioned that hrtimer_run_queues() may not return. Does it mean that it can just loop executing the same re-armed callback again and again? Maybe then the debug check condition should be that hrtimer_run_queues() runs the same callback more than N times w/o returning? > What this actually should use is hrtimer_forward_now() which prevents > that problem because it will forward the timer in the periodic schedule > beyond now. That won't prevent the above corner case, but I doubt you > can create an endless loop with that scenario as easy as you can with > trying to catch up on your theoretical timeline by using the previous > expiry time as a base for the forward. Patch below. > > /me goes off to audit hrtimer_forward() usage. Sigh... > > After that figure out ways to debug or even prevent this. More sigh... > > Thanks, > > tglx > --- > drivers/net/wireless/mac80211_hwsim.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > --- a/drivers/net/wireless/mac80211_hwsim.c > +++ b/drivers/net/wireless/mac80211_hwsim.c > @@ -1867,8 +1867,8 @@ mac80211_hwsim_beacon(struct hrtimer *ti > bcn_int -= data->bcn_delta; > data->bcn_delta = 0; > } > - hrtimer_forward(&data->beacon_timer, hrtimer_get_expires(timer), > - ns_to_ktime(bcn_int * NSEC_PER_USEC)); > + hrtimer_forward_now(&data->beacon_timer, > + ns_to_ktime(bcn_int * NSEC_PER_USEC)); > return HRTIMER_RESTART; > } >