Hello All,

I'm hitting a BUG() call in cascade() in kernel/timer.c. While reviewing the timer code for error handling, possible race conditions, etc., I've been trying to understand how timers are implemented in the kernel. Specifically, the lock_timer_base() function in Linus' tree:

	for (;;) {
		base = timer->base;
		if (likely(base != NULL)) {
			spin_lock_irqsave(&base->lock, *flags);
			if (likely(base == timer->base))
				return base;
			/* The timer has migrated to another CPU */
			spin_unlock_irqrestore(&base->lock, *flags);
		}
		cpu_relax();
	}

We re-check that base equals timer->base after taking the spinlock because, I suppose, the timer could have been changed by mod_timer() or by some other means while we were spinning on the lock, which could have caused the timer to migrate to another CPU.

Why would we continue looping if the timer has migrated to another CPU? The goal of this function is to take the lock on the base of the timer passed in, so how does looping help? Are we just waiting until the pending timer on the other CPU runs and its base is set to NULL? In the case where this function is called from del_timer_sync(), I would think a potentially long spin would not be acceptable. What am I missing?

Thanks,
Shaw

--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/
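P.S. To check my understanding of the retry logic, I sketched the same lock-then-revalidate pattern as a userspace analogue. This is not the kernel code: the names (struct base, struct my_timer, lock_base) are made up, pthread mutexes stand in for spinlocks, and sched_yield() stands in for cpu_relax():

	/* Userspace sketch of the lock_timer_base() pattern: load the base
	 * pointer, take its lock, then re-check that the timer still points
	 * at the same base; if not, drop the lock and retry. */
	#include <assert.h>
	#include <pthread.h>
	#include <sched.h>
	#include <stdatomic.h>
	#include <stddef.h>
	#include <stdio.h>

	struct base {
		pthread_mutex_t lock;
		int id;
	};

	struct my_timer {
		_Atomic(struct base *) base;	/* may be switched concurrently */
	};

	static struct base *lock_base(struct my_timer *t)
	{
		struct base *b;

		for (;;) {
			b = atomic_load(&t->base);
			if (b != NULL) {
				pthread_mutex_lock(&b->lock);
				/* Revalidate: did the timer migrate while we
				 * were waiting for the lock? */
				if (b == atomic_load(&t->base))
					return b;
				pthread_mutex_unlock(&b->lock);
			}
			sched_yield();	/* stand-in for cpu_relax() */
		}
	}

	int main(void)
	{
		struct base b1 = { PTHREAD_MUTEX_INITIALIZER, 1 };
		struct base b2 = { PTHREAD_MUTEX_INITIALIZER, 2 };
		struct my_timer t;
		struct base *got;

		atomic_init(&t.base, &b1);
		got = lock_base(&t);
		assert(got == &b1);
		pthread_mutex_unlock(&got->lock);

		/* Simulate migration: the timer now points at another base. */
		atomic_store(&t.base, &b2);
		got = lock_base(&t);
		assert(got == &b2);
		pthread_mutex_unlock(&got->lock);

		printf("OK\n");
		return 0;
	}

As far as I can tell, the loop only makes progress because whoever migrates the timer does so while holding the relevant lock, so a stale pointer is detected as soon as we acquire it; that's the part I'd like confirmed.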