Re: Spinlock wrong CPU problem

Microbit_Ubuntu <microbit@xxxxxxxxxxxxxxxxxxxxxx> · Wed, 08 Jul 2009 12:08:36 +1000

On Tue, 2009-07-07 at 17:45 +0200, Timm Korte wrote:
> I'm trying to understand a spinlock bug in a kernel module (device driver).
> I have a spinlock that is used in the actual hardware interrupt handler
> as well as in a separate kernel thread doing the real work via a work
> queue. The first one uses the spinlock with spin_lock() and
> spin_unlock(), while the thread uses spin_lock_irqsave() and
> spin_unlock_irqrestore().
> On rare occasions (can't reproduce on purpose), I get a spinlock debug
> message about wrong CPU on _raw_spin_unlock when called from the kernel thread.
> 
> This is the source (for the kernel_thread) that runs into the problem:
> 
> static int my_irqthread_function(void *ptr) {
>   struct my_dev *mydev = ptr;
> 
>   daemonize(MY_NAME "%02x", mydev->mynum);
>   allow_signal(SIGTERM);
>   while (!wait_event_interruptible(mydev->irqthread_wait,
>            atomic_read(&mydev->irqthread_pending_count))) {
>     do {
>       uint8_t my_irq_pending = 0;
>       unsigned long iflags;
> 
>       spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
>       my_irq_pending = mydev->irq_pending;
>       mydev->irq_pending = 0;
>       spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);
> 
>       // handle irqs
>       if (my_irq_pending & INT_IPAC1) {
>          my_handle_interrupt(&mydev->mydev[IPAC1]);
>       }
> ...
>       // continue if the pending count still is != 0 after decrementing
>     } while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
>   }
> 
>   mydev->irqthread = 0;
>   complete_and_exit(&mydev->irqthread_exit, 0);
> }
> 
> The error happens on the
> "spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);" line, but I
> really can't figure out how the thread could be moved to another CPU
> while holding the lock and only doing two assignment operations.
> 
> The only thing I could think of is that it might have something to do
> with the enabled SIGTERM signal, even though the module wasn't being
> unloaded at the time the bug occurred.
> 
> System is FC4 based with a 2.6.17 kernel (can't change).
> 
> So I'm sort of out of ideas and hope someone here has an idea of what
> might have gone wrong.
> 
> Timm
> 
> --
> To unsubscribe from this list: send an email with
> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
> Please read the FAQ at http://kernelnewbies.org/FAQ
> 

Hallo Timm,

I'm just speculating, but as I understand it, on an SMP system IRQs that
are disabled on one CPU can still be handled by the other CPUs, so the
handler can run asynchronously to your thread.
Could this be the cause of the problem you're observing?
If so, I'm sure others here can help with how to ensure a spinlock masks
_all_ CPUs in SMP.
(I'm pretty much an embedded HW/SW guy, not much of a PC guy ... :-)
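
To make the SMP point a bit more concrete, here is a small userspace
sketch (not kernel code). Threads stand in for CPUs, and every name in it
(model_lock, model_dev, model_run and so on) is made up for illustration.
Under those assumptions it models two things: the lock itself, not IRQ
masking, is what keeps a second CPU out of the critical section, and a
debug lock that records the CPU at lock time can compare it again at
unlock time, which as far as I understand is roughly the kind of check
behind a "wrong CPU" message. It does not say what actually went wrong in
the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MODEL_ITERATIONS 100000

/* Hypothetical debug lock: remembers which "CPU" (thread id) took it. */
struct model_lock {
    atomic_flag held;
    int owner_cpu;              /* -1 while unlocked */
};

struct model_dev {
    struct model_lock lock;
    long events;                /* plain variable, protected only by lock */
};

struct model_cpu {
    struct model_dev *dev;
    int cpu;
    long check_failures;
};

static void model_lock_init(struct model_lock *l)
{
    atomic_flag_clear(&l->held);
    l->owner_cpu = -1;
}

static void model_lock_acquire(struct model_lock *l, int cpu)
{
    /* Busy-wait until the flag was clear before our test-and-set. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
    l->owner_cpu = cpu;
}

/* Returns 0 on success, -1 if the releasing CPU is not the recorded owner. */
static int model_lock_release(struct model_lock *l, int cpu)
{
    int ok = (l->owner_cpu == cpu);

    l->owner_cpu = -1;
    atomic_flag_clear_explicit(&l->held, memory_order_release);
    return ok ? 0 : -1;
}

static void *model_cpu_run(void *arg)
{
    struct model_cpu *c = arg;

    for (int i = 0; i < MODEL_ITERATIONS; i++) {
        model_lock_acquire(&c->dev->lock, c->cpu);
        c->dev->events++;       /* non-atomic on purpose */
        if (model_lock_release(&c->dev->lock, c->cpu) != 0)
            c->check_failures++;
    }
    return NULL;
}

/* Two "CPUs" hammer one lock; returns the event count, or -1 on failure. */
static long model_run(void)
{
    struct model_dev dev = { .events = 0 };
    struct model_cpu cpus[2] = { { &dev, 0, 0 }, { &dev, 1, 0 } };
    pthread_t threads[2];
    int started;

    model_lock_init(&dev.lock);
    for (started = 0; started < 2; started++)
        if (pthread_create(&threads[started], NULL,
                           model_cpu_run, &cpus[started]) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    if (started != 2)
        return -1;

    assert(dev.lock.owner_cpu == -1);   /* lock ends up unowned */
    if (cpus[0].check_failures || cpus[1].check_failures)
        return -1;
    return dev.events;
}

/* Lock taken as CPU 0 but released as CPU 1: the owner check fails. */
static int model_mismatched_release(void)
{
    struct model_lock l;

    model_lock_init(&l);
    model_lock_acquire(&l, 0);
    return model_lock_release(&l, 1);
}
```

If the lock serializes both threads, model_run() should come back with
2 * MODEL_ITERATIONS and no owner-check failures, while
model_mismatched_release() returns -1 because the releasing "CPU" differs
from the one recorded at lock time.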

HTH

-- 
Best regards,
Kris
