Re: Spinlock wrong CPU problem

Timm Korte <korte-kernelnewbies@xxxxxxxxxxxx> · Tue, 07 Jul 2009 22:46:12 +0200

Microbit_Ubuntu wrote:
> On Tue, 2009-07-07 at 17:45 +0200, Timm Korte wrote:
>> I'm trying to understand a spinlock bug in a kernel module (device driver).
>> I have a spinlock that is used in the actual hardware interrupt handler
>> as well as in a separate kernel thread that does the real work and is
>> woken via a wait queue. The interrupt handler takes the spinlock with
>> spin_lock() and spin_unlock(), while the thread uses spin_lock_irqsave()
>> and spin_unlock_irqrestore().
>> On rare occasions (I can't reproduce it on purpose), I get a spinlock
>> debug message about "wrong CPU" in _raw_spin_unlock when it is called
>> from the kernel thread.
>>
>> This is the source (for the kernel_thread) that runs into the problem:
>>
>> static int my_irqthread_function(void *ptr) {
>>   struct my_dev *mydev = ptr;
>>
>>   daemonize(MY_NAME "%02x", mydev->mynum);
>>   allow_signal(SIGTERM);
>>   while (!wait_event_interruptible(mydev->irqthread_wait,
>> atomic_read(&mydev->irqthread_pending_count))) {
>>     do {
>>       uint8_t my_irq_pending = 0;
>>       unsigned long iflags;
>>
>>       spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
>>       my_irq_pending = mydev->irq_pending;
>>       mydev->irq_pending = 0;
>>       spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);
>>
>>       // handle irqs
>>       if (my_irq_pending & INT_IPAC1) {
>>          my_handle_interrupt(&mydev->mydev[IPAC1]);
>>       }
>> ...
>>       // continue if the pending count still is != 0 after decrementing
>>     } while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
>>   }
>>
>>   mydev->irqthread = 0;
>>   complete_and_exit(&mydev->irqthread_exit, 0);
>> }
>>
>> The error happens on the
>> "spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);", but I
>> really can't figure out how the thread could be moved to another CPU
>> while holding the lock and only doing two assignment operations.
>>
>> The only thing I could think of is that it might have something to do
>> with the enabled SIGTERM signal, even though the module wasn't being
>> unloaded at the time the bug occurred.
>>
>> System is FC4 based with a 2.6.17 kernel (can't change).
>>
>> So I'm sort of out of ideas and hope someone here has an idea of what
>> might have gone wrong.
>>
>> Timm
>>
> 
> Hello Timm,
> 
> I'm just speculating, but I thought that on an SMP system, IRQs that
> are disabled on one CPU can still be handled by other CPUs, which makes
> for a rather asynchronous scenario.
> Could that be the cause of the problem you're observing?
> If so, I'm sure others here can explain how to make sure a spinlock
> masks _all_ CPUs in SMP.
> (I'm pretty much an embedded HW/SW guy, not much of a PC guy ... :-)
> 
> HTH
> 

I thought about that, too, but why would the thread jump to another CPU
just because that other CPU got an interrupt, instead of simply
continuing to run on the one it's already on?
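For reference, a rough sketch of where the message comes from, assuming the kernel is built with CONFIG_DEBUG_SPINLOCK: the debug code records the owning task and CPU when the lock is taken and compares them with the current task and CPU on unlock, printing "wrong owner" or "wrong CPU" on a mismatch. The userspace model below is hypothetical (model_spinlock, model_spin_lock() and model_spin_unlock() are made-up names, not kernel API). It does no real locking and passes the CPU number in by hand, since userspace cannot pin itself the way spin_lock_irqsave() does.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the checks a debug spinlock makes on
 * unlock (roughly after CONFIG_DEBUG_SPINLOCK in 2.6-era kernels).
 * NOT kernel code: nothing is actually locked, and the "CPU" is passed
 * in explicitly by the caller.
 */
struct model_spinlock {
	int locked;          /* 1 while held */
	const void *owner;   /* stands in for the owning task */
	int owner_cpu;       /* CPU recorded at lock time, -1 when free */
};

#define MODEL_SPINLOCK_INIT { 0, NULL, -1 }

/* Bit flags, one per check performed in model_spin_unlock(). */
enum {
	MODEL_ALREADY_UNLOCKED = 1 << 0,
	MODEL_WRONG_OWNER      = 1 << 1,
	MODEL_WRONG_CPU        = 1 << 2,
};

/* Record which task took the lock and on which CPU. */
static void model_spin_lock(struct model_spinlock *l, const void *task, int cpu)
{
	l->locked = 1;
	l->owner = task;
	l->owner_cpu = cpu;
}

/*
 * Compare the state recorded at lock time with the caller's task and CPU,
 * print a report for each mismatch, then release the lock.
 * Returns a bitmask of the failed checks (0 means a clean unlock).
 */
static int model_spin_unlock(struct model_spinlock *l, const void *task, int cpu)
{
	int bugs = 0;

	if (!l->locked) {
		fprintf(stderr, "BUG: spinlock already unlocked\n");
		bugs |= MODEL_ALREADY_UNLOCKED;
	}
	if (l->owner != task) {
		fprintf(stderr, "BUG: spinlock wrong owner\n");
		bugs |= MODEL_WRONG_OWNER;
	}
	if (l->owner_cpu != cpu) {
		fprintf(stderr, "BUG: spinlock wrong CPU (locked on %d, unlocked on %d)\n",
			l->owner_cpu, cpu);
		bugs |= MODEL_WRONG_CPU;
	}

	/* Reset to the "free" state, as the debug unlock path does. */
	l->locked = 0;
	l->owner = NULL;
	l->owner_cpu = -1;
	return bugs;
}
```

In this model, model_spin_lock(&lock, &task, 0) followed by model_spin_unlock(&lock, &task, 1) returns MODEL_WRONG_CPU, while unlocking on CPU 0 returns 0. Since spin_lock_irqsave() disables local interrupts and preemption, a task that doesn't sleep inside the critical section should stay on the same CPU between lock and unlock, which is exactly why the report is so puzzling here.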

Timm

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ