Microbit_Ubuntu wrote:
> On Tue, 2009-07-07 at 17:45 +0200, Timm Korte wrote:
>> I'm trying to understand a spinlock bug in a kernel module (device driver).
>> I have a spinlock that is used in the actual hardware interrupt handler
>> as well as in a separate kernel thread doing the real work via a work
>> queue. The first one uses the spinlock with spin_lock() and
>> spin_unlock(), while the thread uses spin_lock_irqsave() and
>> spin_unlock_irqrestore().
>> On rare occasions (can't reproduce on purpose), I get a spinlock debug
>> message about wrong cpu on _raw_spin_unlock when called from the kernel thread.
>>
>> This is the source (for the kernel_thread) that runs into the problem:
>>
>> static int my_irqthread_function(void *ptr) {
>>     struct my_dev *mydev = ptr;
>>
>>     daemonize(MY_NAME "%02x", mydev->mynum);
>>     allow_signal(SIGTERM);
>>     while (!wait_event_interruptible(mydev->irqthread_wait,
>>             atomic_read(&mydev->irqthread_pending_count))) {
>>         do {
>>             uint8_t my_irq_pending = 0;
>>             unsigned long iflags;
>>
>>             spin_lock_irqsave(&mydev->irq_pending_lock, iflags);
>>             my_irq_pending = mydev->irq_pending;
>>             mydev->irq_pending = 0;
>>             spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);
>>
>>             // handle irqs
>>             if (my_irq_pending & INT_IPAC1) {
>>                 my_handle_interrupt(&mydev->mydev[IPAC1]);
>>             }
>>             ...
>>             // continue if the pending count still is != 0 after decrementing
>>         } while (!atomic_dec_and_test(&mydev->irqthread_pending_count));
>>     }
>>
>>     mydev->irqthread = 0;
>>     complete_and_exit(&mydev->irqthread_exit, 0);
>> }
>>
>> The error happens on the
>> "spin_unlock_irqrestore(&mydev->irq_pending_lock, iflags);" - but I
>> really can't figure out how the thread could be moved to another cpu
>> while holding the lock and only doing two assignment operations.
>>
>> The only thing I could think of is that it might have something to do
>> with the enabled SIGTERM signal - even though the module wasn't being
>> unloaded at the time the bug occurred.
>>
>> System is FC4-based with a 2.6.17 kernel (can't change).
>>
>> So I'm sort of out of ideas and hope someone here has an idea what
>> might have gone wrong here.
>>
>> Timm
>>
>
> Hello Timm,
>
> I'm just speculating, but I thought that when you work with an SMP
> system, IRQs that are disabled on one CPU can still be 'handled' by
> other CPUs, rather an asynchronous scenario.
> Could it be that this is the cause of the problem you're observing?
> If so, I'm sure others here can help with how to ensure a spinlock masks
> _all_ CPUs in SMP.
> (I'm pretty much an embedded HW/SW guy, not much of a PC guy ... :-)
>
> HTH
>

I thought about that, too - but what reason would there be for the thread
to jump to another cpu, just because that other cpu just got an interrupt -
instead of just continuing to run on the one it's already on?

Timm
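
For readers following the thread, the snapshot-and-clear step that the quoted thread function performs under irq_pending_lock can be modelled in plain user-space C. This is only a sketch under assumptions: it uses a C11 atomic_flag as a stand-in for the kernel spinlock, it assumes the interrupt handler ORs source bits into the pending mask (the post does not show the handler), and the names pending_state, pending_post and pending_take are hypothetical. It does not model interrupts or CPU migration, so it cannot reproduce the reported debug message.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the relevant part of struct my_dev. */
struct pending_state {
    atomic_flag lock;   /* stands in for irq_pending_lock */
    uint8_t pending;    /* stands in for irq_pending (bitmask of sources) */
};

/* Busy-wait until the flag was previously clear, i.e. we now own the lock. */
static void lock_acquire(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* spin */
}

static void lock_release(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static void pending_init(struct pending_state *s)
{
    atomic_flag_clear(&s->lock);
    s->pending = 0;
}

/* Producer side (assumed handler behaviour): record which sources fired. */
static void pending_post(struct pending_state *s, uint8_t bits)
{
    lock_acquire(&s->lock);
    s->pending |= bits;
    lock_release(&s->lock);
}

/* Consumer side (the thread): take a snapshot and clear it in one
 * critical section, so no bit posted in between is lost. */
static uint8_t pending_take(struct pending_state *s)
{
    uint8_t snapshot;

    lock_acquire(&s->lock);
    snapshot = s->pending;
    s->pending = 0;
    lock_release(&s->lock);
    return snapshot;
}
```

The consumer then tests the returned snapshot bit by bit outside the lock, as the quoted loop does with my_irq_pending & INT_IPAC1, which keeps the critical section down to the two assignments.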