Hi Folks,

I'm currently making my living working with a system derived from RHEL 6.1, complete with proprietary kernel modules. That's based on the 2.6.32 kernel, which is of course old news in the Linux world. I recently encountered and band-aided a rather nasty semi-deadlock that was being aggravated by some of the proprietary code - but which appears to me (so far only from reading source code) to be present in current versions of the Linux kernel. It is, however, hard to tickle even with our proprietary module, and doubtless even harder without it.

The problem, in a nutshell: whenever the tasklist_lock is held for read there's a small chance that jiffies will stop advancing - a chance that grows the longer the lock is held - and there's code running in softirqs (potentially interrupting a holder of the tasklist_lock) that relies on jiffies to limit how much work it does before letting the interrupted task proceed.

The mechanism is quite straightforward, at least with the combination of timer options we're using - which, as far as I know, are probably the RHEL defaults for x86_64. The job of updating jiffies is assigned to one particular core, and jiffies won't advance while that core has local interrupts disabled. Unfortunately, the spinlock routine used to acquire the tasklist_lock in write mode leaves interrupts disabled while spinning.

So all we need to get in trouble are two CPUs. CPU A takes the tasklist_lock in read mode, is then interrupted, and starts running softirq code that uses jiffies to limit its run time - and that happens to have so much work available that only the time limit is likely to stop it. CPU B is the one in charge of updating jiffies, and it runs a thread that calls write_lock_irq(&tasklist_lock). Now A won't stop until jiffies advance, so it never drops the read lock; and B spins with interrupts disabled, so jiffies never advance. The next thing that happens is that whatever watchdog mechanisms are enabled (and don't use jiffies ;-)) go off - in our case, 60 seconds later.

What I'm asking here (other than for feedback on the explanation, if folks have any) is whether there's any point in taking the issue to LKML. The custom there seems to be to communicate via suggested patches for real bugs that someone is actually hitting on current kernels. And there's an understandable aversion to patches from unknowns that touch fundamental core kernel mechanisms. Also, judging by my coworkers' reaction to my suggestions for a real fix, the Linux kernel is seen as very fragile - likely to have code relying on unintended behaviors, so that a change that's theoretically correct may expose all kinds of nastiness.

With a little work, I can determine empirically whether the issue is potentially present on current kernels. But there's no way I have the machine farm to make it happen reproducibly on a standard kernel. Our QA team managed to reproduce it 3 times in 8 months, with the help of some proprietary code that uses the same idiom of limiting its run time via jiffies while running in softirq context. And any work I do on this would be on my own time - management is happy with the proprietary code being changed to use a different technique to limit its run time.

The fix I'm inclined to propose is to the reader-writer spinlock code: re-enable interrupts while spinning for the write lock. Of all the fixes we considered, this is the only one that doesn't potentially cost performance or latency - the only extra work happens while we're already busy-waiting. It's not a general fix for all RW spinlocks, but the tasklist_lock has the useful property that no one ever tries to take it for write with interrupts already disabled. (Writers use write_lock_irq(), never write_lock_irqsave().) But it is general, and targeted at the mechanism rather than the symptom.
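To make that concrete, here's a minimal sketch of the idea - not a patch against any particular kernel version, and sketch_write_lock_irq() is a made-up name for illustration. The point is just that the slow path re-enables local interrupts between acquisition attempts, so the timer interrupt (and hence jiffies) can still fire on the spinning CPU, while the function still returns with the lock held and interrupts disabled, matching the write_lock_irq() contract.

#include <linux/spinlock.h>
#include <linux/irqflags.h>

/*
 * Hypothetical sketch only - not the existing kernel implementation.
 * Take an rwlock for write and return with local interrupts disabled
 * (the same end state as write_lock_irq()), but keep interrupts
 * enabled while busy-waiting so the jiffies-updating CPU can still
 * take its timer tick.
 */
static inline void sketch_write_lock_irq(rwlock_t *lock)
{
	local_irq_disable();
	while (!write_trylock(lock)) {
		/* Didn't get the lock: open an interrupt window, then retry. */
		local_irq_enable();
		cpu_relax();
		local_irq_disable();
	}
	/* Lock held, interrupts off - the state callers expect. */
}

The safety of briefly enabling interrupts inside the slow path leans on exactly the property above: since writers always enter via write_lock_irq() rather than write_lock_irqsave(), interrupts are known to have been enabled on entry, so nothing that had them disabled gets them turned back on behind its back. (Fairness and the real fast-path/slow-path split in the rwlock code are glossed over here.)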
Thanks for any advice,

--- Arlie
(Arlie Stephens arlie@xxxxxxxxxxxx)