On Tue, Sep 07, 2010 at 02:26:54PM +0200, Tejun Heo wrote:
> On 09/07/2010 12:35 PM, Tejun Heo wrote:
> > Can you please help me a bit more?  Are you saying the following?
> >
> > Work w0 starts execution on wq0.  w0 tries locking but fails.  Does
> > delay(1) and requeues itself on wq0 hoping another work w1 would be
> > queued on wq0 which will release the lock.  The requeueing should make
> > w0 queued and executed after w1, but instead w1 never gets executed
> > while w0 hogs the CPU constantly by re-executing itself.  Also, how
> > does delay(1) help with chewing up CPU?  Are you talking about
> > avoiding constant lock/unlock ops starving other lockers?  In such
> > case, wouldn't cpu_relax() make more sense?
>
> Ooh, almost forgot.  There was nr_active underflow bug in workqueue
> code which could lead to malfunctioning max_active regulation and
> problems during queue freezing, so you could be hitting that too.  I
> sent out pull request some time ago but hasn't been pulled into
> mainline yet.  Can you please pull from the following branch and add
> WQ_HIGHPRI as discussed before and see whether the problem is still
> reproducible?

I'm currently running with the WQ_HIGHPRI flag. I only change one
thing at a time so I can tell what caused the change in behaviour...

> And if the problem is reproducible, can you please
> trigger sysrq thread dump and attach it?

Well, most of the time the system is 100% unresponsive when the
livelock occurs, so I'll be lucky to get anything at all....

> git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-linus

I'll try that next if the problem still persists.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx