On Mon, Mar 18, 2013 at 01:08:07PM -0400, Steven Rostedt wrote:
> On Mon, 2013-03-18 at 09:43 -0700, Tejun Heo wrote:
>
> > Making gcwq locks disable preemption would be much safer / easier, but
> > if that's not desirable, anything touching gcwq->idle_list would be a
> > good place to start - worker_enter_idle() and worker_leave_idle().
> > Hmmm... ignoring CPU hotplug, I think those two might just do it.
> >
> > Give it a try? How reproducible is the problem?
>
> Not very :-( I triggered it twice on a 40 CPU box. It can go
> approximately 1 month before it triggers. And the box we are testing on
> is currently a loaner, and we have it on extension right now, which
> means we won't have it much longer.
>
> But perhaps that's the place to fix things.

I've been thinking about it and AFAICS the only way that BUG_ON() could
trigger from preemption is if preemption happens while the idle_list
head is becoming or stopping being empty, i.e. pool->worklist is half
updated, so list_empty() isn't true but the first next entry is already
pointing back to itself.

If there's a crashdump, it shouldn't be too difficult to verify, and
wrapping the above two functions so they run with preemption disabled
should resolve it.

Thanks.

-- 
tejun