Hello,

On Sat, Jan 10, 2015 at 06:04:30PM -0600, Eric Sandeen wrote:
> > The only reasons that work item would stay there are
> >
> > * The rescuer is already executing something else from that workqueue
> >   and that one is stuck.
>
> I'll have to look at that.  I hope I still have access to the core...

Yes, if this is happening, the rescuer worker, which has the same name
as the workqueue, would be stuck somewhere.

> > * The worker pool is still considered to be making forward progress -
> >   there's a worker which isn't blocked and can burn CPU cycles.
>
> AFAICT, the first thing in the pool is the xfs_end_io blocked waiting for the ilock.
>
> I assume it's only the first one that matters?

Whichever work item is executing on that pool on that CPU.  Checking
the tasks which are runnable on that CPU should show it.

> > Again, if xfs is using workqueue correctly, that work item shouldn't
> > get stuck at all.  What other workqueues are doing is irrelevant.
>
> and yet here we are; one of us must be missing something.  It's quite
> possibly me :) but we definitely have this thing wedged, and moving
> the xfsalloc item to the front via high priority did solve it.  Not saying
> it's the right solution, just a data point.

It sure is possible that workqueue is misbehaving, but I'm pretty
doubtful that it is, especially given that the xfs issue has been
around for quite a while, which excludes recent regressions in the
rescuer logic, and that there hasn't been any other case of a failed
forward progress guarantee.

Thanks.

-- 
tejun