Hello, Dave.

On Sat, Jan 10, 2015 at 10:28:15AM +1100, Dave Chinner wrote:
> process A                      kworker (1..N)
> ilock(excl)
> alloc
> queue work(allocwq)
> (work queued as no kworker
>  threads available)
>                                execute work from xfsbuf-wq
>                                xfs_end_io
>                                ilock(excl)
>                                (blocks waiting on queued work)
>
> No new kworkers are started, so the queue never makes progress,
> we deadlock.

But allocwq is a separate workqueue from xfsbuf-wq and should have
its own rescuer.  The work item queued by process A is guaranteed to
make forward progress no matter what the work items on xfsbuf-wq are
doing.  The deadlock as depicted above cannot happen.  A workqueue
with WQ_MEM_RECLAIM can deadlock iff an executing work item on the
workqueue deadlocks.

> AFAICT, the only way we can get here is that we have N idle
> kworkers, and N+M works get queued where the allocwq work is at the
> tail end of the queue.  This results in newly queued work not
> kicking a new kworker thread as there are idle threads, and as
> works are executed they are all for the xfsbuf-wq and blocking on
> the ilock(excl).

The workqueue doesn't work that way.  The forward progress guarantee
is separate for each workqueue with WQ_MEM_RECLAIM set.  Each is
guaranteed to make forward progress independently of what the others
are doing.

> We eventually get to the point where there are no more idle
> kworkers, but we still have works queued, and progress is still
> dependent on the queued works completing....
>
> This is actually not an uncommon queuing occurrence, because we
> can get storms of end-io works queued from batched IO completion
> processing.

And all of these have nothing to do with why the deadlock is
happening.

> > Ummm, this feels pretty voodoo.  In practice, it'd change the
> > order of things being executed and may make certain deadlocks
> > unlikely enough, but I don't think this can be a proper fix.
>
> Right, that's why Eric approached you about this a few weeks ago
> asking whether it could be fixed in the workqueue code.
>
> As I've said before (in years gone by), we've got multiple levels
> of priority needed for executing XFS works because of lock
> ordering requirements.  We *always* want the allocation workqueue
> work to run

There's no problem there - create a separate WQ_MEM_RECLAIM
workqueue at each priority level.  Each is guaranteed to make
forward progress, and as long as the dependency graph isn't looped,
the whole thing is guaranteed to make forward progress.

> before the end-io processing of the xfsbuf-wq and unwritten-wq
> because of this lock inversion, just like we always want the
> xfsbufd to run before the unwritten-wq because unwritten extent
> conversion may block waiting for metadata buffer IO to complete,
> and we always want the xfslog-wq works to run before all of them
> because metadata buffer IO may get blocked waiting for buffers
> pinned by the log to be unpinned for log IO completion...

I'm not really following your logic here.  Are you saying that xfs
is trying to work around a cyclic dependency by manipulating the
execution order of specific work items?

> We can't solve these dependencies in a sane manner with a single
> high priority workqueue level, so we're stuck with hacking around
> the worst of the problems for the moment.

I'm afraid I'm lost.  Sans bugs in the rescuer logic, each workqueue
with WQ_MEM_RECLAIM guarantees forward progress of a single work
item at any given time.  If there are dependencies among the work
items, each node of the dependency graph should correspond to a
separate workqueue.  As long as the dependency graph isn't cyclic,
the whole thing is guaranteed to make forward progress.  There's no
reason to play with priorities to avoid deadlock.  That doesn't make
any sense to me.
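To make that concrete, here's a minimal sketch of what
one-workqueue-per-node looks like.  The names are made up for
illustration - this is not the actual XFS code:

#include <linux/workqueue.h>

/*
 * Sketch only: one WQ_MEM_RECLAIM workqueue per node of the
 * dependency graph.  Each gets its own rescuer thread, so each can
 * keep executing its work items under memory pressure independently
 * of what the other workqueues are doing.
 */
static struct workqueue_struct *log_wq, *buf_wq;
static struct workqueue_struct *unwritten_wq, *alloc_wq;

static int wq_per_node_init(void)
{
	log_wq = alloc_workqueue("log", WQ_MEM_RECLAIM, 0);
	buf_wq = alloc_workqueue("buf", WQ_MEM_RECLAIM, 0);
	unwritten_wq = alloc_workqueue("unwritten", WQ_MEM_RECLAIM, 0);
	alloc_wq = alloc_workqueue("alloc", WQ_MEM_RECLAIM, 0);
	if (!log_wq || !buf_wq || !unwritten_wq || !alloc_wq)
		return -ENOMEM;
	return 0;
}

Work items on one of those queues can block waiting on work items on
another as much as they like - each rescuer keeps its own queue
draining.  The only thing that can deadlock this arrangement is a
cycle in who waits on whom.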
Priority or chained queueing, which is a much better way to do it,
can be used to optimize queueing and execution behavior if one set
of work items is likely to wait on another set, but that should have
*NOTHING* to do with deadlock avoidance.  If the dependency graph is
cyclic, it will deadlock.  If not, it won't.  It's as simple as
that.
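By chained queueing I mean a work item queueing its successor from
its own completion, instead of everything being queued up front.  A
sketch, again with hypothetical names rather than real code:

#include <linux/workqueue.h>

static struct workqueue_struct *stage2_wq;	/* created at init with
						 * alloc_workqueue("stage2",
						 * WQ_MEM_RECLAIM, 0) */

static void stage2_fn(struct work_struct *work)
{
	/* Runs only once stage 1 has actually finished. */
}
static DECLARE_WORK(stage2_work, stage2_fn);

static void stage1_fn(struct work_struct *work)
{
	/* ... do the first half of the job ... */

	/*
	 * Chain: queue the next stage from the completion of this
	 * one.  Nothing ever sits on a workqueue blocked waiting for
	 * another work item, so the execution order falls out
	 * naturally without any priority fiddling.
	 */
	queue_work(stage2_wq, &stage2_work);
}

Thanks.

-- 
tejun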