On Sat, Jan 10, 2015 at 12:41:33PM -0500, Tejun Heo wrote:
> Hello, Dave.
>
> On Sat, Jan 10, 2015 at 10:28:15AM +1100, Dave Chinner wrote:
> > process A                     kworker (1..N)
> > ilock(excl)
> > alloc
> >   queue work(allocwq)
> >     (work queued as no kworker
> >      threads available)
> >                               execute work from xfsbuf-wq
> >                               xfs_end_io
> >                                 ilock(excl)
> >                                   (blocks waiting on queued work)
> >
> > No new kworkers are started, so the queue never makes progress,
> > we deadlock.
>
> But allocwq is a separate workqueue from xfsbuf-wq and should have its
> own rescuer.  The work item queued by process on A is guaranteed to
> make forward progress no matter what work items on xfsbuf-wq are
> doing.  The deadlock as depicted above cannot happen.  A workqueue
> with WQ_MEM_RECLAIM can deadlock iff an executing work item on the
> workqueue deadlocks.

Eric will have to confirm, but I recall asking Eric to check the
rescuer threads and that they were idle...

....
> > before the end-io processing of the xfsbuf-wq and unwritten-wq
> > because of this lock inversion, just like we always want the
> > xfsbufd to run before the unwritten-wq because unwritten extent
> > conversion may block waiting for metadata buffer IO to complete, and
> > we always want the xfslog-wq works to run before all of them
> > because metadata buffer IO may get blocked waiting for buffers
> > pinned by the log to be unpinned for log IO completion...
>
> I'm not really following your logic here.  Are you saying that xfs is
> trying to work around cyclic dependency by manipulating execution
> order of specific work items?

No, it's not cyclic. They are different dependencies.

Data IO completion can take the XFS inode i_lock, i.e. in the
mp->m_data_workqueue and the mp->m_unwritten_workqueue.

mp->m_data_workqueue has no other dependencies.

mp->m_unwritten_workqueue reads buffers, so is dependent on
mp->m_buf_workqueue for buffer IO completion.

mp->m_unwritten_workqueue can cause btree splits, which can defer
work to the xfs_alloc_wq.

xfs_alloc_wq reads buffers, so it is dependent on the
mp->m_buf_workqueue for buffer IO completion.

So lock/wq ordering dependencies are:

m_data_workqueue -> i_lock
m_unwritten_workqueue -> i_lock -> xfs_alloc_wq -> m_buf_workqueue
syscall -> i_lock -> xfs_alloc_wq -> m_buf_workqueue

The issue we see is:

process A:      write(2) -> i_lock -> xfs_alloc_wq
kworkers:       m_data_workqueue -> i_lock
                (blocked on process A work completion)

Queued work: m_data_workqueue work, xfs_alloc_wq work

Queued work does not appear to be dispatched for some reason, wq
concurrency depth does not appear to be exhausted and rescuer
threads do not appear to be active. Something has gone wrong for
the queued work to be stalled like this.

> There's no reason to play with priorities to avoid deadlock.  That
> doesn't make any sense to me.  Priority or chained queueing, which is

Prioritised work queuing is what I suggested, not modifying kworker
scheduler priorities...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
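
As a rough illustration of the "not cyclic" point, the three ordering
chains listed in this message can be encoded as a directed graph and
checked for cycles. This is only a sketch: the graph encoding and the
names CHAINS, build_graph and has_cycle are assumptions made for
illustration, and it models nothing beyond the ordering written out
above (it says nothing about how the kernel dispatches work).

```python
# Ordering chains as listed in the message; "a -> b" means a may wait on b.
CHAINS = [
    ["m_data_workqueue", "i_lock"],
    ["m_unwritten_workqueue", "i_lock", "xfs_alloc_wq", "m_buf_workqueue"],
    ["syscall", "i_lock", "xfs_alloc_wq", "m_buf_workqueue"],
]


def build_graph(chains):
    """Turn each chain into directed edges between consecutive entries."""
    graph = {}
    for chain in chains:
        for node in chain:
            graph.setdefault(node, set())
        for src, dst in zip(chain, chain[1:]):
            graph[src].add(dst)
    return graph


def has_cycle(graph):
    """Depth-first search; reaching a node still on the stack means a cycle."""
    UNSEEN, ON_STACK, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    def visit(node):
        state[node] = ON_STACK
        for nxt in graph[node]:
            if state[nxt] == ON_STACK:
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[node] = DONE
        return False

    return any(state[node] == UNSEEN and visit(node) for node in graph)


if __name__ == "__main__":
    print("cyclic:", has_cycle(build_graph(CHAINS)))
```

Adding a hypothetical back edge, for example a chain from
m_buf_workqueue to i_lock, would make has_cycle() report a cycle,
which is the kind of dependency the ordering above is meant to rule out.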