On Tue, Mar 19, 2019 at 10:04:08PM -0700, Darrick J. Wong wrote:
> Hmmm.
>
> Every now and then I see a generic/475 deadlock that generates the
> hangcheck warning pasted below.
>
> I /think/ this is ... the ail is processing an inode log item, for which
> it locked the cluster buffer and pushed the cil to unpin the buffer.
> However, the cil is cleaning up after the shut down and is trying to
> simulate an EIO completion, but tries to grab the buffer lock and hence
> the cil and ail deadlock.  Maybe the solution is to trylock in the
> (freed && remove) case of xfs_buf_item_unpin, since we're tearing the
> whole system down anyway?

Oh, that looks like a bug in xfs_iflush() - we are forcing the log to
unpin a buffer we already own the lock on. It's the same problem we had
in the discard code fixed by commit 8c81dd46ef3c ("Force log to disk
before reading the AGF during a fstrim"). It also means that the log
forces in the busy extent code have the same potential problem, as does
xfs_qm_dqflush().

I'll move further down the discussion now....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
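
For illustration only, here is a minimal userspace sketch of the lock
ordering problem described above. It is not XFS code: struct demo_buf,
demo_log_force() and both flush helpers are hypothetical names, a pthread
mutex stands in for the buffer lock, and the "log force" uses trylock so
that the demo reports EBUSY where the real code would block forever. The
second helper shows one possible ordering, in the spirit of the commit
title quoted above (force the log before taking the lock); it is an
assumption, not the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a cluster buffer: a lock plus a pin count. */
struct demo_buf {
	pthread_mutex_t	lock;
	int		pin_count;
};

/*
 * Hypothetical stand-in for a log force. Unpinning the buffer requires
 * the buffer lock. A real log force would sleep on the lock; trylock is
 * used here so the demo returns EBUSY instead of hanging.
 */
static int demo_log_force(struct demo_buf *bp)
{
	int err = pthread_mutex_trylock(&bp->lock);

	if (err)
		return err;	/* lock is held: this is the deadlock case */
	bp->pin_count = 0;
	pthread_mutex_unlock(&bp->lock);
	return 0;
}

/* Problematic ordering: take the buffer lock, then force the log. */
static int flush_lock_then_force(struct demo_buf *bp)
{
	int err = 0;

	pthread_mutex_lock(&bp->lock);
	if (bp->pin_count)
		err = demo_log_force(bp);	/* needs the lock we hold */
	pthread_mutex_unlock(&bp->lock);
	return err;
}

/* Assumed safer ordering: force the log first, then take the lock. */
static int flush_force_then_lock(struct demo_buf *bp)
{
	int err = 0;

	if (bp->pin_count)
		err = demo_log_force(bp);
	if (err)
		return err;

	pthread_mutex_lock(&bp->lock);
	assert(bp->pin_count == 0);	/* unpinned before we "write" it */
	/* ... the buffer would be written back here ... */
	pthread_mutex_unlock(&bp->lock);
	return 0;
}
```

Calling flush_lock_then_force() on a pinned buffer returns EBUSY and
leaves it pinned, while flush_force_then_lock() unpins it and returns 0.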