Re: generic/475 deadlock?

On Tue, Mar 19, 2019 at 10:04:08PM -0700, Darrick J. Wong wrote:
> Hmmm.
> 
> Every now and then I see a generic/475 deadlock that generates the
> hangcheck warning pasted below.
> 
> I /think/ this is ... the ail is processing an inode log item, for which
> it locked the cluster buffer and pushed the cil to unpin the buffer.
> However, the cil is cleaning up after the shut down and is trying to
> simulate an EIO completion, but tries to grab the buffer lock and hence
> the cil and ail deadlock.  Maybe the solution is to trylock in the
> (freed && remove) case of xfs_buf_item_unpin, since we're tearing the
> whole system down anyway?

Oh, that looks like a bug in xfs_iflush() - we are forcing the log
to unpin a buffer we already own the lock on. It's the same problem
we had in the discard code fixed by commit 8c81dd46ef3c ("Force log
to disk before reading the AGF during a fstrim").

It also means that the log forces in the busy extent code have the
same potential problem, as does xfs_qm_dqflush().
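
To make the lock ordering concrete, here is a minimal userspace sketch
of the pattern, not XFS code. Every name in it (model_buf,
model_log_worker, model_log_force, model_flush) is hypothetical, and
the "log worker" uses a trylock so that the conflict is reported
instead of hanging the way a real blocking lock would. The
force-before-lock ordering only mirrors the title of the commit
mentioned above; it is an assumption that the same ordering would suit
the other callers.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cluster buffer: a lock plus a pin count. */
struct model_buf {
	pthread_mutex_t lock;
	int pin_count;
};

/*
 * Models the log completion side: it must take the buffer lock before
 * it can unpin the buffer. A trylock is used so this demo reports the
 * conflict; a blocking lock here is what would deadlock.
 */
static void *model_log_worker(void *arg)
{
	struct model_buf *bp = arg;

	if (pthread_mutex_trylock(&bp->lock) == 0) {
		bp->pin_count = 0;
		pthread_mutex_unlock(&bp->lock);
	}
	return NULL;
}

/* Models a synchronous log force: run the worker and wait for it. */
static void model_log_force(struct model_buf *bp)
{
	pthread_t worker;

	if (pthread_create(&worker, NULL, model_log_worker, bp) == 0)
		pthread_join(worker, NULL);
}

/*
 * Models a flush path. Returns true if the buffer ended up unpinned.
 * force_before_lock == false forces the log while already holding the
 * buffer lock, which is the problematic ordering described above.
 */
static bool model_flush(struct model_buf *bp, bool force_before_lock)
{
	bool unpinned;

	if (force_before_lock)
		model_log_force(bp);

	pthread_mutex_lock(&bp->lock);
	if (!force_before_lock)
		model_log_force(bp);	/* worker cannot get bp->lock */
	unpinned = (bp->pin_count == 0);
	pthread_mutex_unlock(&bp->lock);

	return unpinned;
}
```

Calling model_flush() with force_before_lock set to false on a buffer
whose pin_count is 1 leaves the buffer pinned, because the worker can
never take the lock the caller holds. With force_before_lock set to
true the worker runs first and the buffer is unpinned. Depending on the
toolchain, the sketch may need to be built with -pthread.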

I'll move further down the discussion now....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx


