Dave Chinner <david@xxxxxxxxxxxxx> wrote on Tue, Dec 18, 2018 at 7:33 AM:
>
> On Sat, Dec 15, 2018 at 01:34:33PM +0800, 张本龙 wrote:
> > Hi Developers and XFS,
> >
> > There seems to be a deadlock involving 3 threads: 1) the fsync thread
> > has acquired the project quota lock, and is trying to get the xfs_buf
> > (it's an agf); 2) the xfs_buf is attached to a transaction, and
> > xfs_end_io is trying to get the xfs_inode ilock; 3) the write thread
> > has acquired the xfs_inode ilock, and tries to get the xfs_dquot.
> > Below are the traces.
>
> I don't see a deadlock here. What's holding the AGF lock and
> preventing progress from being made?
>

Oh, I was thinking the AGF is attached to a transaction. So between
xfs_trans_bjoin() and xfs_trans_commit(), a buf cannot be used by
others, right? Then it should be released by xfs_end_io() in
xfs_trans_commit(), and the deadlock is like:

Thread  1               2                               3
        fsync()
        dqlock P
        agf lock
        <blocks>
                        xfs_end_io
                        (agf locked by transaction)
                        ilock A
                        <blocks>
                        unlock agf in trans commit
                                                        write()
                                                        ilock A
                                                        dqlock P
                                                        <blocks>

> i.e. we have:
>
> process         1               2               3
>                 fsync()
>                 ilock A
>                 dqlock P
>                 agf lock
>                 <blocks>
>                                 xfs_end_io
>                                 ilock A
>                                 <blocks>
>                                                 write()
>                                                 ilock B
>                                                 dqlock P
>                                                 <blocks>
>
> So, basically, everything is waiting for the AGF lock, which
> is held by something other than these three threads. When the AGF
> lock is released, the fsync() will make progress, then release
> both dqlock P and ilock A and the other two threads will make
> progress again.
>

Absolutely possible. fsync() did indeed acquire the ilock in
xfs_iomap_write_allocate(). Yes, the key is who holds the AGF in this
scenario. But under either reading it seems the AGF is being held by a
transaction that is blocked in xfs_end_io() on the ilock.

> > It's 3.10.0-514.16.1.el7.x86_64 kernel, met about 10-20 times a week
> > on several hundred servers.
>
> That's not a mainline kernel and so we can't really help diagnose
> this much further. You should really report it to your distro
> support channels so they can help you with further diagnosis and
> fixes...

Oh, sure, thank you for pointing that out. We don't have support
channels since we use CentOS... I really appreciate your reply!

>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
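
The two readings in this thread differ only in who holds each lock, which can be made concrete as a wait-for graph. The following is a minimal sketch, not kernel code: the thread and lock names are copied from the diagrams above, `find_cycle` is a hypothetical helper, and "unknown" is a placeholder for the AGF holder that neither diagram identifies. Under the reporter's assumed ownership the chain closes into a cycle (a deadlock); under Dave's it ends at a holder outside the three threads, so progress resumes once that holder releases the AGF.

```python
# Wait-for graph sketch. All names are illustrative, taken from the
# diagrams in this thread; nothing here calls into XFS.

def find_cycle(waits_for, held_by, start):
    """Follow thread -> wanted lock -> holder until a thread repeats.

    Returns the list of threads forming a cycle, or None if the chain
    ends at a holder that is not itself blocked (or is unknown).
    """
    path = []
    thread = start
    while thread in waits_for:
        if thread in path:
            return path[path.index(thread):]
        path.append(thread)
        thread = held_by.get(waits_for[thread])
    return None

# What each of the three traced threads is blocked on (same in both readings).
waits_for = {"fsync": "agf", "xfs_end_io": "ilock A", "write": "dqlock P"}

# Reporter's reading (an assumption): the AGF is held by the transaction
# running in xfs_end_io, and write() holds ilock A.
reporter_held_by = {"agf": "xfs_end_io", "ilock A": "write", "dqlock P": "fsync"}

# Dave's reading: fsync holds ilock A and dqlock P, write() holds ilock B,
# and the AGF holder is outside these three threads.
dave_held_by = {"agf": "unknown", "ilock A": "fsync",
                "dqlock P": "fsync", "ilock B": "write"}

reporter_cycle = find_cycle(waits_for, reporter_held_by, "fsync")
dave_cycle = find_cycle(waits_for, dave_held_by, "write")
```

With these inputs, `reporter_cycle` contains all three threads and `dave_cycle` is `None`, which matches the point made above: the open question is who actually holds the AGF.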