From: Dave Chinner <dchinner@xxxxxxxxxx> truncate_setsize() removes pages from the page cache, and hence requires page locks to be held. It is not valid to lock a page cache page inside a transaction context as we can hold page locks when we we reserve space for a transaction. If we do, then we expose an ABBA deadlock between log space reservation and page locks. That is, both the write path and writeback lock a page, then start a transaction for block allocation, which means they can block waiting for a log reservation with the page lock held. If we hold a log reservation and then do something that locks a page (e.g. truncate_setsize in xfs_setattr_size) then that page lock can block on the page locked and waiting for a log reservation. If the transaction that is waiting for the page lock is the only active transaction in the system that can free log space via a commit, then writeback will never make progress and so log space will never free up. This issue with xfs_setattr_size() was introduced back in 2010 by commit fa9b227 ("xfs: new truncate sequence") which moved the page cache truncate from outside the transaction context (what was xfs_itruncate_data()) to inside the transaction context as a call to truncate_setsize(). The reason truncate_setsize() was located where in this place was that we can't change the file size until after we are in the transaction context and the operation will either succeed or shut down the filesystem on failure. Hence we have to split truncate_setsize() back into a pagecache operation that occurs before the transaction context, and a i_size_write() call that happens within the transaction context. Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> --- fs/xfs/xfs_iops.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index ef1ca01..ab2dc47 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -808,22 +808,27 @@ xfs_setattr_size( */ inode_dio_wait(inode); + /* + * Do all the page cache truncate work outside the transaction + * context as the "lock" order is page lock->log space reservation. + * i.e. locking pages inside the transaction can ABBA deadlock with + * writeback. We have to do the inode size update inside the + * transaction, however, as xfs_trans_reserve() can fail with ENOMEM + * and we can't make user visible changes on non-fatal errors. + */ error = -block_truncate_page(inode->i_mapping, newsize, xfs_get_blocks); if (error) return error; + truncate_pagecache(inode, newsize); tp = xfs_trans_alloc(mp, XFS_TRANS_SETATTR_SIZE); error = xfs_trans_reserve(tp, &M_RES(mp)->tr_itruncate, 0, 0); if (error) goto out_trans_cancel; - truncate_setsize(inode, newsize); - commit_flags = XFS_TRANS_RELEASE_LOG_RES; lock_flags |= XFS_ILOCK_EXCL; - xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, 0); /* @@ -856,6 +861,7 @@ xfs_setattr_size( * they get written to. */ ip->i_d.di_size = newsize; + i_size_write(inode, newsize); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); if (newsize <= oldsize) { _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs