On Mon, Jan 04, 2021 at 03:27:14PM -0500, Brian Foster wrote: > On Mon, Jan 04, 2021 at 11:44:37AM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > When overlayfs is running on top of xfs and the user unlinks a file in > > the overlay, overlayfs will create a whiteout inode and ask xfs to > > "rename" the whiteout file atop the one being unlinked. If the file > > being unlinked loses its one nlink, we then have to put the inode on the > > unlinked list. > > > > This requires us to grab the AGI buffer of the whiteout inode to take it > > off the unlinked list (which is where whiteouts are created) and to grab > > the AGI buffer of the file being deleted. If the whiteout was created > > in a higher numbered AG than the file being deleted, we'll lock the AGIs > > in the wrong order and deadlock. > > > > Therefore, grab all the AGI locks we think we'll need ahead of time, and > > in the correct order. > > > > Reported-by: wenli xie <wlxie7296@xxxxxxxxx> > > Tested-by: wenli xie <wlxie7296@xxxxxxxxx> > > Fixes: 93597ae8dac0 ("xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()") > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > --- > > fs/xfs/xfs_inode.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 46 insertions(+) > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > > index b7352bc4c815..dd419a1bc6ba 100644 > > --- a/fs/xfs/xfs_inode.c > > +++ b/fs/xfs/xfs_inode.c > > @@ -3000,6 +3000,48 @@ xfs_rename_alloc_whiteout( > > return 0; > > } > > > > +/* > > + * For the general case of renaming files, lock all the AGI buffers we need to > > + * handle bumping the nlink of the whiteout inode off the unlinked list and to > > + * handle dropping the nlink of the target inode. We have to do this in > > + * increasing AG order to avoid deadlocks. > > + */ > > +static int > > +xfs_rename_lock_agis( > > + struct xfs_trans *tp, > > + struct xfs_inode *wip, > > + struct xfs_inode *target_ip) > > +{ > > + struct xfs_mount *mp = tp->t_mountp; > > + struct xfs_buf *bp; > > + xfs_agnumber_t agi_locks[2] = { NULLAGNUMBER, NULLAGNUMBER }; > > + int error; > > + > > + if (wip) > > + agi_locks[0] = XFS_INO_TO_AGNO(mp, wip->i_ino); > > + > > + if (target_ip && VFS_I(target_ip)->i_nlink == 1) > > + agi_locks[1] = XFS_INO_TO_AGNO(mp, target_ip->i_ino); > > + > > + if (agi_locks[0] != NULLAGNUMBER && agi_locks[1] != NULLAGNUMBER && > > + agi_locks[0] > agi_locks[1]) > > + swap(agi_locks[0], agi_locks[1]); > > + > > + if (agi_locks[0] != NULLAGNUMBER) { > > + error = xfs_read_agi(mp, tp, agi_locks[0], &bp); > > + if (error) > > + return error; > > + } > > + > > + if (agi_locks[1] != NULLAGNUMBER) { > > + error = xfs_read_agi(mp, tp, agi_locks[1], &bp); > > + if (error) > > + return error; > > + } > > + > > + return 0; > > +} > > This all looks reasonable to me, but I wonder if we can simplify > a bit by reusing the sorted inodes array we've already created earlier > in xfs_rename(). E.g., something like: > > for (i = 0; i < num_inodes; i++) { > if (inodes[i] != wip && inodes[i] != target_ip) > continue; > error = xfs_read_agi(...); > ... > } > > IOW, similar to how xfs_lock_inodes() and xfs_qm_vop_rename_dqattach() > work. I think it would be difficult to do that because we only need to grab target_ip's AGI if we're going to droplink it, and we haven't yet taken target_ip's ILOCK when we invoke the sorting hat so the link count isn't stable. --D > Brian > > > + > > /* > > * xfs_rename > > */ > > @@ -3130,6 +3172,10 @@ xfs_rename( > > } > > } > > > > + error = xfs_rename_lock_agis(tp, wip, target_ip); > > + if (error) > > + return error; > > + > > /* > > * Directory entry creation below may acquire the AGF. Remove > > * the whiteout from the unlinked list first to preserve correct > > >