On Mon, Sep 28, 2020 at 04:10:46PM +1000, Dave Chinner wrote:
> On Sun, Sep 27, 2020 at 04:41:56PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> > In xfs_bui_item_recover, there exists a use-after-free bug with regards
> > to the inode that is involved in the bmap replay operation. If the
> > mapping operation does not complete, we call xfs_bmap_unmap_extent to
> > create a deferred op to finish the unmapping work, and we retain a
> > pointer to the incore inode.
> >
> > Unfortunately, the very next thing we do is commit the transaction and
> > drop the inode. If reclaim tears down the inode before we try to finish
> > the defer ops, we dereference garbage and blow up. Therefore, create a
> > way to join inodes to the defer ops freezer so that we can maintain the
> > xfs_inode reference until we're done with the inode.
>
> Honest first reaction now I understand what the capture stuff is
> doing: Ewww! Gross!

Yes, the whole thing is gross. Honestly, I wish I could go back in time
to 2016 to warn myself that we would need a way to reassemble entire
runtime transactions + dfops chains so that we could avoid all this.

> We only need to store a single inode, so the whole "2 inodes for
> symmetry with defer_ops" greatly overcomplicates the code. This
> could be *much* simpler.

Indeed, see my comment at the very end.

> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index deb99300d171..c7f65e16534f 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -12,6 +12,7 @@
> >  #include "xfs_sb.h"
> >  #include "xfs_mount.h"
> >  #include "xfs_inode.h"
> > +#include "xfs_defer.h"
> >  #include "xfs_trans.h"
> >  #include "xfs_trans_priv.h"
> >  #include "xfs_inode_item.h"
> > @@ -1689,3 +1690,43 @@ xfs_start_block_reaping(
> >  	xfs_queue_eofblocks(mp);
> >  	xfs_queue_cowblocks(mp);
> >  }
> > +
> > +/*
> > + * Prepare the inodes to participate in further log intent item recovery.
> > + * For now, that means attaching dquots and locking them, since libxfs doesn't
> > + * know how to do that.
> > + */
> > +void
> > +xfs_defer_continue_inodes(
> > +	struct xfs_defer_capture	*dfc,
> > +	struct xfs_trans		*tp)
> > +{
> > +	int				i;
> > +	int				error;
> > +
> > +	for (i = 0; i < XFS_DEFER_OPS_NR_INODES && dfc->dfc_inodes[i]; i++) {
> > +		error = xfs_qm_dqattach(dfc->dfc_inodes[i]);
> > +		if (error)
> > +			tp->t_mountp->m_qflags &= ~XFS_ALL_QUOTA_CHKD;
> > +	}
> > +
> > +	if (dfc->dfc_inodes[1])
> > +		xfs_lock_two_inodes(dfc->dfc_inodes[0], XFS_ILOCK_EXCL,
> > +				    dfc->dfc_inodes[1], XFS_ILOCK_EXCL);
> > +	else if (dfc->dfc_inodes[0])
> > +		xfs_ilock(dfc->dfc_inodes[0], XFS_ILOCK_EXCL);
> > +	dfc->dfc_ilocked = true;
> > +}
> > +
> > +/* Release all the inodes attached to this dfops capture device. */
> > +void
> > +xfs_defer_capture_irele(
> > +	struct xfs_defer_capture	*dfc)
> > +{
> > +	unsigned int			i;
> > +
> > +	for (i = 0; i < XFS_DEFER_OPS_NR_INODES && dfc->dfc_inodes[i]; i++) {
> > +		xfs_irele(dfc->dfc_inodes[i]);
> > +		dfc->dfc_inodes[i] = NULL;
> > +	}
> > +}
>
> None of this belongs in xfs_icache.c. The function namespace tells
> me where it should be...

Agreed. Originally this couldn't really be in libxfs because xfs_iget
has a different method signature in userspace, but now that we're just
storing the inode pointers directly, there's no need to split this
anymore.

> > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
> > index 0d899ab7df2e..1463c3097240 100644
> > --- a/fs/xfs/xfs_log_recover.c
> > +++ b/fs/xfs/xfs_log_recover.c
> > @@ -1755,23 +1755,43 @@ xlog_recover_release_intent(
> >  	spin_unlock(&ailp->ail_lock);
> >  }
> >
> > +static inline void
> > +xlog_recover_irele(
> > +	struct xfs_inode	*ip)
> > +{
> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +	xfs_irele(ip);
> > +}
>
> Just open code it, please.
>
> >  int
> > -xlog_recover_trans_commit(
> > +xlog_recover_trans_commit_inodes(
> >  	struct xfs_trans		*tp,
> > -	struct list_head		*capture_list)
> > +	struct list_head		*capture_list,
> > +	struct xfs_inode		*ip1,
> > +	struct xfs_inode		*ip2)
>
> So are these inodes supposed to be locked, referenced and/or ???

ILOCK'd and referenced.

> >  {
> >  	struct xfs_mount		*mp = tp->t_mountp;
> > -	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp);
> > +	struct xfs_defer_capture	*dfc = xfs_defer_capture(tp, ip1, ip2);
> >  	int				error;
>
> That's the second time putting this logic up in the declaration list
> has made me wonder where something in this function is initialised.
> Please move it into the code so that it is obvious.
>
> >
> >  	/* If we don't capture anything, commit tp and exit. */
> > -	if (!dfc)
> > -		return xfs_trans_commit(tp);
> > +	if (!dfc) {
>
> i.e. before this line.
>
> 	dfc = xfs_defer_capture(tp, ip1, ip2);
> 	if (!dfc) {

Ok.

>
> > +		error = xfs_trans_commit(tp);
> > +
> > +		/* We still own the inodes, so unlock and release them. */
> > +		if (ip2 && ip2 != ip1)
> > +			xlog_recover_irele(ip2);
> > +		if (ip1)
> > +			xlog_recover_irele(ip1);
> > +		return error;
> > +	}
>
> Not a fan of the unnecessary complexity of this.

Yeah, I got ahead of myself -- for atomic extent swapping we'll need to
be able to capture two inodes, so I went straight for the end goal.
I'll rip it out to simplify things for now, but this all will come back
in some form...

--D

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx