On Fri, Aug 25, 2017 at 11:05:55AM -0400, Brian Foster wrote:
> The extent swap operation currently resets bmbt block owners before
> the inode forks are swapped. The bmbt buffers are marked as ordered
> so they do not have to be physically logged in the transaction.
>
> This use of ordered buffers is not safe as bmbt buffers may have
> been previously physically logged. The bmbt owner change algorithm
> needs to be updated to physically log buffers that are already dirty
> when/if they are encountered. This means that an extent swap will
> eventually require multiple rolling transactions to handle large
> btrees. In addition, all inode related changes must be logged before
> the bmbt owner change scan begins and can roll the transaction for
> the first time to preserve fs consistency via log recovery.
>
> In preparation for such fixes to the bmbt owner change algorithm,
> refactor the bmbt scan out of the extent fork swap code to the last
> operation before the transaction is committed. Update
> xfs_swap_extent_forks() to only set the inode log flags when an
> owner change scan is necessary. Update xfs_swap_extents() to trigger
> the owner change based on the inode log flags. Note that since the
> owner change now occurs after the extent fork swap, the inode btrees
> must be fixed up with the inode number of the current inode (similar
> to log recovery).
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>

Looks ok I think,
Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

> ---
>  fs/xfs/xfs_bmap_util.c | 44 ++++++++++++++++++++++++++------------------
>  1 file changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 93e9552..ee8fb9a 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1840,29 +1840,18 @@ xfs_swap_extent_forks(
>  	}
>  
>  	/*
> -	 * Before we've swapped the forks, lets set the owners of the forks
> -	 * appropriately. We have to do this as we are demand paging the btree
> -	 * buffers, and so the validation done on read will expect the owner
> -	 * field to be correctly set. Once we change the owners, we can swap the
> -	 * inode forks.
> +	 * Btree format (v3) inodes have the inode number stamped in the bmbt
> +	 * block headers. We can't start changing the bmbt blocks until the
> +	 * inode owner change is logged so recovery does the right thing in the
> +	 * event of a crash. Set the owner change log flags now and leave the
> +	 * bmbt scan as the last step.
>  	 */
>  	if (ip->i_d.di_version == 3 &&
> -	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> +	    ip->i_d.di_format == XFS_DINODE_FMT_BTREE)
>  		(*target_log_flags) |= XFS_ILOG_DOWNER;
> -		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
> -					      tip->i_ino, NULL);
> -		if (error)
> -			return error;
> -	}
> -
>  	if (tip->i_d.di_version == 3 &&
> -	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE) {
> +	    tip->i_d.di_format == XFS_DINODE_FMT_BTREE)
>  		(*src_log_flags) |= XFS_ILOG_DOWNER;
> -		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
> -					      ip->i_ino, NULL);
> -		if (error)
> -			return error;
> -	}
>  
>  	/*
>  	 * Swap the data forks of the inodes
> @@ -2092,6 +2081,25 @@ xfs_swap_extents(
>  	xfs_trans_log_inode(tp, tip, target_log_flags);
>  
>  	/*
> +	 * The extent forks have been swapped, but crc=1,rmapbt=0 filesystems
> +	 * have inode number owner values in the bmbt blocks that still refer to
> +	 * the old inode. Scan each bmbt to fix up the owner values with the
> +	 * inode number of the current inode.
> +	 */
> +	if (src_log_flags & XFS_ILOG_DOWNER) {
> +		error = xfs_bmbt_change_owner(tp, ip, XFS_DATA_FORK,
> +					      ip->i_ino, NULL);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +	if (target_log_flags & XFS_ILOG_DOWNER) {
> +		error = xfs_bmbt_change_owner(tp, tip, XFS_DATA_FORK,
> +					      tip->i_ino, NULL);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
> +	/*
>  	 * If this is a synchronous mount, make sure that the
>  	 * transaction goes to disk before returning to the user.
>  	 */
> -- 
> 2.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
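To make the new ordering concrete, here is a toy user-space model in C. It is only a sketch under stated assumptions, not XFS code: struct toy_inode, TOY_ILOG_DOWNER and the toy_* functions are hypothetical stand-ins, a bmbt is reduced to an array of per-block owner fields, and logging, buffer handling and transaction rolling are left out entirely. It shows why, once the owner scan runs after the fork swap, each bmbt must be stamped with the number of the inode that now holds it (ip->i_ino for ip, tip->i_ino for tip) instead of the other inode's number as in the old pre-swap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model only (hypothetical, not XFS code): each inode's data fork is a
 * "bmbt" reduced to an array of per-block owner fields, which crc=1
 * filesystems stamp with the owning inode number.
 */
#define TOY_ILOG_DOWNER 0x1u	/* stand-in for XFS_ILOG_DOWNER */

struct toy_inode {
	uint64_t ino;		/* stand-in for i_ino */
	bool btree_fmt;		/* stand-in for v3 && XFS_DINODE_FMT_BTREE */
	uint64_t *owners;	/* owner field of each bmbt block */
	size_t nblocks;
};

/* Rewrite the owner field of every block in the inode's toy bmbt. */
static void toy_change_owner(struct toy_inode *ip, uint64_t new_owner)
{
	assert(ip->owners != NULL || ip->nblocks == 0);
	for (size_t i = 0; i < ip->nblocks; i++)
		ip->owners[i] = new_owner;
}

/*
 * Loosely mirrors the patched xfs_swap_extent_forks(): only record which
 * inode will need an owner scan, then swap the forks. No block is touched.
 */
static void toy_swap_forks(struct toy_inode *ip, struct toy_inode *tip,
			   unsigned *src_flags, unsigned *target_flags)
{
	if (ip->btree_fmt)
		*target_flags |= TOY_ILOG_DOWNER;	/* tip inherits ip's bmbt */
	if (tip->btree_fmt)
		*src_flags |= TOY_ILOG_DOWNER;		/* ip inherits tip's bmbt */

	struct toy_inode tmp = *ip;

	ip->btree_fmt = tip->btree_fmt;
	ip->owners = tip->owners;
	ip->nblocks = tip->nblocks;
	tip->btree_fmt = tmp.btree_fmt;
	tip->owners = tmp.owners;
	tip->nblocks = tmp.nblocks;
}

/*
 * Loosely mirrors the patched xfs_swap_extents(): the owner scan runs last,
 * after the fork swap, so each bmbt gets the *current* inode's number.
 */
static void toy_swap_extents(struct toy_inode *ip, struct toy_inode *tip)
{
	unsigned src_flags = 0, target_flags = 0;

	toy_swap_forks(ip, tip, &src_flags, &target_flags);
	/* (the real code logs both inodes here, before any bmbt change) */
	if (src_flags & TOY_ILOG_DOWNER)
		toy_change_owner(ip, ip->ino);
	if (target_flags & TOY_ILOG_DOWNER)
		toy_change_owner(tip, tip->ino);
}

/* True if every toy bmbt block is owned by the inode that now holds it. */
static bool toy_owners_consistent(const struct toy_inode *ip)
{
	for (size_t i = 0; i < ip->nblocks; i++)
		if (ip->owners[i] != ip->ino)
			return false;
	return true;
}
```

For example, with ip (inode 100) and tip (inode 200) both in btree format, after toy_swap_extents() the blocks that originally carried owner 200 now belong to ip and carry 100, and vice versa. The end state matches what the old pre-swap scan produced; in this model, the only thing that moves is when the blocks are modified relative to setting the log flags.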