Re: [PATCH 2/2] Add support to RENAME_EXCHANGE flag

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the review guys, I'm going to apply the suggestions and send a V2

On Mon, Oct 20, 2014 at 11:25:28AM +1100, Dave Chinner wrote:
> On Thu, Oct 16, 2014 at 05:05:37PM -0400, Brian Foster wrote:
> > On Wed, Oct 15, 2014 at 03:17:22PM -0300, Carlos Maiolino wrote:
> > > Adds a new function named xfs_cross_rename(), responsible to handle requests
> > > from sys_renameat2() using RENAME_EXCHANGE flag.
> > > 
> > > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> > > ---
> > 
> > Hi Carlos,
> > 
> > Some high-level comments from a first pass...
> > 
> > >  fs/xfs/xfs_inode.c | 187 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_inode.h |   4 ++
> > >  fs/xfs/xfs_iops.c  |   7 +-
> > >  3 files changed, 197 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index fea3c92..a5bc88d 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -2920,6 +2920,193 @@ xfs_rename(
> > >  	return error;
> > >  }
> > >  
> > > +/* xfs_cross_rename()
> > > + *
> > > + * responsible to handle RENAME_EXCHANGE flag
> > > + * in renameat2() sytemcall
> > > + */
> > > +int
> > > +xfs_cross_rename(
> > > +	xfs_inode_t	*src_dp,
> > > +	struct xfs_name	*src_name,
> > > +	xfs_inode_t	*src_ip,
> > > +	xfs_inode_t	*target_dp,
> > > +	struct xfs_name	*target_name,
> > > +	xfs_inode_t	*target_ip)
> > > +{
> > > +	xfs_trans_t	*tp = NULL;
> > > +	xfs_mount_t	*mp = src_dp->i_mount;
> > > +	int		new_parent;		/* Crossing from different parents */
> > > +	int		src_is_directory;
> > > +	int		tgt_is_directory;
> > > +	int		error;
> > > +	xfs_bmap_free_t	free_list;
> > > +	xfs_fsblock_t	first_block;
> > > +	int		cancel_flags;
> > > +	int		committed;
> > > +	xfs_inode_t	*inodes[4];
> > > +	int		spaceres;
> > > +	int		num_inodes;
> > > +
> > > +	new_parent = (src_dp != target_dp);
> > > +	src_is_directory = S_ISDIR(src_ip->i_d.di_mode);
> > > +	tgt_is_directory = S_ISDIR(target_ip->i_d.di_mode);
> > > +
> > > +	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip,
> > > +				inodes, &num_inodes);
> > > +
> > > +	xfs_bmap_init(&free_list, &first_block);
> > > +	tp = xfs_trans_alloc(mp, XFS_TRANS_RENAME);
> > > +	cancel_flags = XFS_TRANS_RELEASE_LOG_RES;
> > > +	spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
> > > +	error = xfs_trans_reserve(tp, &M_RES(mp)->tr_rename, spaceres, 0);
> > > +
> > 
> > It seems to me that the existing block and log reservations would
> > "cover" the rename exchange case, but it might be worth defining new
> > reservations for the purpose of clarity and to prevent future problems.
> > 
> > XFS_RENAME_SPACE_RES() covers directory removal and insertion. Here we
> > are doing neither, which makes me wonder whether we need a block
> > reservation at all. It does appear that we have a sf dir case where the
> > inode number could cause a format conversion. Perhaps we need something
> > that calculates the blocks required for the insertion of the max of both
> > names (it seems like the conversion would only happen once, but we don't
> > know which way)? I haven't spent a ton of time in directory code, so I
> > could easily be missing something.
> 
> The shortform replace can result in shortform->block conversion,
> therefore we need the reservation.
> 
> > The tr_rename log reservation considers four inodes, two directory
> > modifications, a target inode unlink (the overwrite case), and alloc
> > btree mods for directory blocks being freed. IIUC, the exchange case
> > should only ever log four inodes and the possible dir format conversion
> > (e.g., no unlink, no dir block frees). We could define a new
> > tr_rename_xchg reservation that encodes that and documents it
> > appropriately in the comment.
> 
> The rename log reservation is the worse case that a rename operation
> requires - it is not specific to a particular rename instance. This
> new reanme type fits within the existing definition, so we should
> just use it unchanged.
> 
> What it comes down to is that there is no point in trying to define
> reservations for every single possible type of operation we can do -
> it's just too much maintenance overhead to verify that they are
> correct after some incidental change. If we define the worst case,
> then everything else is covered and we don't have to care about
> whether we have the reservation for a specific case right, or indeed
> whether we are using the correct reservation for a specific rename
> transaction....
> 
> 
> > > +	if (error == -ENOSPC) {
> > > +		spaceres = 0;
> > > +		error = xfs_trans_reserve(tp, &M_RES(mp)->tr_rename, 0, 0);
> > > +	}
> > > +	if (error) {
> > > +		xfs_trans_cancel(tp, 0);
> > > +		goto std_return;
> > > +	}
> 
> This is not necessary. The spaceres == 0 case in the rename is for
> adding new directory entries at ENOSPC and that is checked by
> xfs_dir_canenter(). We are not calling that function (because we
> aren't adding a name) and therefore we can't run without a full
> space reservation.
> 
> Oh, and kill that "std_return" name.
> 
> 	if (error) {
> 		cancel_flags = 0;
> 		goto out_trans_cancel;
> 	}
> 
> > > +
> > > +	/*
> > > +	 * Attach the dquots to the inodes
> > > +	 */
> > > +	error = xfs_qm_vop_rename_dqattach(inodes);
> > > +	if (error) {
> > > +		xfs_trans_cancel(tp, cancel_flags);
> > > +		goto std_return;
> > > +	}
> 
> 	if (error)
> 		goto out_trans_cancel;
> 
> > > +
> > > +	/*
> > > +	 * Lock all participating inodes. In case of RENAME_EXCHANGE, target
> > > +	 * must exist, so we'll be locking at least 3 inodes here.
> > > +	 */
> > > +	xfs_lock_inodes(inodes, num_inodes, XFS_ILOCK_EXCL);
> > > +
> > > +	/*
> > > +	 * Join all the inodes to the transaction. From this point on,
> > > +	 * we can rely on either trans_commit or trans_cancel to unlock
> > > +	 * them.
> > > +	 * target_ip will always exist, so, no need to check its existence.
> > > +	 */
> > > +	xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
> > > +	if (new_parent)
> > > +		xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
> > > +
> > > +	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
> > > +	xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
> > > +
> > > +	/*
> > > +	* If we are using project inheritance, we only allow RENAME_EXCHANGE
> > > +	* into our tree when the project IDs are the same; else the tree quota
> > > +	* mechanism would be circumvented.
> > > +	*/
> > > +	if (unlikely(((target_dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT) ||
> > > +		     (src_dp->i_d.di_flags & XFS_DIFLAG_PROJINHERIT)) &&
> > > +		     (xfs_get_projid(src_dp) != xfs_get_projid(target_dp)) )) {
> > > +		error = -EXDEV;
> > > +		goto error_return;
> > > +	}
> > > +
> > 
> > I think that having a separate helper for the rename exchange case is
> > generally the right thing. That said, I wonder if we're splitting things
> > at the right level because it looks like xfs_rename() could handle
> > everything we have in xfs_cross_rename() up to about this point.
> 
> Right. I think that splitting out the internal part of xfs_rename
> after all this common setup code is the best way to proceed.
> 
> > I definitely don't think we should go too far and try to handle all of
> > this in one function, even if there is some duplication in the directory
> > name replacement and inode link management. The logic would probably end
> > up unnecessarily hairy and difficult to reason about.
> > 
> > > +	error = xfs_dir_replace(tp, src_dp, src_name,
> > > +				target_ip->i_ino,
> > > +				&first_block, &free_list, spaceres);
> > > +	if (error)
> > > +		goto abort_return;
> > > +
> > > +	/*
> > > +	 * Update ".." entry to match the new parent
> > > +	 */
> > > +	if (new_parent && tgt_is_directory) {
> > > +		error = xfs_dir_replace(tp, target_ip, &xfs_name_dotdot,
> > > +					src_dp->i_ino, &first_block, &free_list, spaceres);
> > > +		if (error)
> > > +			goto abort_return;
> > > +	}
> > > +
> > > +	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> > > +
> > > +	error = xfs_dir_replace(tp, target_dp, target_name,
> > > +				src_ip->i_ino,
> > > +				&first_block, &free_list, spaceres);
> > > +	if (error)
> > > +		goto abort_return;
> > > +
> > > +	/*
> > > +	 * Update ".." entry to match the new parent
> > > +	 */
> > > +	if (new_parent && src_is_directory) {
> > > +		error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
> > > +					target_dp->i_ino, &first_block, &free_list, spaceres);
> > > +		if (error)
> > > +			goto abort_return;
> > > +	}
> 
> So you do a bunch of work based on new_parent and
> tgt/src_is_directory, and then:
> 
> > > +
> > > +	/*
> > > +	 * In case we are crossing different file types between different
> > > +	 * parents, we must update parent's link count to match the ".."
> > > +	 * entry of the new child (or the removal of it).
> > > +	 */
> > > +	if (new_parent) {
> > > +		xfs_trans_ichgtime(tp, target_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> > > +
> > > +		if (src_is_directory && !tgt_is_directory) {
> > > +			error = xfs_droplink(tp, src_dp);
> > > +				if (error)
> > > +					goto abort_return;
> 
> [whitespace is screwed up]
> 
> > > +			error = xfs_bumplink(tp, target_dp);
> > > +				if (error)
> > > +					goto abort_return;
> > > +		}
> > > +
> > > +		if (tgt_is_directory && !src_is_directory) {
> > > +			error = xfs_droplink(tp, target_dp);
> > > +				if (error)
> > > +					goto abort_return;
> > > +			error = xfs_bumplink(tp, src_dp);
> > > +				if (error)
> > > +					goto abort_return;
> > > +		}
> > > +
> > > +		/*
> > > +		 * We don't need to log the source dir if
> > > +		 * this is the same as the target.
> > > +		 */
> > > +		xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
> > > +	}
> 
> You do a bunch more work based on the same variables. THis should
> reall ybe combined into a single set of logic to manipulate the
> directory states.
> 
> 	if (new_parent) {
> 		if (tgt_is_directory) {
> 			error = xfs_dir_replace(tp, target_ip, &xfs_name_dotdot,
> 						src_dp->i_ino, &first_block, &free_list, spaceres);
> 			if (error)
> 				goto out_abort;
> 			if (!src_is_directory) {
> 				error = xfs_droplink(tp, target_dp);
> 				if (error)
> 					goto out_abort;
> 				error = xfs_bumplink(tp, src_dp);
> 				if (error)
> 					goto out_abort;
> 			}
> 		}
> 
> 		if (src_is_directory) {
> 			error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
> 						target_dp->i_ino, &first_block, &free_list, spaceres);
> 			if (error)
> 				goto out_abort;
> 		.....
> 
> 
> > 
> > > +	/*
> > > +	 * If this is a synchronous mount, make sure the rename transaction goes
> > > +	 * to disk before returning to the user.
> > > +	 */
> > > +	if (mp->m_flags & (XFS_MOUNT_WSYNC|XFS_MOUNT_DIRSYNC))
> > > +		xfs_trans_set_sync(tp);
> > > +
> > > +	error = xfs_bmap_finish(&tp, &free_list, &committed);
> > > +	if (error) {
> > > +		xfs_bmap_cancel(&free_list);
> > > +		xfs_trans_cancel(tp, (XFS_TRANS_RELEASE_LOG_RES | XFS_TRANS_ABORT));
> > > +		goto std_return;
> > > +	}
> > > +	return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
> 
> 	error = xfs_bmap_finish(&tp, &free_list, &committed);
> 	if (!error)
> 		return xfs_trans_commit(tp, XFS_TRANS_RELEASE_LOG_RES);
> 
> > > +abort_return:
> > > +	cancel_flags |= XFS_TRANS_ABORT;
> > > +error_return:
> > > +	xfs_bmap_cancel(&free_list);
> > > +	xfs_trans_cancel(tp, cancel_flags);
> > > +std_return:
> > > +	return error;
> 
> out_abort:
> 	cancel_flags |= XFS_TRANS_ABORT;
> out_bmap_cancel:
> 	xfs_bmap_cancel(&free_list);
> out_trans_cancel:
> 	xfs_trans_cancel(tp, cancel_flags);
> 	return error;
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

-- 
Carlos

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs




[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux