Re: [PATCH V3] xfs: Avoid races with cnt_btree lastrec updates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 25, 2024 at 09:46:51AM +0800, Zizhi Wo wrote:
> A concurrent file creation and little writing could unexpectedly return
> -ENOSPC error since there is a race window that the allocator could get
> the wrong agf->agf_longest.
> 
> Write file process steps:
> 1) Find the entry that best meets the conditions, then calculate the start
>    address and length of the remaining part of the entry after allocation.
> 2) Delete this entry and update the -current- agf->agf_longest.
> 3) Insert the remaining unused parts of this entry based on the
>    calculations in 1), and update the agf->agf_longest again if necessary.
> 
> Create file process steps:
> 1) Check whether there are free inodes in the inode chunk.
> 2) If there is no free inode, check whether there has space for creating
>    inode chunks, perform the no-lock judgment first.
> 3) If the judgment succeeds, the judgment is performed again with agf lock
>    held. Otherwire, an error is returned directly.
> 
> If the write process is in step 2) but not go to 3) yet, the create file
> process goes to 2) at this time, it may be mistaken for no space,
> resulting in the file system still has space but the file creation fails.
> 
> We have sent two different commits to the community in order to fix this
> problem[1][2]. Unfortunately, both solutions have flaws. In [2], I
> discussed with Dave and Darrick, realized that a better solution to this
> problem requires the "last cnt record tracking" to be ripped out of the
> generic btree code. And surprisingly, Dave directly provided his fix code.
> This patch includes appropriate modifications based on his tmp-code to
> address this issue.
> 
> The entire fix can be roughly divided into two parts:
> 1) Delete the code related to lastrec-update in the generic btree code.
> 2) Place the process of updating longest freespace with cntbt separately
>    to the end of the cntbt modifications. Move the cursor to the rightmost
>    firstly, and update the longest free extent based on the record.
> 
> Note that we can not update the longest with xfs_alloc_get_rec() after
> find the longest record, as xfs_verify_agbno() may not pass because
> pag->block_count is updated on the outside. Therefore, use
> xfs_btree_get_rec() as a replacement.
> 
> [1] https://lore.kernel.org/all/20240419061848.1032366-2-yebin10@xxxxxxxxxx
> [2] https://lore.kernel.org/all/20240604071121.3981686-1-wozizhi@xxxxxxxxxx
> 
> Reported by: Ye Bin <yebin10@xxxxxxxxxx>
> Signed-off-by: Zizhi Wo <wozizhi@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_alloc.c       | 115 ++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_alloc_btree.c |  64 ------------------
>  fs/xfs/libxfs/xfs_btree.c       |  51 --------------
>  fs/xfs/libxfs/xfs_btree.h       |  16 +----
>  4 files changed, 116 insertions(+), 130 deletions(-)

Mostly looks good. One small thing to fix, though.

> +/*
> + * Find the rightmost record of the cntbt, and return the longest free space
> + * recorded in it. Simply set both the block number and the length to their
> + * maximum values before searching.
> + */
> +static int
> +xfs_cntbt_longest(
> +	struct xfs_btree_cur	*cnt_cur,
> +	xfs_extlen_t		*longest)
> +{
> +	struct xfs_alloc_rec_incore irec;
> +	union xfs_btree_rec	    *rec;
> +	int			    stat = 0;
> +	int			    error;
> +
> +	memset(&cnt_cur->bc_rec, 0xFF, sizeof(cnt_cur->bc_rec));
> +	error = xfs_btree_lookup(cnt_cur, XFS_LOOKUP_LE, &stat);
> +	if (error)
> +		return error;
> +	if (!stat) {
> +		/* totally empty tree */
> +		*longest = 0;
> +		return 0;
> +	}
> +
> +	error = xfs_btree_get_rec(cnt_cur, &rec, &stat);
> +	if (error)
> +		return error;
> +	if (!stat) {
> +		ASSERT(0);
> +		*longest = 0;
> +		return 0;

If we don't find a record, some kind of btree corruption has been
encountered. Rather than "ASSERT(0)" here, this should fail in
production systems in a way that admins and online repair will
notice:

	if (XFS_IS_CORRUPT(mp, stat != 0)) {
		xfs_btree_mark_sick(cnt_cur);
		return -EFSCORRUPTED;
	}

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux