Re: FAILED: patch "[PATCH] ext4: fix rbtree traversal bug in ext4_mb_use_preallocated" failed to apply to 6.4-stable tree

On Mon, Jul 24, 2023 at 08:19:13AM +0200, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> 
> The patch below does not apply to the 6.4-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@xxxxxxxxxxxxxxx>.
> 
> To reproduce the conflict and resubmit, you may use the following commands:
> 
> git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
> git checkout FETCH_HEAD
> git cherry-pick -x 9d3de7ee192a6a253f475197fe4d2e2af10a731f
> # <resolve conflicts, build, test, etc.>
> git commit -s
> git send-email --to '<stable@xxxxxxxxxxxxxxx>' --in-reply-to '2023072413-glamorous-unjustly-bb12@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
> 
> Possible dependencies:
> 
> 
> 
> thanks,
> 
> greg k-h

Hi Greg,

It seems this patch is already in the linux-6.4.y branch, and it was
applied before I got this email:

  339fee69a1da ext4: fix rbtree traversal bug in ext4_mb_use_preallocated

Any idea why we still see this failure?

regards,
ojaswin

> 
> ------------------ original commit in Linus's tree ------------------
> 
> From 9d3de7ee192a6a253f475197fe4d2e2af10a731f Mon Sep 17 00:00:00 2001
> From: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
> Date: Sat, 22 Jul 2023 22:45:24 +0530
> Subject: [PATCH] ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
> 
> During allocations, while looking for preallocations (PAs) in the per
> inode rbtree, we can't do a direct traversal of the tree because
> ext4_mb_discard_group_preallocation() can concurrently mark a PA
> deleted, and that can cause the direct traversal to skip some entries.
> This was leading to a BUG_ON() being hit [1] when we missed a PA that
> could satisfy our request and ultimately tried to create a new PA that
> would overlap with the missed one.
> 
> To make sure we handle that case while still keeping the performance of
> the rbtree, we make use of the fact that the only PA that could possibly
> overlap the original goal start is the one that satisfies both of the
> below conditions:
> 
>   1. Its logical start must be immediately to the left of (i.e. less
>   than) the original logical start.
> 
>   2. It must not be deleted.
> 
> To find this pa we use the following traversal method:
> 
> 1. Descend into the rbtree normally to find the immediately neighboring
> PA. Here we keep descending irrespective of whether the PA is deleted or
> whether it overlaps with our request; the goal is simply to find an
> immediately adjacent PA.
> 
> 2. If the found PA is to the right of the original goal, use rb_prev()
> to find the left adjacent PA.
> 
> 3. Check if this PA is deleted and keep moving left with rb_prev() until
> a non-deleted PA is found.
> 
> 4. This is the PA we are looking for. Now we can check if it can satisfy
> the original request and proceed accordingly.
> 
> This approach also correctly handles deleted PAs that remain in the tree.
> 
> (While we are at it, also fix a possible overflow bug in calculating the
> end of a PA)
> 
> [1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+tKRFaEQHtt8Q@xxxxxxxxxxxxxx/
> 
> Cc: stable@xxxxxxxxxx # 6.4
> Fixes: 3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
> Signed-off-by: Ojaswin Mujoo <ojaswin@xxxxxxxxxxxxx>
> Reported-by: Naresh Kamboju <naresh.kamboju@xxxxxxxxxx>
> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> Tested-by: Ritesh Harjani (IBM) <ritesh.list@xxxxxxxxx>
> Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.1690045963.git.ojaswin@xxxxxxxxxxxxx
> Signed-off-by: Theodore Ts'o <tytso@xxxxxxx>
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 456150ef6111..21b903fe546e 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4765,8 +4765,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
>  	int order, i;
>  	struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
>  	struct ext4_locality_group *lg;
> -	struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
> -	ext4_lblk_t tmp_pa_start, tmp_pa_end;
> +	struct ext4_prealloc_space *tmp_pa = NULL, *cpa = NULL;
> +	loff_t tmp_pa_end;
>  	struct rb_node *iter;
>  	ext4_fsblk_t goal_block;
>  
> @@ -4774,47 +4774,151 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
>  	if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
>  		return false;
>  
> -	/* first, try per-file preallocation */
> +	/*
> +	 * first, try per-file preallocation by searching the inode pa rbtree.
> +	 *
> +	 * Here, we can't do a direct traversal of the tree because
> +	 * ext4_mb_discard_group_preallocation() can concurrently mark the pa
> +	 * deleted, and that can cause direct traversal to skip some entries.
> +	 */
>  	read_lock(&ei->i_prealloc_lock);
> +
> +	if (RB_EMPTY_ROOT(&ei->i_prealloc_node)) {
> +		goto try_group_pa;
> +	}
> +
> +	/*
> +	 * Step 1: Find a pa with logical start immediately adjacent to the
> +	 * original logical start. This could be on the left or right.
> +	 *
> +	 * (tmp_pa->pa_lstart never changes so we can skip locking for it).
> +	 */
>  	for (iter = ei->i_prealloc_node.rb_node; iter;
>  	     iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical,
> -					    tmp_pa_start, iter)) {
> +					    tmp_pa->pa_lstart, iter)) {
>  		tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
>  				  pa_node.inode_node);
> +	}
>  
> -		/* all fields in this condition don't change,
> -		 * so we can skip locking for them */
> -		tmp_pa_start = tmp_pa->pa_lstart;
> -		tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
> +	/*
> +	 * Step 2: The adjacent pa might be to the right of logical start, find
> +	 * the left adjacent pa. After this step we'd have a valid tmp_pa whose
> +	 * logical start is towards the left of original request's logical start
> +	 */
> +	if (tmp_pa->pa_lstart > ac->ac_o_ex.fe_logical) {
> +		struct rb_node *tmp;
> +		tmp = rb_prev(&tmp_pa->pa_node.inode_node);
>  
> -		/* original request start doesn't lie in this PA */
> -		if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
> -		    ac->ac_o_ex.fe_logical >= tmp_pa_end)
> -			continue;
> -
> -		/* non-extent files can't have physical blocks past 2^32 */
> -		if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
> -		    (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
> -		     EXT4_MAX_BLOCK_FILE_PHYS)) {
> +		if (tmp) {
> +			tmp_pa = rb_entry(tmp, struct ext4_prealloc_space,
> +					    pa_node.inode_node);
> +		} else {
>  			/*
> -			 * Since PAs don't overlap, we won't find any
> -			 * other PA to satisfy this.
> +			 * If there is no adjacent pa to the left then finding
> +			 * an overlapping pa is not possible hence stop searching
> +			 * inode pa tree
> +			 */
> +			goto try_group_pa;
> +		}
> +	}
> +
> +	BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
> +
> +	/*
> +	 * Step 3: If the left adjacent pa is deleted, keep moving left to find
> +	 * the first non deleted adjacent pa. After this step we should have a
> +	 * valid tmp_pa which is guaranteed to be non deleted.
> +	 */
> +	for (iter = &tmp_pa->pa_node.inode_node;; iter = rb_prev(iter)) {
> +		if (!iter) {
> +			/*
> +			 * no non deleted left adjacent pa, so stop searching
> +			 * inode pa tree
> +			 */
> +			goto try_group_pa;
> +		}
> +		tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
> +				  pa_node.inode_node);
> +		spin_lock(&tmp_pa->pa_lock);
> +		if (tmp_pa->pa_deleted == 0) {
> +			/*
> +			 * We will keep holding the pa_lock from
> +			 * this point on because we don't want group discard
> +			 * to delete this pa underneath us. Since group
> +			 * discard is anyways an ENOSPC operation it
> +			 * should be okay for it to wait a few more cycles.
>  			 */
>  			break;
> -		}
> -
> -		/* found preallocated blocks, use them */
> -		spin_lock(&tmp_pa->pa_lock);
> -		if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free &&
> -		    likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
> -			atomic_inc(&tmp_pa->pa_count);
> -			ext4_mb_use_inode_pa(ac, tmp_pa);
> +		} else {
>  			spin_unlock(&tmp_pa->pa_lock);
> -			read_unlock(&ei->i_prealloc_lock);
> -			return true;
>  		}
> -		spin_unlock(&tmp_pa->pa_lock);
>  	}
> +
> +	BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
> +	BUG_ON(tmp_pa->pa_deleted == 1);
> +
> +	/*
> +	 * Step 4: We now have the non deleted left adjacent pa. Only this
> +	 * pa can possibly satisfy the request hence check if it overlaps
> +	 * original logical start and stop searching if it doesn't.
> +	 */
> +	tmp_pa_end = (loff_t)tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
> +
> +	if (ac->ac_o_ex.fe_logical >= tmp_pa_end) {
> +		spin_unlock(&tmp_pa->pa_lock);
> +		goto try_group_pa;
> +	}
> +
> +	/* non-extent files can't have physical blocks past 2^32 */
> +	if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
> +	    (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
> +	     EXT4_MAX_BLOCK_FILE_PHYS)) {
> +		/*
> +		 * Since PAs don't overlap, we won't find any other PA to
> +		 * satisfy this.
> +		 */
> +		spin_unlock(&tmp_pa->pa_lock);
> +		goto try_group_pa;
> +	}
> +
> +	if (tmp_pa->pa_free && likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
> +		atomic_inc(&tmp_pa->pa_count);
> +		ext4_mb_use_inode_pa(ac, tmp_pa);
> +		spin_unlock(&tmp_pa->pa_lock);
> +		read_unlock(&ei->i_prealloc_lock);
> +		return true;
> +	} else {
> +		/*
> +		 * We found a valid overlapping pa but couldn't use it because
> +		 * it had no free blocks. This should ideally never happen
> +		 * because:
> +		 *
> +		 * 1. When a new inode pa is added to rbtree it must have
> +		 *    pa_free > 0 since otherwise we won't actually need
> +		 *    preallocation.
> +		 *
> +		 * 2. An inode pa that is in the rbtree can only have its
> +		 *    pa_free become zero when another thread calls:
> +		 *      ext4_mb_new_blocks
> +		 *       ext4_mb_use_preallocated
> +		 *        ext4_mb_use_inode_pa
> +		 *
> +		 * 3. Further, after the above calls make pa_free == 0, we will
> +		 *    immediately remove it from the rbtree in:
> +		 *      ext4_mb_new_blocks
> +		 *       ext4_mb_release_context
> +		 *        ext4_mb_put_pa
> +		 *
> +		 * 4. Since the pa_free becoming 0 and pa_free getting removed
> +		 * from tree both happen in ext4_mb_new_blocks, which is always
> +		 * called with i_data_sem held for data allocations, we can be
> +		 * sure that another process will never see a pa in rbtree with
> +		 * pa_free == 0.
> +		 */
> +		WARN_ON_ONCE(tmp_pa->pa_free == 0);
> +	}
> +	spin_unlock(&tmp_pa->pa_lock);
> +try_group_pa:
>  	read_unlock(&ei->i_prealloc_lock);
>  
>  	/* can we use group allocation? */
> 


