on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes:
>
>> This patch separates block bitmap and buddy bitmap freeing in order to
>> udpate block bitmap with ext4_mb_mark_context in following patch.
>    ^^^ update.
>
>>
>> Separated freeing is safe with concurrent allocation as long as:
>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>> Then freed block will only be available to allocation when both buddy
>> bitmap and block bitmap are updated by freeing.
>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>
> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
> bitmap right. Continue below...
>
>>
>> Separated freeing has no race with generate_buddy as:
>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>> initialization of the buddy page will be executed concurrently until
>> buddy page is unloaded. As we always do free in "load buddy, free,
>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>
>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
>> ---
>>  fs/ext4/mballoc.c | 18 ++++++++----------
>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index e650eac22237..01ad36a1cc96 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>  	if (err)
>>  		goto error_return;
>
>
> Let me add the a piece of code before the added changes and continue...
>
> 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
> 				     GFP_NOFS|__GFP_NOFAIL);
> 	if (err)
> 		goto error_return;
>
>>
>> +	ext4_lock_group(sb, block_group);
>> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> +	ext4_free_group_clusters_set(sb, gdp, ret);
>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> +	ext4_group_desc_csum_set(sb, block_group, gdp);
>> +	ext4_unlock_group(sb, block_group);
>> +
>
> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
> freeing on-disk bitmap blocks? Can you explain why please?
> At least it is not very clear to me that why do we need
> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
> needed then I think we should separate out bitmap freeing logic and
> buddy freeing logic even further.

Yes, ext4_mb_load_buddy_gfp is not needed for clearing the bitmap. I put
it before the bit clearing so that any error is caught and the mark is
aborted early, which keeps the functional change small.

> Thoughts?
>
>>  	/*
>>  	 * We need to make sure we don't reuse the freed block until after the
>>  	 * transaction is committed. We make an exception if the inode is to be
>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>>
>>  		ext4_lock_group(sb, block_group);
>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>>  	} else {
>> -		/* need to update group_info->bb_free and bitmap
>> -		 * with group lock held. generate_buddy look at
>> -		 * them with group lock_held
>> -		 */
>>  		if (test_opt(sb, DISCARD)) {
>>  			err = ext4_issue_discard(sb, block_group, bit,
>>  						 count_clusters, NULL);
>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>
>>  		ext4_lock_group(sb, block_group);
>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>>  	}
>>
>> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>> -	ext4_free_group_clusters_set(sb, gdp, ret);
>> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>>  	ext4_unlock_group(sb, block_group);
>>
>>  	if (sbi->s_log_groups_per_flex) {
>
>
> Adding piece of code here...
>
> 	if (sbi->s_log_groups_per_flex) {
> 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
> 		atomic64_add(count_clusters,
> 			     &sbi_array_rcu_deref(sbi, s_flex_groups,
> 						  flex_group)->free_clusters);
> 	}
>
> <...>
>
> 	/* We dirtied the bitmap block */
> 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
> 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
> 	/* And the group descriptor block */
> 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
> 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
> 	if (!err)
> 		err = ret;
>
> I was thinking even this can go around bitmap freeing logic above. So
> the next patch becomes very clear that all of the bitmap freeing logic
> is just simply moved into ext4_mb_mark_context() function.
>
> -ritesh
>