on 9/4/2023 11:00 AM, Kemeng Shi wrote:
>
> on 9/1/2023 5:34 PM, Ritesh Harjani wrote:
>> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes:
>>
>>> This patch separates block bitmap and buddy bitmap freeing in order to
>>> udpate block bitmap with ext4_mb_mark_context in following patch.
>>     ^^^ update.
>>
>>>
>>> Separated freeing is safe with concurrent allocation as long as:
>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap.
>>> 2. Firstly free block in block bitmap, and then buddy bitmap.
>>> Then freed block will only be available to allocation when both buddy
>>> bitmap and block bitmap are updated by freeing.
>>> Allocation obeys rule 1 already, just do sperated freeing with rule 2.
>>
>> So we also don't need ext4_load_buddy_gfp() before freeing on-disk
>> bitmap right. Continue below...
>>
>>>
>>> Separated freeing has no race with generate_buddy as:
>>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date
>>> buddy page can be found in sbi->s_buddy_cache and no more buddy
>>> initialization of the buddy page will be executed concurrently until
>>> buddy page is unloaded. As we always do free in "load buddy, free,
>>> unload buddy" sequence, separated freeing has no race with generate_buddy.
>>>
>>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx>
>>> ---
>>>  fs/ext4/mballoc.c | 18 ++++++++----------
>>>  1 file changed, 8 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>>> index e650eac22237..01ad36a1cc96 100644
>>> --- a/fs/ext4/mballoc.c
>>> +++ b/fs/ext4/mballoc.c
>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>  	if (err)
>>>  		goto error_return;
>>
>> Let me add the a piece of code before the added changes and continue...
>>
>> 	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
>> 				     GFP_NOFS|__GFP_NOFAIL);
>> 	if (err)
>> 		goto error_return;
>>
>>> +	ext4_lock_group(sb, block_group);
>>> +	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>> +	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> +	ext4_free_group_clusters_set(sb, gdp, ret);
>>> +	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> +	ext4_group_desc_csum_set(sb, block_group, gdp);
>>> +	ext4_unlock_group(sb, block_group);
>>> +
>>
>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before
>> freeing on-disk bitmap blocks? Can you explain why please?
>> At least it is not very clear to me that why do we need
>> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not
>> needed then I think we should separate out bitmap freeing logic and
>> buddy freeing logic even further.
> Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put
> it before bit clearing for catching error and aborting mark early
> to reduce functional change.

Hi Ritesh, sorry to bother you. I'm going to push a new version, and I
prefer to call ext4_mb_load_buddy_gfp early so that a potential error is
caught in advance. I just wonder whether this is OK with you. I'd like
to hear any advice. Thanks!

>>
>> Thoughts?
>>
>>>  	/*
>>>  	 * We need to make sure we don't reuse the freed block until after the
>>>  	 * transaction is committed. We make an exception if the inode is to be
>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>  		new_entry->efd_tid = handle->h_transaction->t_tid;
>>>
>>>  		ext4_lock_group(sb, block_group);
>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>  		ext4_mb_free_metadata(handle, &e4b, new_entry);
>>>  	} else {
>>> -		/* need to update group_info->bb_free and bitmap
>>> -		 * with group lock held. generate_buddy look at
>>> -		 * them with group lock_held
>>> -		 */
>>>  		if (test_opt(sb, DISCARD)) {
>>>  			err = ext4_issue_discard(sb, block_group, bit,
>>>  						 count_clusters, NULL);
>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
>>>  			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
>>>
>>>  		ext4_lock_group(sb, block_group);
>>> -		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
>>>  		mb_free_blocks(inode, &e4b, bit, count_clusters);
>>>  	}
>>>
>>> -	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
>>> -	ext4_free_group_clusters_set(sb, gdp, ret);
>>> -	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
>>> -	ext4_group_desc_csum_set(sb, block_group, gdp);
>>>  	ext4_unlock_group(sb, block_group);
>>>
>>>  	if (sbi->s_log_groups_per_flex) {
>>
>>
>> Adding piece of code here...
>>
>> 	if (sbi->s_log_groups_per_flex) {
>> 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
>> 		atomic64_add(count_clusters,
>> 			     &sbi_array_rcu_deref(sbi, s_flex_groups,
>> 						  flex_group)->free_clusters);
>> 	}
>>
>> <...>
>>
>> 	/* We dirtied the bitmap block */
>> 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
>> 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
>> 	/* And the group descriptor block */
>> 	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
>> 	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
>> 	if (!err)
>> 		err = ret;
>>
>> I was thinking even this can go around bitmap freeing logic above. So
>> the next patch becomes very clear that all of the bitmap freeing logic
>> is just simply moved into ext4_mb_mark_context() function.
>>
>> -ritesh
>>
>
>