on 9/12/2023 6:13 PM, Ritesh Harjani wrote: > Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes: > >> on 9/4/2023 11:00 AM, Kemeng Shi wrote: >>> >>> >>> on 9/1/2023 5:34 PM, Ritesh Harjani wrote: >>>> Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> writes: >>>> >>>>> This patch separates block bitmap and buddy bitmap freeing in order to >>>>> udpate block bitmap with ext4_mb_mark_context in following patch. >>>> ^^^ update. >>>> >>>>> >>>>> Separated freeing is safe with concurrent allocation as long as: >>>>> 1. Firstly allocate block in buddy bitmap, and then in block bitmap. >>>>> 2. Firstly free block in block bitmap, and then buddy bitmap. >>>>> Then freed block will only be available to allocation when both buddy >>>>> bitmap and block bitmap are updated by freeing. >>>>> Allocation obeys rule 1 already, just do sperated freeing with rule 2. >>>> >>>> So we also don't need ext4_load_buddy_gfp() before freeing on-disk >>>> bitmap right. Continue below... >>>> >>>>> >>>>> Separated freeing has no race with generate_buddy as: >>>>> Once ext4_mb_load_buddy_gfp is executed successfully, the update-to-date >>>>> buddy page can be found in sbi->s_buddy_cache and no more buddy >>>>> initialization of the buddy page will be executed concurrently until >>>>> buddy page is unloaded. As we always do free in "load buddy, free, >>>>> unload buddy" sequence, separated freeing has no race with generate_buddy. >>>>> >>>>> Signed-off-by: Kemeng Shi <shikemeng@xxxxxxxxxxxxxxx> >>>>> --- >>>>> fs/ext4/mballoc.c | 18 ++++++++---------- >>>>> 1 file changed, 8 insertions(+), 10 deletions(-) >>>>> >>>>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c >>>>> index e650eac22237..01ad36a1cc96 100644 >>>>> --- a/fs/ext4/mballoc.c >>>>> +++ b/fs/ext4/mballoc.c >>>>> @@ -6519,6 +6519,14 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, >>>>> if (err) >>>>> goto error_return; >>>> >>>> >>>> Let me add the a piece of code before the added changes and continue... >>>> >>>> err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b, >>>> GFP_NOFS|__GFP_NOFAIL); >>>> if (err) >>>> goto error_return; >>>>> >>>>> + ext4_lock_group(sb, block_group); >>>>> + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); >>>>> + ret = ext4_free_group_clusters(sb, gdp) + count_clusters; >>>>> + ext4_free_group_clusters_set(sb, gdp, ret); >>>>> + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); >>>>> + ext4_group_desc_csum_set(sb, block_group, gdp); >>>>> + ext4_unlock_group(sb, block_group); >>>>> + >>>> >>>> ...Is it required for ext4_mb_load_buddy_gfp() to be called before >>>> freeing on-disk bitmap blocks? Can you explain why please? >>>> At least it is not very clear to me that why do we need >>>> ext4_mb_load_buddy_gfp() before clearing of bitmap blocks. If it's not >>>> needed then I think we should separate out bitmap freeing logic and >>>> buddy freeing logic even further. >>> Yes, ext4_mb_load_buddy_gfp is no needed for clearing of bitmap, put > > Earlier you mentioned here that there is no dependency. > > >>> it before bit clearing for catching error and aborting mark early >>> to reduce functional change. >> Hi Ritesh, sorry for bother. I'm going to push a new version and I perfer >> to call ext4_mb_load_buddy_gfp early to catch potential error in advance. >> Just wonder is this good to you. Like to hear any advice. Thanks! > > Isn't there actually a dependency that we should first always call > ext4_mb_load_buddy_gfp() before clearing the bitmap? > > Because say if we take care of the bitmap handling first, i.e. we clear > the bitmap and also call ext4_handle_dirty_metadata() first. Now assume > we later call ext4_mb_load_buddy_gfp(). Now also assume the buddy pages > were not already loaded in memory (reclaimed due to memory pressure), > in that case we will read the block bitmap of that group from the > on-disk bitmap copy, which we already changed above. > So that means the buddy handling logic will already find the block > bitmap as cleared and should report problems right? > > Then isn't there an actual dependency here, meaning > ext4_mb_load_buddy_gfp() should always be called first before bitmap > handling logic. Thoughts? > My fault, ext4_mb_load_buddy_gfp should be called before on-disk bitmap handling if buddy page was not loaded... > -ritesh > > >>>> >>>> Thoughts? >>>> >>>>> /* >>>>> * We need to make sure we don't reuse the freed block until after the >>>>> * transaction is committed. We make an exception if the inode is to be >>>>> @@ -6541,13 +6549,8 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, >>>>> new_entry->efd_tid = handle->h_transaction->t_tid; >>>>> >>>>> ext4_lock_group(sb, block_group); >>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); >>>>> ext4_mb_free_metadata(handle, &e4b, new_entry); >>>>> } else { >>>>> - /* need to update group_info->bb_free and bitmap >>>>> - * with group lock held. generate_buddy look at >>>>> - * them with group lock_held >>>>> - */ >>>>> if (test_opt(sb, DISCARD)) { >>>>> err = ext4_issue_discard(sb, block_group, bit, >>>>> count_clusters, NULL); >>>>> @@ -6560,14 +6563,9 @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, >>>>> EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info); >>>>> >>>>> ext4_lock_group(sb, block_group); >>>>> - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); >>>>> mb_free_blocks(inode, &e4b, bit, count_clusters); >>>>> } >>>>> >>>>> - ret = ext4_free_group_clusters(sb, gdp) + count_clusters; >>>>> - ext4_free_group_clusters_set(sb, gdp, ret); >>>>> - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); >>>>> - ext4_group_desc_csum_set(sb, block_group, gdp); >>>>> ext4_unlock_group(sb, block_group); >>>>> >>>>> if (sbi->s_log_groups_per_flex) { >>>> >>>> >>>> Adding piece of code here... >>>> >>>> if (sbi->s_log_groups_per_flex) { >>>> ext4_group_t flex_group = ext4_flex_group(sbi, block_group); >>>> atomic64_add(count_clusters, >>>> &sbi_array_rcu_deref(sbi, s_flex_groups, >>>> flex_group)->free_clusters); >>>> } >>>> >>>> <...> >>>> >>>> /* We dirtied the bitmap block */ >>>> BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); >>>> err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); >>>> /* And the group descriptor block */ >>>> BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); >>>> ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); >>>> if (!err) >>>> err = ret; >>>> >>>> I was thinking even this can go around bitmap freeing logic above. So >>>> the next patch becomes very clear that all of the bitmap freeing logic >>>> is just simply moved into ext4_mb_mark_context() function. >>>> >>>> -ritesh >>>> >>> >>> >