Hi Jaegeuk, On 2017/12/21 4:39, Jaegeuk Kim wrote: > On 12/20, Gaoxiang (OS) wrote: >> Hi Jaegeuk, >> >> On 2017/12/20 10:31, Jaegeuk Kim wrote: >>> On 12/20, Gaoxiang (OS) wrote: >>>> Hi Jaegeuk, >>>> >>>> On 2017/12/20 7:40, Jaegeuk Kim wrote: >>>>> On 12/17, gaoxiang (P) wrote: >>>>>> clean up some redundant repeat code in build_sit_entries >>>>>> and seperate some logic into new functions: >>>>>> - build_discard_entries >>>>>> - build_sec_entries >>>>> >>>>> This patch makes another big loop redundantly? >>>>> >>>> "condition & jump" is the fundamental of "if" and loop. >>>> In the worst case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > 1 >>>> are true, and I think, we have: >>>> - the original approach: >>>> MAIN_SEGS <-- loop condition >>>> MAIN_SEGS <-- f2fs_discard_en(sbi) >>>> MAIN_SEGS <-- sbi->segs_per_sec > 1 >>>> >>>> sits_in_cursum(journal) <-- loop_condition >>>> sits_in_cursum(journal) <-- f2fs_discard_en(sbi) >>>> sits_in_cursum(journal) <-- sbi->segs_per_sec > 1 >>>> it takes about 3*(MAIN_SEGS + sits_in_cursum(journal)) jumps effort. >>>> - this patch: >>>> MAIN_SEGS <-- loop condition >>>> sits_in_cursum(journal) <-- loop_condition >>>> >>>> MAIN_SEGS <-- f2fs_discard_en(sbi) >>>> MAIN_SEGS <-- sbi->segs_per_sec > 1 >>>> it takes abourt 3*MAIN_SEGS + sits_in_cursum(journal) jumps effort. >>>> >>>> and in the best case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > >>>> 1 are false, and we have: >>>> - this patch >>>> MAIN_SEGS <-- loop condition >>>> >>>> sits_in_cursum(journal) <-- loop_condition >>>> See >>>> https://stackoverflow.com/questions/1462710/can-c-compilers-optimize-if-statements-inside-for-loops >>> >>> Interesting! >>> >>> What I can think of the worst case looks like: >>> >>> In current code, >>> for (N) { >>> do_something; >>> if (f2fs_discard_en()) >>> do _something; >>> if (sbi->segs_per_sec > 1) >>> do_something; >>> } >>> for (sits_in_cursum()) { >>> do_something; >>> if (f2fs_discard_en()) >>> do _something; >>> if (sbi->segs_per_sec > 1) >>> do_something; >>> } >>> => O(N + sits_in_cursum()) >> >> In the worst case Its 3(1 -- exit condition < MAIN_SEG, 2 -- f2fs_discard_en(), 3 -- >> sbi->segs_per_sec > 1) *N + 3*sits_in_cursum() if condition & jump >> instuctions at runtime. >> >> If we use Big O notion, yes I think O(N + sits_in_cursum()) >> >>> >>> Once compiler unrolled the loop, we can expect most of jumps could be hit by >>> branch predictor, since the loop does not have many branches. >> >> The current code, if the compiler decides to unroll the loop, it could unroll it into, I guess >> for (N) { >> do_something; >> } >> if (f2fs_discard_en()) { >> for(N) { >> do _something; >> } >> } >> if (sbi->segs_per_sec > 1) { >> for(N) { >> do_something; >> } >> } >> ... >> for (sits_in_cursum()) { >> do_something; >> } >> if (f2fs_discard_en()) { >> for (N) { >> do _something; >> } >> } >> if (sbi->segs_per_sec > 1) { >> for (N) { >> do_something; >> } >> } > > Unrolled loop would be something like: > for (N/n) { > do_something #1; > if (f2fs_discard_en()) > do_something; > if (sbi->segs_per_sec > 1) > do_something; > ... > do_something #n; > if (f2fs_discard_en()) > do_something; > if (sbi->segs_per_sec > 1) > do_something; > } > > And, branch predictor can avoid three branch conditions by BTB. > > do_something #1; > if (f2fs_discard_en()) > do _something; > if (sbi->segs_per_sec > 1) > do_something; > do_something #2; > if (f2fs_discard_en()) > do _something; > if (sbi->segs_per_sec > 1) > do_something; > ... > do_something #N; > if (f2fs_discard_en()) > do _something; > if (sbi->segs_per_sec > 1) > do_something; > > Anyway, I don't see huge benefit regarding to performance and code readability, > but do worry about potential patch conflicts when backporting this. > ok, I think it depends on the kind and complexity of BTB the target processor or architecture uses and how the target architecture enhances parallelization. I am new to f2fs, curious about the f2fs detailed implementation. I'm more focused on the code readability since some code with jumps are a little bit hacky. :) Thanks, >> Taking branch predictor into consideration, I think the unrolled one is more easier to predict >> than the current rolled one. (as you say, the current has more conditions in the loop >> and a exit condition to predict) >> >>> >>> In the patch, >>> for (N) >>> do_something; >>> for (sits_in_cursum()) >>> do_something; >>> >>> if (f2fs_discard_en()) { >>> for (N) >>> do_something; >>> } >>> if (sbi->segs_per_sec > 1) { >>> for (N) >>> do_something; >>> } >>> => O(3*N + sits_in_cursum()) >> >> If we use Big O notion, I think O(N + sits_in_cursum()) too. >>> >>> BTW, build_discard_entries(struct f2fs_sb_info *sbi) is more likely to split >>> the loops, as you describe. >>> >>> In build_discard_entries(), >>> for (N) { >>> if (CP_TRIMMED_FLAG) >>> do_something; >>> else >>> do_semething_else; >>> } >>> Thanks, >>> >> >> I think so, yet CP_TRIMMED_FLAG comes from a function argument, so I think compiler >> cannot decide whether it is constant. >> >> I think if we count the runtime condition & jump instruction at runtime. >> the unrolls above are beneficial, I think. >> >> Thanks. >> >>>> >>>> In addtion, this approach is easier to read than the original >>>> one, so splitting the loop would be beneficial. >>>> >>>> Thanks >>>> >>>>>> >>>>>> Note that "if (f2fs_discard_en(sbi))" and "if (sbi->segs_per_sec > 1)" >>>>>> are all taken out of the loops they are unchangable. >>>>>> >>>>>> Signed-off-by: Gao Xiang <gaoxiang25@xxxxxxxxxx> >>>>>> --- >>>>>> fs/f2fs/segment.c | 80 +++++++++++++++++++++++++------------------------------ >>>>>> 1 file changed, 37 insertions(+), 43 deletions(-) >>>>>> >>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c >>>>>> index 40e1d20..bcd8a40 100644 >>>>>> --- a/fs/f2fs/segment.c >>>>>> +++ b/fs/f2fs/segment.c >>>>>> @@ -3476,12 +3476,39 @@ static int build_curseg(struct f2fs_sb_info *sbi) >>>>>> return restore_curseg_summaries(sbi); >>>>>> } >>>>>> >>>>>> +static void build_discard_entries(struct f2fs_sb_info *sbi) >>>>>> +{ >>>>>> + unsigned segno; >>>>>> + >>>>>> + for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) { >>>>>> + struct seg_entry *se = get_seg_entry(sbi, segno); >>>>>> + >>>>>> + if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) >>>>>> + memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE); >>>>>> + else { >>>>>> + memcpy(se->discard_map, se->cur_valid_map, >>>>>> + SIT_VBLOCK_MAP_SIZE); >>>>>> + >>>>>> + sbi->discard_blks += sbi->blocks_per_seg - >>>>>> + se->valid_blocks; >>>>>> + } >>>>>> + } >>>>>> +} >>>>>> + >>>>>> +static void build_sec_entries(struct f2fs_sb_info *sbi) >>>>>> +{ >>>>>> + unsigned segno; >>>>>> + >>>>>> + for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) >>>>>> + get_sec_entry(sbi, segno)->valid_blocks += >>>>>> + get_seg_entry(sbi, segno)->valid_blocks; >>>>>> +} >>>>>> + >>>>>> static void build_sit_entries(struct f2fs_sb_info *sbi) >>>>>> { >>>>>> struct sit_info *sit_i = SIT_I(sbi); >>>>>> struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA); >>>>>> struct f2fs_journal *journal = curseg->journal; >>>>>> - struct seg_entry *se; >>>>>> struct f2fs_sit_entry sit; >>>>>> int sit_blk_cnt = SIT_BLK_CNT(sbi); >>>>>> unsigned int i, start, end; >>>>>> @@ -3498,67 +3525,34 @@ static void build_sit_entries(struct f2fs_sb_info *sbi) >>>>>> struct f2fs_sit_block *sit_blk; >>>>>> struct page *page; >>>>>> >>>>>> - se = &sit_i->sentries[start]; >>>>>> page = get_current_sit_page(sbi, start); >>>>>> sit_blk = (struct f2fs_sit_block *)page_address(page); >>>>>> sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)]; >>>>>> f2fs_put_page(page, 1); >>>>>> >>>>>> check_block_count(sbi, start, &sit); >>>>>> - seg_info_from_raw_sit(se, &sit); >>>>>> - >>>>>> - /* build discard map only one time */ >>>>>> - if (f2fs_discard_en(sbi)) { >>>>>> - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { >>>>>> - memset(se->discard_map, 0xff, >>>>>> - SIT_VBLOCK_MAP_SIZE); >>>>>> - } else { >>>>>> - memcpy(se->discard_map, >>>>>> - se->cur_valid_map, >>>>>> - SIT_VBLOCK_MAP_SIZE); >>>>>> - sbi->discard_blks += >>>>>> - sbi->blocks_per_seg - >>>>>> - se->valid_blocks; >>>>>> - } >>>>>> - } >>>>>> >>>>>> - if (sbi->segs_per_sec > 1) >>>>>> - get_sec_entry(sbi, start)->valid_blocks += >>>>>> - se->valid_blocks; >>>>>> + seg_info_from_raw_sit(&sit_i->sentries[start], &sit); >>>>>> } >>>>>> start_blk += readed; >>>>>> } while (start_blk < sit_blk_cnt); >>>>>> >>>>>> down_read(&curseg->journal_rwsem); >>>>>> for (i = 0; i < sits_in_cursum(journal); i++) { >>>>>> - unsigned int old_valid_blocks; >>>>>> - >>>>>> start = le32_to_cpu(segno_in_journal(journal, i)); >>>>>> - se = &sit_i->sentries[start]; >>>>>> sit = sit_in_journal(journal, i); >>>>>> >>>>>> - old_valid_blocks = se->valid_blocks; >>>>>> - >>>>>> check_block_count(sbi, start, &sit); >>>>>> - seg_info_from_raw_sit(se, &sit); >>>>>> - >>>>>> - if (f2fs_discard_en(sbi)) { >>>>>> - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { >>>>>> - memset(se->discard_map, 0xff, >>>>>> - SIT_VBLOCK_MAP_SIZE); >>>>>> - } else { >>>>>> - memcpy(se->discard_map, se->cur_valid_map, >>>>>> - SIT_VBLOCK_MAP_SIZE); >>>>>> - sbi->discard_blks += old_valid_blocks - >>>>>> - se->valid_blocks; >>>>>> - } >>>>>> - } >>>>>> - >>>>>> - if (sbi->segs_per_sec > 1) >>>>>> - get_sec_entry(sbi, start)->valid_blocks += >>>>>> - se->valid_blocks - old_valid_blocks; >>>>>> + seg_info_from_raw_sit(&sit_i->sentries[start], &sit); >>>>>> } >>>>>> up_read(&curseg->journal_rwsem); >>>>>> + >>>>>> + /* build discard map only one time */ >>>>>> + if (f2fs_discard_en(sbi)) >>>>>> + build_discard_entries(sbi); >>>>>> + >>>>>> + if (sbi->segs_per_sec > 1) >>>>>> + build_sec_entries(sbi); >>>>>> } >>>>>> >>>>>> static void init_free_segmap(struct f2fs_sb_info *sbi) >>>>>> -- >>>>>> 2.1.4