On 12/20, Gaoxiang (OS) wrote: > Hi Jaegeuk, > > On 2017/12/20 10:31, Jaegeuk Kim wrote: > > On 12/20, Gaoxiang (OS) wrote: > >> Hi Jaegeuk, > >> > >> On 2017/12/20 7:40, Jaegeuk Kim wrote: > >>> On 12/17, gaoxiang (P) wrote: > >>>> clean up some redundant repeat code in build_sit_entries > >>>> and seperate some logic into new functions: > >>>> - build_discard_entries > >>>> - build_sec_entries > >>> > >>> This patch makes another big loop redundantly? > >>> > >> "condition & jump" is the fundamental of "if" and loop. > >> In the worst case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > 1 > >> are true, and I think, we have: > >> - the original approach: > >> MAIN_SEGS <-- loop condition > >> MAIN_SEGS <-- f2fs_discard_en(sbi) > >> MAIN_SEGS <-- sbi->segs_per_sec > 1 > >> > >> sits_in_cursum(journal) <-- loop_condition > >> sits_in_cursum(journal) <-- f2fs_discard_en(sbi) > >> sits_in_cursum(journal) <-- sbi->segs_per_sec > 1 > >> it takes about 3*(MAIN_SEGS + sits_in_cursum(journal)) jumps effort. > >> - this patch: > >> MAIN_SEGS <-- loop condition > >> sits_in_cursum(journal) <-- loop_condition > >> > >> MAIN_SEGS <-- f2fs_discard_en(sbi) > >> MAIN_SEGS <-- sbi->segs_per_sec > 1 > >> it takes abourt 3*MAIN_SEGS + sits_in_cursum(journal) jumps effort. > >> > >> and in the best case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > > >> 1 are false, and we have: > >> - this patch > >> MAIN_SEGS <-- loop condition > >> > >> sits_in_cursum(journal) <-- loop_condition > >> See > >> https://stackoverflow.com/questions/1462710/can-c-compilers-optimize-if-statements-inside-for-loops > > > > Interesting! > > > > What I can think of the worst case looks like: > > > > In current code, > > for (N) { > > do_something; > > if (f2fs_discard_en()) > > do _something; > > if (sbi->segs_per_sec > 1) > > do_something; > > } > > for (sits_in_cursum()) { > > do_something; > > if (f2fs_discard_en()) > > do _something; > > if (sbi->segs_per_sec > 1) > > do_something; > > } > > => O(N + sits_in_cursum()) > > In the worst case Its 3(1 -- exit condition < MAIN_SEG, 2 -- f2fs_discard_en(), 3 -- > sbi->segs_per_sec > 1) *N + 3*sits_in_cursum() if condition & jump > instuctions at runtime. > > If we use Big O notion, yes I think O(N + sits_in_cursum()) > > > > > Once compiler unrolled the loop, we can expect most of jumps could be hit by > > branch predictor, since the loop does not have many branches. > > The current code, if the compiler decides to unroll the loop, it could unroll it into, I guess > for (N) { > do_something; > } > if (f2fs_discard_en()) { > for(N) { > do _something; > } > } > if (sbi->segs_per_sec > 1) { > for(N) { > do_something; > } > } > ... > for (sits_in_cursum()) { > do_something; > } > if (f2fs_discard_en()) { > for (N) { > do _something; > } > } > if (sbi->segs_per_sec > 1) { > for (N) { > do_something; > } > } Unrolled loop would be something like: for (N/n) { do_something #1; if (f2fs_discard_en()) do_something; if (sbi->segs_per_sec > 1) do_something; ... do_something #n; if (f2fs_discard_en()) do_something; if (sbi->segs_per_sec > 1) do_something; } And, branch predictor can avoid three branch conditions by BTB. do_something #1; if (f2fs_discard_en()) do _something; if (sbi->segs_per_sec > 1) do_something; do_something #2; if (f2fs_discard_en()) do _something; if (sbi->segs_per_sec > 1) do_something; ... do_something #N; if (f2fs_discard_en()) do _something; if (sbi->segs_per_sec > 1) do_something; Anyway, I don't see huge benefit regarding to performance and code readability, but do worry about potential patch conflicts when backporting this. > Taking branch predictor into consideration, I think the unrolled one is more easier to predict > than the current rolled one. (as you say, the current has more conditions in the loop > and a exit condition to predict) > > > > > In the patch, > > for (N) > > do_something; > > for (sits_in_cursum()) > > do_something; > > > > if (f2fs_discard_en()) { > > for (N) > > do_something; > > } > > if (sbi->segs_per_sec > 1) { > > for (N) > > do_something; > > } > > => O(3*N + sits_in_cursum()) > > If we use Big O notion, I think O(N + sits_in_cursum()) too. > > > > BTW, build_discard_entries(struct f2fs_sb_info *sbi) is more likely to split > > the loops, as you describe. > > > > In build_discard_entries(), > > for (N) { > > if (CP_TRIMMED_FLAG) > > do_something; > > else > > do_semething_else; > > } > > Thanks, > > > > I think so, yet CP_TRIMMED_FLAG comes from a function argument, so I think compiler > cannot decide whether it is constant. > > I think if we count the runtime condition & jump instruction at runtime. > the unrolls above are beneficial, I think. > > Thanks. > > >> > >> In addtion, this approach is easier to read than the original > >> one, so splitting the loop would be beneficial. > >> > >> Thanks > >> > >>>> > >>>> Note that "if (f2fs_discard_en(sbi))" and "if (sbi->segs_per_sec > 1)" > >>>> are all taken out of the loops they are unchangable. > >>>> > >>>> Signed-off-by: Gao Xiang <gaoxiang25@xxxxxxxxxx> > >>>> --- > >>>> fs/f2fs/segment.c | 80 +++++++++++++++++++++++++------------------------------ > >>>> 1 file changed, 37 insertions(+), 43 deletions(-) > >>>> > >>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > >>>> index 40e1d20..bcd8a40 100644 > >>>> --- a/fs/f2fs/segment.c > >>>> +++ b/fs/f2fs/segment.c > >>>> @@ -3476,12 +3476,39 @@ static int build_curseg(struct f2fs_sb_info *sbi) > >>>> return restore_curseg_summaries(sbi); > >>>> } > >>>> > >>>> +static void build_discard_entries(struct f2fs_sb_info *sbi) > >>>> +{ > >>>> + unsigned segno; > >>>> + > >>>> + for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) { > >>>> + struct seg_entry *se = get_seg_entry(sbi, segno); > >>>> + > >>>> + if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) > >>>> + memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE); > >>>> + else { > >>>> + memcpy(se->discard_map, se->cur_valid_map, > >>>> + SIT_VBLOCK_MAP_SIZE); > >>>> + > >>>> + sbi->discard_blks += sbi->blocks_per_seg - > >>>> + se->valid_blocks; > >>>> + } > >>>> + } > >>>> +} > >>>> + > >>>> +static void build_sec_entries(struct f2fs_sb_info *sbi) > >>>> +{ > >>>> + unsigned segno; > >>>> + > >>>> + for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) > >>>> + get_sec_entry(sbi, segno)->valid_blocks += > >>>> + get_seg_entry(sbi, segno)->valid_blocks; > >>>> +} > >>>> + > >>>> static void build_sit_entries(struct f2fs_sb_info *sbi) > >>>> { > >>>> struct sit_info *sit_i = SIT_I(sbi); > >>>> struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA); > >>>> struct f2fs_journal *journal = curseg->journal; > >>>> - struct seg_entry *se; > >>>> struct f2fs_sit_entry sit; > >>>> int sit_blk_cnt = SIT_BLK_CNT(sbi); > >>>> unsigned int i, start, end; > >>>> @@ -3498,67 +3525,34 @@ static void build_sit_entries(struct f2fs_sb_info *sbi) > >>>> struct f2fs_sit_block *sit_blk; > >>>> struct page *page; > >>>> > >>>> - se = &sit_i->sentries[start]; > >>>> page = get_current_sit_page(sbi, start); > >>>> sit_blk = (struct f2fs_sit_block *)page_address(page); > >>>> sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)]; > >>>> f2fs_put_page(page, 1); > >>>> > >>>> check_block_count(sbi, start, &sit); > >>>> - seg_info_from_raw_sit(se, &sit); > >>>> - > >>>> - /* build discard map only one time */ > >>>> - if (f2fs_discard_en(sbi)) { > >>>> - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { > >>>> - memset(se->discard_map, 0xff, > >>>> - SIT_VBLOCK_MAP_SIZE); > >>>> - } else { > >>>> - memcpy(se->discard_map, > >>>> - se->cur_valid_map, > >>>> - SIT_VBLOCK_MAP_SIZE); > >>>> - sbi->discard_blks += > >>>> - sbi->blocks_per_seg - > >>>> - se->valid_blocks; > >>>> - } > >>>> - } > >>>> > >>>> - if (sbi->segs_per_sec > 1) > >>>> - get_sec_entry(sbi, start)->valid_blocks += > >>>> - se->valid_blocks; > >>>> + seg_info_from_raw_sit(&sit_i->sentries[start], &sit); > >>>> } > >>>> start_blk += readed; > >>>> } while (start_blk < sit_blk_cnt); > >>>> > >>>> down_read(&curseg->journal_rwsem); > >>>> for (i = 0; i < sits_in_cursum(journal); i++) { > >>>> - unsigned int old_valid_blocks; > >>>> - > >>>> start = le32_to_cpu(segno_in_journal(journal, i)); > >>>> - se = &sit_i->sentries[start]; > >>>> sit = sit_in_journal(journal, i); > >>>> > >>>> - old_valid_blocks = se->valid_blocks; > >>>> - > >>>> check_block_count(sbi, start, &sit); > >>>> - seg_info_from_raw_sit(se, &sit); > >>>> - > >>>> - if (f2fs_discard_en(sbi)) { > >>>> - if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { > >>>> - memset(se->discard_map, 0xff, > >>>> - SIT_VBLOCK_MAP_SIZE); > >>>> - } else { > >>>> - memcpy(se->discard_map, se->cur_valid_map, > >>>> - SIT_VBLOCK_MAP_SIZE); > >>>> - sbi->discard_blks += old_valid_blocks - > >>>> - se->valid_blocks; > >>>> - } > >>>> - } > >>>> - > >>>> - if (sbi->segs_per_sec > 1) > >>>> - get_sec_entry(sbi, start)->valid_blocks += > >>>> - se->valid_blocks - old_valid_blocks; > >>>> + seg_info_from_raw_sit(&sit_i->sentries[start], &sit); > >>>> } > >>>> up_read(&curseg->journal_rwsem); > >>>> + > >>>> + /* build discard map only one time */ > >>>> + if (f2fs_discard_en(sbi)) > >>>> + build_discard_entries(sbi); > >>>> + > >>>> + if (sbi->segs_per_sec > 1) > >>>> + build_sec_entries(sbi); > >>>> } > >>>> > >>>> static void init_free_segmap(struct f2fs_sb_info *sbi) > >>>> -- > >>>> 2.1.4