Re: [f2fs-dev] [PATCH RFC] f2fs: clean up build_sit_entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/20, Gaoxiang (OS) wrote:
> Hi Jaegeuk,
> 
> On 2017/12/20 10:31, Jaegeuk Kim wrote:
> > On 12/20, Gaoxiang (OS) wrote:
> >> Hi Jaegeuk,
> >>
> >> On 2017/12/20 7:40, Jaegeuk Kim wrote:
> >>> On 12/17, gaoxiang (P) wrote:
> >>>> clean up some redundant repeat code in build_sit_entries
> >>>> and seperate some logic into new functions:
> >>>>    - build_discard_entries
> >>>>    - build_sec_entries
> >>>
> >>> This patch makes another big loop redundantly?
> >>>
> >> "condition & jump" is the fundamental of "if" and loop.
> >> In the worst case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > 1
> >> are true, and I think, we have:
> >> - the original approach:
> >> 	MAIN_SEGS   <-- loop condition
> >> 	MAIN_SEGS   <-- f2fs_discard_en(sbi)
> >> 	MAIN_SEGS   <-- sbi->segs_per_sec > 1
> >>
> >> 	sits_in_cursum(journal) <-- loop_condition
> >> 	sits_in_cursum(journal) <-- f2fs_discard_en(sbi)
> >> 	sits_in_cursum(journal) <-- sbi->segs_per_sec > 1
> >>      it takes about 3*(MAIN_SEGS + sits_in_cursum(journal)) jumps effort.
> >> - this patch:
> >> 	MAIN_SEGS   <-- loop condition
> >> 	sits_in_cursum(journal) <-- loop_condition
> >>
> >> 	MAIN_SEGS   <-- f2fs_discard_en(sbi)
> >> 	MAIN_SEGS   <-- sbi->segs_per_sec > 1
> >>      it takes abourt 3*MAIN_SEGS + sits_in_cursum(journal) jumps effort.
> >>
> >> and in the best case, both f2fs_discard_en(sbi) and sbi->segs_per_sec >
> >> 1 are false, and we have:
> >> - this patch
> >> 	MAIN_SEGS   <-- loop condition
> >>
> >> 	sits_in_cursum(journal) <-- loop_condition
> >> See
> >> https://stackoverflow.com/questions/1462710/can-c-compilers-optimize-if-statements-inside-for-loops
> > 
> > Interesting!
> > 
> > What I can think of the worst case looks like:
> > 
> > In current code,
> > 	for (N) {
> > 		do_something;
> > 		if (f2fs_discard_en())
> > 			do _something;
> > 		if (sbi->segs_per_sec > 1)
> > 			do_something;
> > 	}
> > 	for (sits_in_cursum()) {
> > 		do_something;
> > 		if (f2fs_discard_en())
> > 			do _something;
> > 		if (sbi->segs_per_sec > 1)
> > 			do_something;
> > 	}
> > => O(N + sits_in_cursum())
> 
> In the worst case Its 3(1 -- exit condition < MAIN_SEG, 2 --  f2fs_discard_en(), 3 -- 
> sbi->segs_per_sec > 1) *N + 3*sits_in_cursum() if condition & jump 
> instuctions at runtime.
> 
> If we use Big O notion, yes I think O(N + sits_in_cursum())
> 
> > 
> > Once compiler unrolled the loop, we can expect most of jumps could be hit by
> > branch predictor, since the loop does not have many branches.
> 
> The current code, if the compiler decides to unroll the loop, it could unroll it into, I guess
>   	for (N) {
>   		do_something;
> 	}
> 	if (f2fs_discard_en()) {
> 		for(N) {
>   			do _something;
> 		}
> 	}
> 	if (sbi->segs_per_sec > 1) {
> 		for(N) {
> 			do_something;
> 		}
> 	}
> 	...
> 	for (sits_in_cursum()) {
> 		do_something;
> 	}
> 	if (f2fs_discard_en()) {
> 		for (N) {
> 			do _something;
> 		}
> 	}
> 	if (sbi->segs_per_sec > 1) {
> 		for (N) {
> 			do_something;
> 		}
>         	}

Unrolled loop would be something like:
	for (N/n) {
		do_something #1;
		if (f2fs_discard_en())
			do_something;
		if (sbi->segs_per_sec > 1)
			do_something;
		...
		do_something #n;
		if (f2fs_discard_en())
			do_something;
		if (sbi->segs_per_sec > 1)
			do_something;
	}

And, branch predictor can avoid three branch conditions by BTB.

do_something #1;
  if (f2fs_discard_en())
    do _something;
      if (sbi->segs_per_sec > 1)
        do_something;
          do_something #2;
            if (f2fs_discard_en())
              do _something;
                if (sbi->segs_per_sec > 1)
                  do_something;
                    ...
                      do_something #N;
                        if (f2fs_discard_en())
                          do _something;
                            if (sbi->segs_per_sec > 1)
                              do_something;

Anyway, I don't see huge benefit regarding to performance and code readability,
but do worry about potential patch conflicts when backporting this.

> Taking branch predictor into consideration, I think the unrolled one is more easier to predict
> than the current rolled one. (as you say, the current has more conditions in the loop
> and a exit condition to predict)
> 
> > 
> > In the patch,
> > 	for (N)
> > 		do_something;
> > 	for (sits_in_cursum())
> > 		do_something;
> > 
> > 	if (f2fs_discard_en()) {
> > 		for (N)
> > 			do_something;
> > 	}
> > 	if (sbi->segs_per_sec > 1) {
> > 		for (N)
> > 			do_something;
> > 	}
> > => O(3*N + sits_in_cursum())
> 
> If we use Big O notion, I think O(N + sits_in_cursum()) too.
> > 
> > BTW, build_discard_entries(struct f2fs_sb_info *sbi) is more likely to split
> > the loops, as you describe.
> > 
> > In build_discard_entries(),
> > 	for (N) {
> > 		if (CP_TRIMMED_FLAG)
> > 			do_something;
> > 		else
> > 			do_semething_else;
> > 	}
> > Thanks,
> > 
> 
> I think so, yet CP_TRIMMED_FLAG comes from a function argument, so I think compiler
> cannot decide whether it is constant.
> 
> I think if we count the runtime condition & jump instruction at runtime. 
> the unrolls above are beneficial, I think.
> 
> Thanks.
> 
> >>
> >> In addtion, this approach is easier to read than the original
> >> one, so splitting the loop would be beneficial.
> >>
> >> Thanks
> >>
> >>>>
> >>>> Note that "if (f2fs_discard_en(sbi))" and "if (sbi->segs_per_sec > 1)"
> >>>> are all taken out of the loops they are unchangable.
> >>>>
> >>>> Signed-off-by: Gao Xiang <gaoxiang25@xxxxxxxxxx>
> >>>> ---
> >>>>    fs/f2fs/segment.c | 80 +++++++++++++++++++++++++------------------------------
> >>>>    1 file changed, 37 insertions(+), 43 deletions(-)
> >>>>
> >>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> >>>> index 40e1d20..bcd8a40 100644
> >>>> --- a/fs/f2fs/segment.c
> >>>> +++ b/fs/f2fs/segment.c
> >>>> @@ -3476,12 +3476,39 @@ static int build_curseg(struct f2fs_sb_info *sbi)
> >>>>    	return restore_curseg_summaries(sbi);
> >>>>    }
> >>>>    
> >>>> +static void build_discard_entries(struct f2fs_sb_info *sbi)
> >>>> +{
> >>>> +	unsigned segno;
> >>>> +
> >>>> +	for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) {
> >>>> +		struct seg_entry *se = get_seg_entry(sbi, segno);
> >>>> +
> >>>> +		if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG))
> >>>> +			memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
> >>>> +		else {
> >>>> +			memcpy(se->discard_map, se->cur_valid_map,
> >>>> +				SIT_VBLOCK_MAP_SIZE);
> >>>> +
> >>>> +			sbi->discard_blks += sbi->blocks_per_seg -
> >>>> +				se->valid_blocks;
> >>>> +		}
> >>>> +	}
> >>>> +}
> >>>> +
> >>>> +static void build_sec_entries(struct f2fs_sb_info *sbi)
> >>>> +{
> >>>> +	unsigned segno;
> >>>> +
> >>>> +	for (segno = 0; segno < MAIN_SEGS(sbi); ++segno)
> >>>> +		get_sec_entry(sbi, segno)->valid_blocks +=
> >>>> +			get_seg_entry(sbi, segno)->valid_blocks;
> >>>> +}
> >>>> +
> >>>>    static void build_sit_entries(struct f2fs_sb_info *sbi)
> >>>>    {
> >>>>    	struct sit_info *sit_i = SIT_I(sbi);
> >>>>    	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
> >>>>    	struct f2fs_journal *journal = curseg->journal;
> >>>> -	struct seg_entry *se;
> >>>>    	struct f2fs_sit_entry sit;
> >>>>    	int sit_blk_cnt = SIT_BLK_CNT(sbi);
> >>>>    	unsigned int i, start, end;
> >>>> @@ -3498,67 +3525,34 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
> >>>>    			struct f2fs_sit_block *sit_blk;
> >>>>    			struct page *page;
> >>>>    
> >>>> -			se = &sit_i->sentries[start];
> >>>>    			page = get_current_sit_page(sbi, start);
> >>>>    			sit_blk = (struct f2fs_sit_block *)page_address(page);
> >>>>    			sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
> >>>>    			f2fs_put_page(page, 1);
> >>>>    
> >>>>    			check_block_count(sbi, start, &sit);
> >>>> -			seg_info_from_raw_sit(se, &sit);
> >>>> -
> >>>> -			/* build discard map only one time */
> >>>> -			if (f2fs_discard_en(sbi)) {
> >>>> -				if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> >>>> -					memset(se->discard_map, 0xff,
> >>>> -						SIT_VBLOCK_MAP_SIZE);
> >>>> -				} else {
> >>>> -					memcpy(se->discard_map,
> >>>> -						se->cur_valid_map,
> >>>> -						SIT_VBLOCK_MAP_SIZE);
> >>>> -					sbi->discard_blks +=
> >>>> -						sbi->blocks_per_seg -
> >>>> -						se->valid_blocks;
> >>>> -				}
> >>>> -			}
> >>>>    
> >>>> -			if (sbi->segs_per_sec > 1)
> >>>> -				get_sec_entry(sbi, start)->valid_blocks +=
> >>>> -							se->valid_blocks;
> >>>> +			seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
> >>>>    		}
> >>>>    		start_blk += readed;
> >>>>    	} while (start_blk < sit_blk_cnt);
> >>>>    
> >>>>    	down_read(&curseg->journal_rwsem);
> >>>>    	for (i = 0; i < sits_in_cursum(journal); i++) {
> >>>> -		unsigned int old_valid_blocks;
> >>>> -
> >>>>    		start = le32_to_cpu(segno_in_journal(journal, i));
> >>>> -		se = &sit_i->sentries[start];
> >>>>    		sit = sit_in_journal(journal, i);
> >>>>    
> >>>> -		old_valid_blocks = se->valid_blocks;
> >>>> -
> >>>>    		check_block_count(sbi, start, &sit);
> >>>> -		seg_info_from_raw_sit(se, &sit);
> >>>> -
> >>>> -		if (f2fs_discard_en(sbi)) {
> >>>> -			if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
> >>>> -				memset(se->discard_map, 0xff,
> >>>> -							SIT_VBLOCK_MAP_SIZE);
> >>>> -			} else {
> >>>> -				memcpy(se->discard_map, se->cur_valid_map,
> >>>> -							SIT_VBLOCK_MAP_SIZE);
> >>>> -				sbi->discard_blks += old_valid_blocks -
> >>>> -							se->valid_blocks;
> >>>> -			}
> >>>> -		}
> >>>> -
> >>>> -		if (sbi->segs_per_sec > 1)
> >>>> -			get_sec_entry(sbi, start)->valid_blocks +=
> >>>> -				se->valid_blocks - old_valid_blocks;
> >>>> +		seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
> >>>>    	}
> >>>>    	up_read(&curseg->journal_rwsem);
> >>>> +
> >>>> +	/* build discard map only one time */
> >>>> +	if (f2fs_discard_en(sbi))
> >>>> +		build_discard_entries(sbi);
> >>>> +
> >>>> +	if (sbi->segs_per_sec > 1)
> >>>> +		build_sec_entries(sbi);
> >>>>    }
> >>>>    
> >>>>    static void init_free_segmap(struct f2fs_sb_info *sbi)
> >>>> -- 
> >>>> 2.1.4



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux