Re: [f2fs-dev] [PATCH RFC] f2fs: clean up build_sit_entries

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jaegeuk,

For compiler loop optimization,  refer to  "loop unswitching" section in 
https://zhuanlan.zhihu.com/p/23326776   (Chinese stack overflow-like website)
https://translate.google.com/translate?sl=zh-CN&tl=en&js=y&prev=_t&hl=zh-CN&ie=UTF-8&u=https%3A%2F%2Fzhuanlan.zhihu.com%2Fp%2F23326776&edit-text=&act=url

or

https://en.wikipedia.org/wiki/Loop_unswitching
http://www.compileroptimizations.com/category/unswitching.htm
https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html
-O3
Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize options.

For branch predictor, I think it is much easier to predict and analyze to parallelize
while (A){		// only one miss
	...
}

if (B) {
	while (A) {	// only one miss
		...
	}
}

Than

while (A) {
	...
	if (B) {		// A->B->A->B crossing
		...
	}
}

For most of processors.

On 2017/12/20 10:31, Jaegeuk Kim wrote:
> On 12/20, Gaoxiang (OS) wrote:
>> Hi Jaegeuk,
>>
>> On 2017/12/20 7:40, Jaegeuk Kim wrote:
>>> On 12/17, gaoxiang (P) wrote:
>>>> clean up some redundant repeat code in build_sit_entries
>>>> and seperate some logic into new functions:
>>>>    - build_discard_entries
>>>>    - build_sec_entries
>>>
>>> This patch makes another big loop redundantly?
>>>
>> "condition & jump" is the fundamental of "if" and loop.
>> In the worst case, both f2fs_discard_en(sbi) and sbi->segs_per_sec > 1
>> are true, and I think, we have:
>> - the original approach:
>> 	MAIN_SEGS   <-- loop condition
>> 	MAIN_SEGS   <-- f2fs_discard_en(sbi)
>> 	MAIN_SEGS   <-- sbi->segs_per_sec > 1
>>
>> 	sits_in_cursum(journal) <-- loop_condition
>> 	sits_in_cursum(journal) <-- f2fs_discard_en(sbi)
>> 	sits_in_cursum(journal) <-- sbi->segs_per_sec > 1
>>      it takes about 3*(MAIN_SEGS + sits_in_cursum(journal)) jumps effort.
>> - this patch:
>> 	MAIN_SEGS   <-- loop condition
>> 	sits_in_cursum(journal) <-- loop_condition
>>
>> 	MAIN_SEGS   <-- f2fs_discard_en(sbi)
>> 	MAIN_SEGS   <-- sbi->segs_per_sec > 1
>>      it takes abourt 3*MAIN_SEGS + sits_in_cursum(journal) jumps effort.
>>
>> and in the best case, both f2fs_discard_en(sbi) and sbi->segs_per_sec >
>> 1 are false, and we have:
>> - this patch
>> 	MAIN_SEGS   <-- loop condition
>>
>> 	sits_in_cursum(journal) <-- loop_condition
>> See
>> https://stackoverflow.com/questions/1462710/can-c-compilers-optimize-if-statements-inside-for-loops
> 
> Interesting!
> 
> What I can think of the worst case looks like:
> 
> In current code,
> 	for (N) {
> 		do_something;
> 		if (f2fs_discard_en())
> 			do _something;
> 		if (sbi->segs_per_sec > 1)
> 			do_something;
> 	}
> 	for (sits_in_cursum()) {
> 		do_something;
> 		if (f2fs_discard_en())
> 			do _something;
> 		if (sbi->segs_per_sec > 1)
> 			do_something;
> 	}
> => O(N + sits_in_cursum())

In the worst case Its 3(1 -- exit condition < MAIN_SEG, 2 --  f2fs_discard_en(), 3 -- 
sbi->segs_per_sec > 1) *N + 3*sits_in_cursum() if condition & jump 
instuctions at runtime.

If we use Big O notion, yes I think O(N + sits_in_cursum())

> 
> Once compiler unrolled the loop, we can expect most of jumps could be hit by
> branch predictor, since the loop does not have many branches.

The current code, if the compiler decides to unroll the loop, it could unroll it into, I guess
  	for (N) {
  		do_something;
	}
	if (f2fs_discard_en()) {
		for(N) {
  			do _something;
		}
	}
	if (sbi->segs_per_sec > 1) {
		for(N) {
			do_something;
		}
	}
	...
	/* extra */
	for (sits_in_cursum()) {
		do_something;
	}
	if (f2fs_discard_en()) {
		for (sits_in_cursum()) {
			do _something;
		}
	}
	if (sbi->segs_per_sec > 1) {
		for (sits_in_cursum()) {
			do_something;
		}
        	}
Taking branch predictor into consideration, I think the unrolled one is more easier to predict
than the current rolled one. (as you say, the current has more conditions in the loop
and a exit condition to predict)

> 
> In the patch,
> 	for (N)
> 		do_something;
> 	for (sits_in_cursum())
> 		do_something;
> 
> 	if (f2fs_discard_en()) {
> 		for (N)
> 			do_something;
> 	}
> 	if (sbi->segs_per_sec > 1) {
> 		for (N)
> 			do_something;
> 	}
> => O(3*N + sits_in_cursum())

If we use Big O notion, I think O(N + sits_in_cursum()) too.


> 
> BTW, build_discard_entries(struct f2fs_sb_info *sbi) is more likely to split
> the loops, as you describe.
> 
> In build_discard_entries(),
> 	for (N) {
> 		if (CP_TRIMMED_FLAG)
> 			do_something;
> 		else
> 			do_semething_else;
> 	}
> Thanks,
> 

I think so, yet CP_TRIMMED_FLAG comes from a function argument, so I think compiler
cannot decide whether it is constant.

I think if we count the runtime condition & jump instruction at runtime. 
the unrolls above are beneficial, I think.

Thanks.

>>
>> In addtion, this approach is easier to read than the original
>> one, so splitting the loop would be beneficial.
>>
>> Thanks
>>
>>>>
>>>> Note that "if (f2fs_discard_en(sbi))" and "if (sbi->segs_per_sec > 1)"
>>>> are all taken out of the loops they are unchangable.
>>>>
>>>> Signed-off-by: Gao Xiang <gaoxiang25@xxxxxxxxxx>
>>>> ---
>>>>    fs/f2fs/segment.c | 80 +++++++++++++++++++++++++------------------------------
>>>>    1 file changed, 37 insertions(+), 43 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>> index 40e1d20..bcd8a40 100644
>>>> --- a/fs/f2fs/segment.c
>>>> +++ b/fs/f2fs/segment.c
>>>> @@ -3476,12 +3476,39 @@ static int build_curseg(struct f2fs_sb_info *sbi)
>>>>    	return restore_curseg_summaries(sbi);
>>>>    }
>>>>    
>>>> +static void build_discard_entries(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +	unsigned segno;
>>>> +
>>>> +	for (segno = 0; segno < MAIN_SEGS(sbi); ++segno) {
>>>> +		struct seg_entry *se = get_seg_entry(sbi, segno);
>>>> +
>>>> +		if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG))
>>>> +			memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
>>>> +		else {
>>>> +			memcpy(se->discard_map, se->cur_valid_map,
>>>> +				SIT_VBLOCK_MAP_SIZE);
>>>> +
>>>> +			sbi->discard_blks += sbi->blocks_per_seg -
>>>> +				se->valid_blocks;
>>>> +		}
>>>> +	}
>>>> +}
>>>> +
>>>> +static void build_sec_entries(struct f2fs_sb_info *sbi)
>>>> +{
>>>> +	unsigned segno;
>>>> +
>>>> +	for (segno = 0; segno < MAIN_SEGS(sbi); ++segno)
>>>> +		get_sec_entry(sbi, segno)->valid_blocks +=
>>>> +			get_seg_entry(sbi, segno)->valid_blocks;
>>>> +}
>>>> +
>>>>    static void build_sit_entries(struct f2fs_sb_info *sbi)
>>>>    {
>>>>    	struct sit_info *sit_i = SIT_I(sbi);
>>>>    	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
>>>>    	struct f2fs_journal *journal = curseg->journal;
>>>> -	struct seg_entry *se;
>>>>    	struct f2fs_sit_entry sit;
>>>>    	int sit_blk_cnt = SIT_BLK_CNT(sbi);
>>>>    	unsigned int i, start, end;
>>>> @@ -3498,67 +3525,34 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
>>>>    			struct f2fs_sit_block *sit_blk;
>>>>    			struct page *page;
>>>>    
>>>> -			se = &sit_i->sentries[start];
>>>>    			page = get_current_sit_page(sbi, start);
>>>>    			sit_blk = (struct f2fs_sit_block *)page_address(page);
>>>>    			sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
>>>>    			f2fs_put_page(page, 1);
>>>>    
>>>>    			check_block_count(sbi, start, &sit);
>>>> -			seg_info_from_raw_sit(se, &sit);
>>>> -
>>>> -			/* build discard map only one time */
>>>> -			if (f2fs_discard_en(sbi)) {
>>>> -				if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
>>>> -					memset(se->discard_map, 0xff,
>>>> -						SIT_VBLOCK_MAP_SIZE);
>>>> -				} else {
>>>> -					memcpy(se->discard_map,
>>>> -						se->cur_valid_map,
>>>> -						SIT_VBLOCK_MAP_SIZE);
>>>> -					sbi->discard_blks +=
>>>> -						sbi->blocks_per_seg -
>>>> -						se->valid_blocks;
>>>> -				}
>>>> -			}
>>>>    
>>>> -			if (sbi->segs_per_sec > 1)
>>>> -				get_sec_entry(sbi, start)->valid_blocks +=
>>>> -							se->valid_blocks;
>>>> +			seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
>>>>    		}
>>>>    		start_blk += readed;
>>>>    	} while (start_blk < sit_blk_cnt);
>>>>    
>>>>    	down_read(&curseg->journal_rwsem);
>>>>    	for (i = 0; i < sits_in_cursum(journal); i++) {
>>>> -		unsigned int old_valid_blocks;
>>>> -
>>>>    		start = le32_to_cpu(segno_in_journal(journal, i));
>>>> -		se = &sit_i->sentries[start];
>>>>    		sit = sit_in_journal(journal, i);
>>>>    
>>>> -		old_valid_blocks = se->valid_blocks;
>>>> -
>>>>    		check_block_count(sbi, start, &sit);
>>>> -		seg_info_from_raw_sit(se, &sit);
>>>> -
>>>> -		if (f2fs_discard_en(sbi)) {
>>>> -			if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
>>>> -				memset(se->discard_map, 0xff,
>>>> -							SIT_VBLOCK_MAP_SIZE);
>>>> -			} else {
>>>> -				memcpy(se->discard_map, se->cur_valid_map,
>>>> -							SIT_VBLOCK_MAP_SIZE);
>>>> -				sbi->discard_blks += old_valid_blocks -
>>>> -							se->valid_blocks;
>>>> -			}
>>>> -		}
>>>> -
>>>> -		if (sbi->segs_per_sec > 1)
>>>> -			get_sec_entry(sbi, start)->valid_blocks +=
>>>> -				se->valid_blocks - old_valid_blocks;
>>>> +		seg_info_from_raw_sit(&sit_i->sentries[start], &sit);
>>>>    	}
>>>>    	up_read(&curseg->journal_rwsem);
>>>> +
>>>> +	/* build discard map only one time */
>>>> +	if (f2fs_discard_en(sbi))
>>>> +		build_discard_entries(sbi);
>>>> +
>>>> +	if (sbi->segs_per_sec > 1)
>>>> +		build_sec_entries(sbi);
>>>>    }
>>>>    
>>>>    static void init_free_segmap(struct f2fs_sb_info *sbi)
>>>> -- 
>>>> 2.1.4




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux