RE: Condition execution optimization with gcc 7.5

Hello,

Thanks for the answer, it is very clear to me now.

Again thanks a lot.
Best,
Benjamin

-----Original Message-----
From: Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx> 
Sent: Monday, May 22, 2023 6:12 PM
To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
Subject: Re: Condition execution optimization with gcc 7.5

On aarch64 this code cannot use conditional select.  An operation such as
	if (c) {
	  ...
	  r->lowcase_header[0] = c;
	  ...
	}

would be a conditional store to memory and can only happen if the guarding condition is true.  It's not safe to convert this into, say

	cmp c, #0
	...
	ldr w1, [ptr]
	csel w1, w1, c, eq
	str w1, [ptr]

because the store would introduce a possible race with any other thread that might be writing to the same location.  The compiler would also have to prove that ptr always contained a valid address when 'c' was false as well, something that might not be possible given the information available.
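
As a rough illustration of the point above, here is a minimal C sketch of the two forms. The struct and function names are made up for the example (they are not the nginx definitions), and the second function only approximates what a CSEL-based conversion would amount to at the source level.

```c
/* Hypothetical stand-in for the relevant part of the request struct. */
struct req {
    unsigned char lowcase_header[32];
};

/* Source form: memory is touched only when c is non-zero. */
static void store_if(struct req *r, unsigned char c)
{
    if (c) {
        r->lowcase_header[0] = c;
    }
}

/* Load/select/store form: r is dereferenced and written on every call.
   When c is zero the old value is written back, so a concurrent write
   by another thread between the load and the store could be lost, and
   r must be a valid pointer even though the source never required it. */
static void store_select(struct req *r, unsigned char c)
{
    unsigned char old = r->lowcase_header[0];
    r->lowcase_header[0] = c ? c : old;
}
```

With a valid pointer and a single thread the two functions leave the same value in memory, which is why the rewrite looks tempting. The difference only shows up when r may be invalid while c is zero, or when another thread writes the same location.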

The function arm_max_conditional_execute is only used for 32-bit arm targets.  It's not part of the aarch64 compiler.
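
For comparison with the store case discussed earlier in this message, a choice between values that are already in registers is the kind of code where an aarch64 conditional select can apply, since no memory access depends on the condition. A minimal sketch, with a hypothetical function name; whether a given GCC version actually emits CSEL (or a related instruction) here depends on its cost model, so it is worth checking the output of -S rather than assuming it.

```c
/* Hypothetical helper: both possible results are plain scalar values,
   so selecting one of them involves no conditional load or store. */
static unsigned select_index(unsigned char c, unsigned i)
{
    return c ? 1u : i;
}
```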

R.

On 22/05/2023 16:43, Benjamin Minguez via Gcc-help wrote:
> Hello Richard,
> 
> I'm compiling for aarch64. Indeed, I was expecting conversion via conditional move or set.
> I understand that code such as the NGINX HTTP parser is suitable for such conversion. But I was expecting that, for example, this code could benefit from it (ngx_hash is an inline function and is a simple xor operation):
>>>                   if (c) {
>>>                       hash = ngx_hash(0, c);
>>>                       r->lowcase_header[0] = c;
>>>                       i = 1;
>>>                       break;
>>>                   }
> 
> Thank you for your help and your answers.
> 
> Best,
> Benjamin Minguez
> 
> -----Original Message-----
> From: Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx>
> Sent: Thursday, May 18, 2023 1:02 PM
> To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; Kyrylo Tkachov 
> <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
> Subject: Re: Condition execution optimization with gcc 7.5
> 
> On 17/05/2023 09:17, Benjamin Minguez via Gcc-help wrote:
>> Hello,
>>
>> I did add -march=armv8-a (and the other armv8.*-a variants) to the GCC command line, but it looks like the conditional execution optimization, the cond_exec_find_if_block function, is never called. I enabled all GCC dumps (-da option) and this function's debug messages are never printed.
> 
> Just to be certain, are you compiling for aarch32 (arm/thumb), or aarch64?  The latter does not support conditional execution, except via instructions such as CSEL.
> 
> [more comments lower down]
> 
>> In parallel, I also tried different versions of GCC, 9.5.0 and 11.3.0, and again I had the same results.
>>
>>    Do you have any idea why this optimization step is not called?
>>
>> Thank you in advance for your help.
>>
>> Best,
>> Benjamin Minguez
>>
>> -----Original Message-----
>> From: Benjamin Minguez
>> Sent: Wednesday, May 10, 2023 8:43 AM
>> To: 'Kyrylo Tkachov' <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
>> Subject: RE: Condition execution optimization with gcc 7.5
>>
>> Hi,
>>
>> Thanks for the answer.
>>
>> I had a look at the wrong function definition, gcc-7.5.0/gcc/target.def:
>> 	DEFHOOK
>> 	(have_conditional_execution,
>> 	 "This target hook returns true if the target supports conditional execution.\n\
>> 	This target hook is required only when the target has several different\n\
>> 	modes and they have different conditional execution capability, such as ARM.",
>> 	 bool, (void),
>> 	 default_have_conditional_execution)
>> and found this one, gcc-7.5.0/gcc/targhooks.c:
>> 	bool
>> 	default_have_conditional_execution (void)
>> 	{
>> 	  return HAVE_conditional_execution;
>> 	}
>> Finally, the macro HAVE_conditional_execution is defined here:
>> build-gcc/gcc/insn-config.h.
>>
>> I will investigate the -march or -mcpu option.
>>
>> Again, thanks a lot,
>>
>> Benjamin Minguez
>>
>> -----Original Message-----
>> From: Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>
>> Sent: Tuesday, May 9, 2023 11:50 AM
>> To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; 
>> gcc-help@xxxxxxxxxxx
>> Subject: RE: Condition execution optimization with gcc 7.5
>>
>> Hi Benjamin,
>>
>>> -----Original Message-----
>>> From: Gcc-help <gcc-help-bounces+kyrylo.tkachov=arm.com@xxxxxxxxxxx>
>>> On Behalf Of Benjamin Minguez via Gcc-help
>>> Sent: Tuesday, May 9, 2023 8:54 AM
>>> To: gcc-help@xxxxxxxxxxx
>>> Subject: Condition execution optimization with gcc 7.5
>>>
>>> Hello everyone,
>>>
>>> I'm trying to optimize an application that contains a lot of branches.
>>> I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reasons.
>>
>> Of course GCC 7.5 is quite old now but if you're forced to use it...
>>
>>> As the original application is similar to NGINX, I investigated 
>>> NGINX. I'm focusing on the HTTP header parsing. Basically, the 
>>> algorithm parses byte by byte and, based on the value, stores some variables.
>>> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>>>                   if (c) {
>>>                       hash = ngx_hash(0, c);
>>>                       r->lowcase_header[0] = c;
>>>                       i = 1;
>>>                       break;
>>>                   }
>>>
>>>                   if (ch == '_') {
>>>                       if (allow_underscores) {
>>>                           hash = ngx_hash(0, ch);
>>>                           r->lowcase_header[0] = ch;
>>>                           i = 1;
>>>
>>>                       } else {
>>>                           r->invalid_header = 1;
>>>                       }
>>>
>>>                       break;
>>>                   }
> 
> Your example code isn't complete enough to do a full analysis, but I doubt code like this would generate conditional execution anyway.  There are several reasons:
> 
> 1) It's likely too long once machine instructions are generated
> 2) There are function calls (ngx_hash) in the body of the conditional blocks (calls cannot be conditionally executed); if they are inlined then see 1) above.
> 3) you have nested conditions (only the innermost block could be conditionally executed).
> 4) you wouldn't want to conditionally execute 'if (allow_underscores)'
> anyway as it's probably highly predictable as a branch.
> 
> R.
> 
>>> Also, most of the branches are not predictable because they compare 
>>> against data coming from the network.
>>>   From these observations, I looked at the conditional execution 
>>> optimization step in GCC and I found the function that should do the work:
>>> cond_exec_find_if_block. I also found how to customize the decision to use 
>>> conditional instructions:
>>
>> ... This relates to the arm port i.e. the 32-bit target in Armv8-a, is that what you're targeting?
>> AArch64 has had more tuning work put into it over the years so may do better performance-wise if your processor and environment support it.
>> If you're indeed looking at arm...
>>
>>>                   #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
>>>                   int
>>>                   arm_max_conditional_execute (void)
>>>                   {
>>>                     return max_insns_skipped;
>>>                   }
>>>                   static int max_insns_skipped = 5;
>>>
>>> I tried to compile NGINX with -O2 (which should enable if-conversion2) 
>>> but I did not notice any change in the code. I enabled GCC debug dumps 
>>> (-da), added some debug output to this function, and found that 
>>> targetm.have_conditional_execution is set to false.
>>>
>>> First, do you know how to switch this variable to true? I guess it is an 
>>> option during the configuration step of GCC.
>>
>> Its definition on that branch is:
>> /* Only thumb1 can't support conditional execution, so return true if
>>      the target is not thumb1.  */
>> static bool
>> arm_have_conditional_execution (void) {
>>     return !TARGET_THUMB1;
>> }
>>
>> So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?
>>
>> Thanks,
>> Kyrill
>>
>>> Then, I know that the decision to use conditional execution is 
>>> based on the extra cost added to compute both branches compared to the cost of a branch.
>>> In this specific case, branches are mispredicted and the cost is, indeed, high.
>>> Do you think that increasing max_insns_skipped will be enough to 
>>> help GCC use conditional execution?
>>>
>>> Thank you in advance for your answers.
>>>
>>> Best,
>>> Benjamin Minguez
> 
> R.
> 
> 





