Hello Richard,

I'm compiling for aarch64. Indeed, I was expecting conversion via conditional move or set.
I understand that code such as the NGINX HTTP parser is suitable for such conversion. But I was expecting that, for example, this code could benefit from it (ngx_hash is an inline function and is a simple xor operation):

>> if (c) {
>>     hash = ngx_hash(0, c);
>>     r->lowcase_header[0] = c;
>>     i = 1;
>>     break;
>> }

Thanks for your help and your answers.

Best,
Benjamin Minguez

-----Original Message-----
From: Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx>
Sent: Thursday, May 18, 2023 1:02 PM
To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
Subject: Re: Condition execution optimization with gcc 7.5

On 17/05/2023 09:17, Benjamin Minguez via Gcc-help wrote:
> Hello,
>
> I did add -march=armv8-a (and the other armv8.*-a variants) to the GCC command line, but it looks like the conditional execution optimization, the cond_exec_find_if_block function, is never called. I enabled all GCC dumps (-da option) and this function's debug messages are never printed.

Just to be certain, are you compiling for aarch32 (arm/thumb), or aarch64? The latter does not support conditional execution, except via instructions such as CSEL.

[more comments lower down]

> In parallel, I also tried different versions of GCC, 9.5.0 and 11.3.0, and again I had the same results.
>
> Do you have any idea why this optimization step is not called?
>
> Thank you in advance for your help.
>
> Best,
> Benjamin Minguez
>
> -----Original Message-----
> From: Benjamin Minguez
> Sent: Wednesday, May 10, 2023 8:43 AM
> To: 'Kyrylo Tkachov' <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
> Subject: RE: Condition execution optimization with gcc 7.5
>
> Hi,
>
> Thanks for the answer.
>
> I had a look at the wrong function definition, gcc-7.5.0/gcc/target.def:
> DEFHOOK
> (have_conditional_execution,
>  "This target hook returns true if the target supports conditional execution.\n\
> This target hook is required only when the target has several different\n\
> modes and they have different conditional execution capability, such as ARM.",
>  bool, (void),
>  default_have_conditional_execution)
> and found this one, gcc-7.5.0/gcc/targhooks.c:
> bool
> default_have_conditional_execution (void)
> {
>   return HAVE_conditional_execution;
> }
> Finally, the macro HAVE_conditional_execution is defined here:
> build-gcc/gcc/insn-config.h,
>
> I will investigate the -march or -mcpu option.
>
> Again, thanks a lot,
>
> Benjamin Minguez
>
> -----Original Message-----
> From: Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>
> Sent: Tuesday, May 9, 2023 11:50 AM
> To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; gcc-help@xxxxxxxxxxx
> Subject: RE: Condition execution optimization with gcc 7.5
>
> Hi Benjamin,
>
>> -----Original Message-----
>> From: Gcc-help <gcc-help-bounces+kyrylo.tkachov=arm.com@xxxxxxxxxxx>
>> On Behalf Of Benjamin Minguez via Gcc-help
>> Sent: Tuesday, May 9, 2023 8:54 AM
>> To: gcc-help@xxxxxxxxxxx
>> Subject: Condition execution optimization with gcc 7.5
>>
>> Hello everyone,
>>
>> I'm trying to optimize an application that contains a lot of branches.
>> I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reasons.
>
> Of course GCC 7.5 is quite old now but if you're forced to use it...
>
>> As the original application is similar to NGINX, I investigated
>> NGINX. I'm focusing on the HTTP header parsing. Basically, the
>> algorithm parses byte by byte and, based on the value, stores some variables.
>> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>> if (c) {
>>     hash = ngx_hash(0, c);
>>     r->lowcase_header[0] = c;
>>     i = 1;
>>     break;
>> }
>>
>> if (ch == '_') {
>>     if (allow_underscores) {
>>         hash = ngx_hash(0, ch);
>>         r->lowcase_header[0] = ch;
>>         i = 1;
>>
>>     } else {
>>         r->invalid_header = 1;
>>     }
>>
>>     break;
>> }

Your example code isn't complete enough to do a full analysis, but I doubt code like this would generate conditional execution anyway. There are several reasons:

1) It's likely too long once machine instructions are generated.
2) There are function calls (ngx_hash) in the body of the conditional blocks (calls cannot be conditionally executed); if they are inlined then see 1) above.
3) You have nested conditions (only the innermost block could be conditionally executed).
4) You wouldn't want to conditionally execute 'if (allow_underscores)' anyway, as it's probably highly predictable as a branch.

R.

>> Also, most of the branches are not predictable because they compare against
>> data coming from the network.
>> From these observations, I looked at the conditional execution
>> optimization step in GCC and I found this function that should do the work:
>> cond_exec_find_if_block. And how to customize the decision to use
>> conditional instructions:
>
> ... This relates to the arm port, i.e. the 32-bit target in Armv8-a. Is that what you're targeting?
> AArch64 has had more tuning work put into it over the years, so it may do better performance-wise if your processor and environment support it.
> If you're indeed looking at arm...
>
>> #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
>> int
>> arm_max_conditional_execute (void)
>> {
>>   return max_insns_skipped;
>> }
>> static int max_insns_skipped = 5;
>>
>> I tried to compile NGINX with -O2 (that should enable if-conversion2)
>> but I did not notice any change in the code.
>> I enabled GCC debug (-da) and also added some debug output in this function,
>> and I figured out that targetm.have_conditional_execution is set to false.
>>
>> First, do you know how to switch this variable to true? I guess it is an
>> option during the configuration step of GCC.
>
> Its definition on that branch is:
> /* Only thumb1 can't support conditional execution, so return true if
>    the target is not thumb1.  */
> static bool
> arm_have_conditional_execution (void)
> {
>   return !TARGET_THUMB1;
> }
>
> So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?
>
> Thanks,
> Kyrill
>
>> Then, I know that the decision to use conditional execution is based
>> on the extra cost added to compute both branches compared to the cost of a branch.
>> In this specific case, branches are mispredicted and the cost is, indeed, high.
>> Do you think that increasing max_insns_skipped will be enough to
>> help GCC use conditional execution?
>>
>> Thank you in advance for your answers.
>>
>> Best,
>> Benjamin Minguez

R.
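
For illustration only, here is a minimal sketch of the kind of source-level rewrite that the conditional move / CSEL discussion above points towards. It is not taken from nginx or from anyone in this thread: struct toy_state, toy_hash, step_branchy, step_select and check_steps_agree are all hypothetical names, toy_hash is just the "simple xor" described above and not the real ngx_hash, and the 'break' from the original loop is left out. The idea is that the branchy form contains conditional stores, whereas the select form stores unconditionally and only the stored value depends on the condition, which is the shape a compiler may be able to turn into conditional selects. Whether GCC actually emits CSEL for it depends on the version, the target and its cost model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parser state; not nginx's real struct. */
struct toy_state {
    uint32_t      hash;
    size_t        i;
    unsigned char lowcase_header[4];
};

/* Hypothetical stand-in for ngx_hash(): a simple xor, as described in the thread. */
static inline uint32_t toy_hash(uint32_t key, unsigned char c)
{
    return key ^ c;
}

/* Branchy form, same shape as the quoted "if (c) { ... }" block (without the break). */
static void step_branchy(struct toy_state *s, unsigned char c)
{
    if (c) {
        s->hash = toy_hash(0, c);
        s->lowcase_header[0] = c;
        s->i = 1;
    }
}

/* Select form: every store is unconditional; only the stored value is selected. */
static void step_select(struct toy_state *s, unsigned char c)
{
    int take = (c != 0);

    s->hash = take ? toy_hash(0, c) : s->hash;
    s->lowcase_header[0] = take ? c : s->lowcase_header[0];
    s->i = take ? (size_t)1 : s->i;
}

/* Check that both forms leave the state identical for every byte value. */
static void check_steps_agree(void)
{
    for (int c = 0; c < 256; c++) {
        struct toy_state a = { 7u, 3u, { 'x', 0, 0, 0 } };
        struct toy_state b = a;

        step_branchy(&a, (unsigned char)c);
        step_select(&b, (unsigned char)c);

        assert(a.hash == b.hash);
        assert(a.i == b.i);
        assert(a.lowcase_header[0] == b.lowcase_header[0]);
    }
}
```

To see what a given compiler does with the two forms, one can compile with -O2 -S for the target of interest and compare the generated assembly. Note that the select form always writes to memory, so it is only a valid substitute when those extra stores are harmless (writable memory, no other thread reading the fields concurrently).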