Hello,

Thanks for the answer, it is very clear to me now.

Again, thanks a lot.

Best,
Benjamin

-----Original Message-----
From: Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx>
Sent: Monday, May 22, 2023 6:12 PM
To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
Subject: Re: Condition execution optimization with gcc 7.5

On aarch64 this code cannot use conditional select. An operation such as

    if (c) {
        ...
        r->lowcase_header[0] = c;
        ...
    }

would be a conditional store to memory and can only happen if the guarding condition is true. It's not safe to convert this into, say

    cmp   c, #0
    ...
    ldr   w1, [ptr]
    csel  w1, w1, c, eq
    str   w1, [ptr]

because the store would introduce a possible race with any other thread that might be writing to the same location. The compiler would also have to prove that ptr always contained a valid address when 'c' was false as well, something that might not be possible given the information available.

The function arm_max_conditional_execute is only used for 32-bit arm targets. It's not part of the aarch64 compiler.

R.

On 22/05/2023 16:43, Benjamin Minguez via Gcc-help wrote:
> Hello Richard,
>
> I'm compiling for aarch64. Indeed, I was expecting conversion via conditional move or set.
> I understand that code such as NGINX HTTP parser is suitable for such conversion. But I was expecting that, for example, this code can benefit from it (ngx_hash is an inline function and is a simple xor operation):
>>> if (c) {
>>>     hash = ngx_hash(0, c);
>>>     r->lowcase_header[0] = c;
>>>     i = 1;
>>>     break;
>>> }
>
> Thanks for your help and your answers.
>
> Best,
> Benjamin Minguez
>
> -----Original Message-----
> From: Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx>
> Sent: Thursday, May 18, 2023 1:02 PM
> To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; Kyrylo Tkachov
> <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
> Subject: Re: Condition execution optimization with gcc 7.5
>
> On 17/05/2023 09:17, Benjamin Minguez via Gcc-help wrote:
>> Hello,
>>
>> I did add -march=armv8-a (and the other armv8.*-a variants) to the GCC command line, but it looks like the conditional execution optimization, the cond_exec_find_if_block function, is never called. I enabled all GCC dumps (-da option) and this function's debug messages are never printed.
>
> Just to be certain, are you compiling for aarch32 (arm/thumb), or aarch64? The latter does not support conditional execution, except via instructions such as CSEL.
>
> [more comments lower down]
>
>> In parallel, I also tried different versions of GCC, 9.5.0 and 11.3.0, and again I had the same results.
>>
>> Do you have any idea why this optimization step is not called?
>>
>> Thank you in advance for your help.
>>
>> Best,
>> Benjamin Minguez
>>
>> -----Original Message-----
>> From: Benjamin Minguez
>> Sent: Wednesday, May 10, 2023 8:43 AM
>> To: 'Kyrylo Tkachov' <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
>> Subject: RE: Condition execution optimization with gcc 7.5
>>
>> Hi,
>>
>> Thanks for the answer.
>>
>> I had a look at the wrong function definition, gcc-7.5.0/gcc/target.def:
>> DEFHOOK
>> (have_conditional_execution,
>>  "This target hook returns true if the target supports conditional execution.\n\
>> This target hook is required only when the target has several different\n\
>> modes and they have different conditional execution capability, such as ARM.",
>>  bool, (void),
>>  default_have_conditional_execution)
>> and found this one, gcc-7.5.0/gcc/targhooks.c:
>> bool
>> default_have_conditional_execution (void)
>> {
>>   return HAVE_conditional_execution;
>> }
>> Finally, the macro HAVE_conditional_execution is defined here:
>> build-gcc/gcc/insn-config.h
>>
>> I will investigate the -march or -mcpu option.
>>
>> Again, thanks a lot,
>>
>> Benjamin Minguez
>>
>> -----Original Message-----
>> From: Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>
>> Sent: Tuesday, May 9, 2023 11:50 AM
>> To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>;
>> gcc-help@xxxxxxxxxxx
>> Subject: RE: Condition execution optimization with gcc 7.5
>>
>> Hi Benjamin,
>>
>>> -----Original Message-----
>>> From: Gcc-help <gcc-help-bounces+kyrylo.tkachov=arm.com@xxxxxxxxxxx>
>>> On Behalf Of Benjamin Minguez via Gcc-help
>>> Sent: Tuesday, May 9, 2023 8:54 AM
>>> To: gcc-help@xxxxxxxxxxx
>>> Subject: Condition execution optimization with gcc 7.5
>>>
>>> Hello everyone,
>>>
>>> I'm trying to optimize an application that contains a lot of branches.
>>> I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reasons.
>>
>> Of course GCC 7.5 is quite old now but if you're forced to use it...
>>
>>> As the original application is similar to NGINX, I investigated
>>> NGINX. I'm focusing on the HTTP header parsing. Basically, the
>>> algorithm parses byte by byte and, based on the value, stores some variables.
>>> Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
>>> if (c) {
>>>     hash = ngx_hash(0, c);
>>>     r->lowcase_header[0] = c;
>>>     i = 1;
>>>     break;
>>> }
>>>
>>> if (ch == '_') {
>>>     if (allow_underscores) {
>>>         hash = ngx_hash(0, ch);
>>>         r->lowcase_header[0] = ch;
>>>         i = 1;
>>>
>>>     } else {
>>>         r->invalid_header = 1;
>>>     }
>>>
>>>     break;
>>> }
>
> Your example code isn't complete enough to do a full analysis, but I doubt code like this would generate conditional execution anyway. There are several reasons:
>
> 1) It's likely too long once machine instructions are generated.
> 2) There are function calls (ngx_hash) in the body of the conditional blocks (calls cannot be conditionally executed); if they are inlined then see 1) above.
> 3) You have nested conditions (only the innermost block could be conditionally executed).
> 4) You wouldn't want to conditionally execute 'if (allow_underscores)'
> anyway as it's probably highly predictable as a branch.
>
> R.
>
>>> Also, most of the branches are not predictable because they compare
>>> against data coming from the network.
>>> From these observations, I looked at the conditional execution
>>> optimization step in GCC and I found this function that should do the work:
>>> cond_exec_find_if_block. And how to customize the decision to use
>>> conditional instructions:
>>
>> ... This relates to the arm port i.e. the 32-bit target in Armv8-a, is that what you're targeting?
>> AArch64 has had more tuning work put into it over the years so may do better performance-wise if your processor and environment supports it.
>> If you're indeed looking at arm...
>>
>>> #define MAX_CONDITIONAL_EXECUTE
>>> arm_max_conditional_execute ()
>>> int
>>> arm_max_conditional_execute (void)
>>> {
>>>   return max_insns_skipped;
>>> }
>>> static int max_insns_skipped = 5;
>>>
>>> I tried to compile NGINX in -O2 (that should enable if-conversion2)
>>> but I did not notice any change in the code.
>>> I enabled GCC debug
>>> (-da) and also added some debug output in this function, and I figured out that
>>> targetm.have_conditional_execution is set to false.
>>>
>>> First, do you know how to switch this variable to true? I guess it is an
>>> option during the configuration step of GCC.
>>
>> Its definition on that branch is:
>> /* Only thumb1 can't support conditional execution, so return true if
>>    the target is not thumb1.  */
>> static bool
>> arm_have_conditional_execution (void) {
>>   return !TARGET_THUMB1;
>> }
>>
>> So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?
>>
>> Thanks,
>> Kyrill
>>
>>> Then, I know that the decision to use conditional execution is
>>> based on the extra cost added to compute both branches compared to the cost of a branch.
>>> In this specific case, branches are mispredicted and the cost is, indeed, high.
>>> Do you think that increasing the max_insns_skipped will be enough to
>>> help GCC to use conditional execution?
>>>
>>> Thank you in advance for your answers.
>>>
>>> Best,
>>> Benjamin Minguez
>
> R.
>
>
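
For reference, Richard's point about conditional stores in the most recent reply can be sketched in C roughly as follows. This is only an illustration under assumptions: `struct request`, `sketch_hash`, `update_with_store` and `update_value_only` are hypothetical names made up for this sketch, not nginx or GCC code, and whether GCC actually emits CSEL for the second function depends on the compiler version, options and cost model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the request structure in the thread. */
struct request {
    unsigned char lowcase_header[32];
};

/* Hypothetical stand-in for an inlined hash step (not nginx's ngx_hash). */
static inline unsigned sketch_hash(unsigned key, unsigned char c)
{
    return key * 31u + c;
}

/*
 * Conditional store: memory is written only when c is non-zero.
 * Turning this into load + select + unconditional store would write
 * to r->lowcase_header[0] even when c == 0, which could race with
 * another thread, so the compiler has to keep a branch here.
 */
unsigned update_with_store(struct request *r, unsigned char c, unsigned hash)
{
    assert(r != NULL);
    if (c) {
        hash = sketch_hash(hash, c);
        r->lowcase_header[0] = c;
    }
    return hash;
}

/*
 * Value-only update: nothing is stored to memory, so both candidate
 * results can be computed and one selected. This is the shape a
 * conditional-select instruction can implement, if the compiler
 * judges it profitable.
 */
unsigned update_value_only(unsigned char c, unsigned hash)
{
    unsigned next = sketch_hash(hash, c);
    return c ? next : hash;
}
```

Comparing the assembly for the two functions (for example with -O2 -S on an aarch64 compiler) is one way to see which form, if any, is converted to a conditional select.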