Re: Condition execution optimization with gcc 7.5

On aarch64 this code cannot use conditional select.  An operation such as
	if (c) {
	  ...
	  r->lowcase_header[0] = c;
	  ...
	}

would be a conditional store to memory and can only happen if the guarding condition is true. It's not safe to convert this into, say

	cmp c, #0
	...
	ldr w1, [ptr]
	csel w1, w1, c, eq
	str w1, [ptr]

because the store would introduce a possible race with any other thread that might be writing to the same location. The compiler would also have to prove that ptr always contained a valid address when 'c' was false as well, something that might not be possible given the information available.
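
To make the two hazards concrete, here is a minimal C sketch of the two forms. The function names store_if and store_select are hypothetical, chosen only for illustration; they do not come from NGINX or GCC. The second function is roughly what the ldr/csel/str sequence above would compute.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: the source-level form. Nothing is read or written
   through ptr unless c is non-zero, so ptr may be invalid when c == 0. */
static void store_if(unsigned char *ptr, unsigned char c)
{
    if (c)
        *ptr = c;
}

/* Hypothetical: the if-converted form (ldr, csel, str). It always
   reads and writes *ptr, so ptr must be valid on every call, and the
   write-back of the old value can overwrite an update made by another
   thread between the load and the store. */
static void store_select(unsigned char *ptr, unsigned char c)
{
    assert(ptr != NULL);        /* extra precondition the compiler would have to prove */
    unsigned char old = *ptr;   /* ldr  w1, [ptr] */
    *ptr = c ? c : old;         /* csel + str     */
}
```

With a valid pointer and a single thread the two functions leave the same value in memory, which is why the transformation looks tempting; the difference only shows up when ptr is invalid for c == 0 or when another thread writes to the same location.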

The function arm_max_conditional_execute is only used for 32-bit arm targets. It's not part of the aarch64 compiler.

R.

On 22/05/2023 16:43, Benjamin Minguez via Gcc-help wrote:
Hello Richard,

I'm compiling for aarch64. Indeed, I was expecting conversion via conditional move or set.
I understand that code such as the NGINX HTTP parser is suitable for such conversion. But I was expecting that, for example, this code could benefit from it (ngx_hash is an inline function and is a simple xor operation):
                  if (c) {
                      hash = ngx_hash(0, c);
                      r->lowcase_header[0] = c;
                      i = 1;
                      break;
                  }

Thanks for your help and your answers.

Best,
Benjamin Minguez

-----Original Message-----
From: Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx>
Sent: Thursday, May 18, 2023 1:02 PM
To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>; Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
Subject: Re: Condition execution optimization with gcc 7.5

On 17/05/2023 09:17, Benjamin Minguez via Gcc-help wrote:
Hello,

I added -march=armv8-a (and the other armv8.*-a variants) to the GCC command line, but it looks like the conditional execution optimization, the cond_exec_find_if_block function, is never called. I enabled all GCC dumps (the -da option) and this function's debug messages are never printed.

Just to be certain, are you compiling for aarch32 (arm/thumb), or aarch64?  The latter does not support conditional execution, except via instructions such as CSEL.
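
As an illustration of the CSEL case, a conditional that only selects between values already held in registers is the kind of code an aarch64 compiler can typically handle without a branch. A minimal sketch follows; the function name pick is hypothetical, and whether GCC actually emits csel for it depends on the version, the options and the cost model.

```c
#include <stdint.h>

/* Hypothetical example: no memory access and no call inside the
   conditional, so both arms are safe to evaluate unconditionally. */
static uint32_t pick(uint32_t c, uint32_t if_set, uint32_t if_clear)
{
    return c ? if_set : if_clear;
}
```

Compiling such a function with -O2 -S and reading the generated assembly is the simplest way to check what a given GCC version does.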

[more comments lower down]

In parallel, I also tried different versions of GCC, 9.5.0 and 11.3.0, and again I got the same results.

Do you have any idea why this optimization step is not called?

Thank you in advance for your help.

Best,
Benjamin Minguez

-----Original Message-----
From: Benjamin Minguez
Sent: Wednesday, May 10, 2023 8:43 AM
To: 'Kyrylo Tkachov' <Kyrylo.Tkachov@xxxxxxx>; gcc-help@xxxxxxxxxxx
Subject: RE: Condition execution optimization with gcc 7.5

Hi,

Thanks for the answer.

I had a look at the wrong function definition, gcc-7.5.0/gcc/target.def:
	DEFHOOK
	(have_conditional_execution,
	 "This target hook returns true if the target supports conditional execution.\n\
	This target hook is required only when the target has several different\n\
	modes and they have different conditional execution capability, such as ARM.",
	 bool, (void),
	 default_have_conditional_execution)
and found this one, gcc-7.5.0/gcc/targhooks.c:
	bool
	default_have_conditional_execution (void)
	{
	  return HAVE_conditional_execution;
	}
Finally, the macro HAVE_conditional_execution is defined here:
build-gcc/gcc/insn-config.h.

I will investigate the -march or -mcpu option.

Again, thanks a lot,

Benjamin Minguez

-----Original Message-----
From: Kyrylo Tkachov <Kyrylo.Tkachov@xxxxxxx>
Sent: Tuesday, May 9, 2023 11:50 AM
To: Benjamin Minguez <benjamin.minguez@xxxxxxxxxx>;
gcc-help@xxxxxxxxxxx
Subject: RE: Condition execution optimization with gcc 7.5

Hi Benjamin,

-----Original Message-----
From: Gcc-help <gcc-help-bounces+kyrylo.tkachov=arm.com@xxxxxxxxxxx>
On Behalf Of Benjamin Minguez via Gcc-help
Sent: Tuesday, May 9, 2023 8:54 AM
To: gcc-help@xxxxxxxxxxx
Subject: Condition execution optimization with gcc 7.5

Hello everyone,

I'm trying to optimize an application that contains a lot of branches.
I'm targeting armv8 processors and I'm using GCC 7.5.0 for compatibility reasons.

Of course GCC 7.5 is quite old now but if you're forced to use it...

As the original application is similar to NGINX, I investigated
NGINX. I'm focusing on the HTTP header parsing. Basically, the
algorithm parses byte by byte and, based on the value, stores some variables.
Here is an example, /src/http/ngx_http_parse.c: ngx_http_parse_header_line
                  if (c) {
                      hash = ngx_hash(0, c);
                      r->lowcase_header[0] = c;
                      i = 1;
                      break;
                  }

                  if (ch == '_') {
                      if (allow_underscores) {
                          hash = ngx_hash(0, ch);
                          r->lowcase_header[0] = ch;
                          i = 1;

                      } else {
                          r->invalid_header = 1;
                      }

                      break;
                  }

Your example code isn't complete enough to do a full analysis, but I doubt code like this would generate conditional execution anyway.  There are several reasons:

1) It's likely too long once machine instructions are generated
2) There are function calls (ngx_hash) in the body of the conditional blocks (calls cannot be conditionally executed); if they are inlined then see 1) above.
3) you have nested conditions (only the innermost block could be conditionally executed).
4) you wouldn't want to conditionally execute 'if (allow_underscores)'
anyway as it's probably highly predictable as a branch.

R.

Also, most of the branches are not predictable because they compare against
data coming from the network.
From these observations, I looked at the conditional execution
optimization step in GCC and I found this function that should do the work:
cond_exec_find_if_block. I also found how to customize the decision to use
conditional instructions:

... This relates to the arm port i.e. the 32-bit target in Armv8-a, is that what you're targeting?
AArch64 has had more tuning work put into it over the years so may do better performance-wise if your processor and environment supports it.
If you're indeed looking at arm...

                  #define MAX_CONDITIONAL_EXECUTE arm_max_conditional_execute ()
                  int
                  arm_max_conditional_execute (void)
                  {
                    return max_insns_skipped;
                  }
                  static int max_insns_skipped = 5;
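
As a rough illustration of how this limit is used, the sketch below models the size check applied before a block is predicated. It is a simplified assumption about the logic in the if-conversion pass, not a copy of it: the function name fits_conditional_execution and its parameters are hypothetical, and the doubling of the limit when an else block is present is an assumption rather than something stated in this thread.

```c
#include <stdbool.h>

static int max_insns_skipped = 5;   /* value quoted above from the arm port */

/* Hypothetical model: a block is only considered for conditional
   execution if the instructions to be predicated fit in the limit. */
static bool fits_conditional_execution(int then_insns, int else_insns)
{
    int max = max_insns_skipped;
    int n_insns = then_insns;

    if (else_insns > 0) {           /* assumption: limit doubles for if/else */
        max *= 2;
        n_insns += else_insns;
    }
    return n_insns <= max;
}
```

Under this model, raising max_insns_skipped only widens the size limit; it does not remove the other obstacles mentioned in this thread, such as calls and nested conditions.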

I tried to compile NGINX at -O2 (which should enable if-conversion2)
but I did not notice any change in the code. I enabled GCC debug dumps (-da)
and also added some debug output in this function, and I figured out that
targetm.have_conditional_execution is set to false.

First, do you know how to switch this variable to true? I guess it is an
option during the configuration step of GCC.

Its definition on that branch is:
/* Only thumb1 can't support conditional execution, so return true if
     the target is not thumb1.  */
static bool
arm_have_conditional_execution (void)
{
    return !TARGET_THUMB1;
}
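
Putting the default and the arm definitions side by side, the hook resolves roughly as in the sketch below. This is a simplified stand-alone model, not GCC source: the struct target_hooks type, the target_thumb1 variable and the value 0 chosen for HAVE_conditional_execution are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumption: the value a port without cond_exec patterns would get. */
#define HAVE_conditional_execution 0

static bool target_thumb1 = false;  /* hypothetical stand-in for TARGET_THUMB1 */

/* Default hook: just reports the generated macro. */
static bool default_have_conditional_execution(void)
{
    return HAVE_conditional_execution;
}

/* arm override: everything except thumb1 can predicate instructions. */
static bool arm_have_conditional_execution(void)
{
    return !target_thumb1;
}

/* Hypothetical, reduced version of the target hook table. */
struct target_hooks {
    bool (*have_conditional_execution)(void);
};

static const struct target_hooks default_target = { default_have_conditional_execution };
static const struct target_hooks arm_target     = { arm_have_conditional_execution };
```

A port that does not override the hook falls back to the default, which would be consistent with targetm.have_conditional_execution returning false when the compiler is not built for the 32-bit arm target.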

So it looks like you're maybe not setting the right -march or -mcpu option to enable the full armv8-a features?

Thanks,
Kyrill

Then, I know that the decision to use conditional execution is based
on the extra cost added to compute both branches compared to the cost of a branch.
In this specific case, branches are mispredicted and the cost is, indeed, high.
Do you think that increasing max_insns_skipped will be enough to
help GCC use conditional execution?

Thank you in advance for your answers.

Best,
Benjamin Minguez

R.





