Re: Re: [AArch64][Spec2017] Question about mlow-precision-div optimization.

Richard Sandiford <richard.sandiford@xxxxxxx> · Fri, 06 Mar 2020 15:23:58 +0000

Hi,

Sorry for the slow reply, the last few days have been a bit hectic.

"Bu Le" <cityubule@xxxxxx> writes:
>>It's probably not worth promoting to a full -m option that in theory
>>would be supported for evermore.  But now that targets can define their
>>own --params, it might make sense to use --params here.
> Thanks for the reply.
> I tried the patch in the attachment, it works as we expected. Do you mean like
> this?
>
> A simple example :
> Double foo(double a, double b) { return a /b;}
>
> -O2 -ffast-math -mlow-precision-div foo.c will give:
>   Foo:
>  frecpe d2, d1
>  frecps d3, d2, d1
>  fmul d2, d2, d3
>  frecps d3, d2, d1
>  fmul d2, d2, d0
>  fmul d0, d2, d3
>  ret
> -O2 -ffast-math -mlow-precision-div --param=aarch64-double-recp-precision=2
> foo.c result in one less step
>  Foo:
>  frecpe d2, d1
>  frecps d3, d2, d1
>  fmul d2, d2, d0
>  fmul d0, d2, d3
>  ret

Yeah, this is the kind of thing I had in mind.  However, rather than
calculating the value here:

> diff -Nurp a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> --- a/gcc/config/aarch64/aarch64.c	2020-02-11 11:51:04.000000000 +0800
> +++ b/gcc/config/aarch64/aarch64.c	2020-03-04 23:01:16.600403598 +0800
> @@ -12851,8 +12851,8 @@ aarch64_emit_approx_div (rtx quo, rtx nu
>    rtx xrcp = gen_reg_rtx (mode);
>    emit_insn (gen_aarch64_frecpe (mode, xrcp, den));
>  
> -  /* Iterate over the series twice for SF and thrice for DF.  */
> -  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +  /* Iterate over the series twice for SF and thrice for DF by default.  */
> +  int iterations = (GET_MODE_INNER (mode) == DFmode) ? aarch64_double_recp_precision : aarch64_float_recp_precision;

and then decrementing it here:

>    /* Optionally iterate over the series once less for faster performance,
>       while sacrificing the accuracy.  */
>    if ((recp && flag_mrecip_low_precision_sqrt)
>        || (!recp && flag_mlow_precision_sqrt))
>      iterations--;

it might be better to keep the original 3 : 2 calculation above and
then override it with the param values:

    if ((recp && flag_mrecip_low_precision_sqrt)
        || (!recp && flag_mlow_precision_sqrt))
      iterations = ...param values...;

That way, the --param value reflects the actual number of steps.
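To make the suggested control flow concrete, here is a minimal standalone
sketch in plain C rather than GCC internals.  The names choose_iterations,
double_recp_precision and float_recp_precision are hypothetical stand-ins
for the real code and the --param variables.  The values 2 and 1 are an
assumption, chosen only so that the low-precision path matches the old
"one step fewer" behaviour; the patch as posted uses 3 and 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the --param variables.  The values 2 and 1
   are assumptions that reproduce the old "default minus one" behaviour.  */
static int double_recp_precision = 2;
static int float_recp_precision = 1;

/* Return the number of Newton steps to emit.  Without the low-precision
   option the count stays at the fixed 3 : 2 values; with it, the param
   value is used directly as the step count.  */
static int
choose_iterations (bool is_double, bool low_precision)
{
  int iterations = is_double ? 3 : 2;

  if (low_precision)
    iterations = (is_double
                  ? double_recp_precision
                  : float_recp_precision);

  assert (iterations >= 1);
  return iterations;
}
```

With this shape, the number passed via --param is exactly the number of
steps taken, with no hidden decrement.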

Minor formatting point, but GCC code uses a maximum line length
of 80 characters, so the conventional way of formatting the
calculation above would be:

  int iterations = (GET_MODE_INNER (mode) == DFmode
		    ? aarch64_double_recp_precision
		    : aarch64_float_recp_precision);

> diff -Nurp a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> --- a/gcc/config/aarch64/aarch64.opt	2020-02-04 09:23:30.000000000 +0800
> +++ b/gcc/config/aarch64/aarch64.opt	2020-03-04 23:01:18.173777158 +0800
> @@ -262,3 +262,12 @@ Generate local calls to out-of-line atom
>  -param=aarch64-sve-compare-costs=
>  Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
>  When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
> +
> +-param=aarch64-float-recp-precision=
> +Target Joined UInteger Var(aarch64_float_recp_precision) Init(2) IntegerRange(1, 5) Param
> +The number of Newton-iteration for calculating the reciprocal for float type. The precision of division is propotional to this param when division approximation is enabled. The default value is 2.
> +
> +-param=aarch64-double-recp-precision=
> +Target Joined UInteger Var(aarch64_double_recp_precision) Init(3) IntegerRange(1, 5) Param
> +The number of Newton-iteration for calculating the reciprocal for double type. The precision of division is propotional to this param when division approximation is enabled. The default value is 3.
> +

typo: s/propotional/proportional/.  Also, maybe
s/Newton-iteration/Newton iterations/
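
For readers less familiar with the sequence, here is a rough model in plain
C of what each Newton iteration does.  It is only a sketch: recps models the
2 - a * b result of FRECPS, the initial estimate is passed in by hand
instead of coming from FRECPE, and the helper names are hypothetical.  The
generated code also folds the numerator into the last step, which this
sketch does not do.

```c
#include <assert.h>

/* Model of the FRECPS step: returns 2 - a * b.  */
static double
recps (double a, double b)
{
  return 2.0 - a * b;
}

/* Approximate num / den, starting from ESTIMATE (a rough guess at
   1 / den) and refining it with STEPS Newton iterations:
   x = x * (2 - den * x).  */
static double
approx_div (double num, double den, double estimate, int steps)
{
  assert (steps >= 0);

  double x = estimate;
  for (int i = 0; i < steps; i++)
    x *= recps (x, den);

  return num * x;
}
```

Each step roughly squares the relative error of the estimate, so dropping
one iteration costs about half of the correct bits.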

Looks good otherwise.  However, the patch is unfortunately big enough to
need a copyright assignment to the FSF.  Do you already have one on file
(either a personal or a corporate one, depending on your circumstances)?
If not, would you be willing to sign one?  I can send you the forms
off-list if so.

Thanks,
Richard