Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 13, 2025 at 9:48 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 2/13/25 11:14, Uros Bizjak wrote:
> > According to [1], the usage of asm pseudo directives in the asm template
> > can confuse the compiler to wrongly estimate the size of the generated
> > code. ALTERNATIVE macro expands to several asm pseudo directives, so
> > its usage in {,try_}cmpxchg{64,128} causes instruction length estimate
> > to fail by an order of magnitude (the compiler estimates the length of
> > an asm to be more than 20 instructions).
>
> Just curious, but how did you come up with the "20 instructions" number?

Currently, a patched GCC compiler is needed (please see
asm_insn_count() and asm_str_count() functions in gcc/final.cc on how
the asm length is calculated) to report the length. For historic
reasons, the length of asm is not printed in asm dumps, but recently a
GCC PR was filled with a request to change this).

> > This wrong estimate further causes unoptimal inlining decisions for
> > functions that use these locking primitives.
> >
> > Use asm_inline instead of just asm. For inlining purposes, the size of
> > the asm is then taken as the minimum size, ignoring how many instructions
> > compiler thinks it is.
>
> So, the compiler is trying to decide whether to inline a function or
> not. The bigger it is, the less likely, it is to be inlined. Since it is
> over-estimating the size of {,try_}cmpxchg{64,128}, it will avoid
> inlining it when it _should_ be inlining it.
>
> Is that it?

Yes, because the calculated length of what is effectively one
instruction gets unreasonably high. The compiler counts 20
*instructions*, each estimated to be 16 bytes long.

> Is any of this measurable? Is there any objective data to support that
> this change is a good one?

Actually, "asm inline" was added to the GCC compiler just for this
purpose by request from the linux community [1]. My patch follows the
example of other similar macros (e.g. arch/x86/include/alternative.h)
and adds the same cure to asms that will undoubtedly result in a
single instruction [*].  The benefit is much more precise length
estimation, so compiler heuristic is able to correctly estimate the
benefit of inlining, not being skewed by excessive use of
__always_inline directive. OTOH, it is hard to back up compiler
decisions by objective data, as inlining decisions depend on several
factors besides function size (e.g. how hot/cold is function), so a
simple comparison of kernel sizes does not present the full picture.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2018-December/512349.html

[*] Please note that if asm template is using CC_SET, the compiler may
emit an additional SETx asm insn. However, all GCC versions that
support "asm inline" also support flag outputs, so they are guaranteed
to emit only one asm insn.

> It's quite possible that someone did the "asm" on purpose because
> over-estimating the size was a good thing.

I doubt this would be the case, and I would consider the code that
depends on this detail defective. The code that results in one asm
instruction should be accounted as such, no matter what internal
details are exposed in the instruction asm template.

Uros.





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux