Re: [PATCH RESEND 2/2] x86/locking: Use asm_inline for {,try_}cmpxchg{64,128} emulations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/13/25 14:13, Uros Bizjak wrote:
> On Thu, Feb 13, 2025 at 9:48 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>> On 2/13/25 11:14, Uros Bizjak wrote:
>>> According to [1], the usage of asm pseudo directives in the asm template
>>> can confuse the compiler to wrongly estimate the size of the generated
>>> code. ALTERNATIVE macro expands to several asm pseudo directives, so
>>> its usage in {,try_}cmpxchg{64,128} causes instruction length estimate
>>> to fail by an order of magnitude (the compiler estimates the length of
>>> an asm to be more than 20 instructions).
>>
>> Just curious, but how did you come up with the "20 instructions" number?
> 
> Currently, a patched GCC compiler is needed (please see
> asm_insn_count() and asm_str_count() functions in gcc/final.cc on how
> the asm length is calculated) to report the length. For historic
> reasons, the length of asm is not printed in asm dumps, but recently a
> GCC PR was filled with a request to change this).

So, that's also good info to add. You can  even do it in the changelog
with little more space than the existing changelog:

	... fail by an order of magnitude (a hacked-up gcc shows that it
	estimates the length of an asm to be more than 20 instructions).

...
>> Is any of this measurable? Is there any objective data to support that
>> this change is a good one?
> 
> Actually, "asm inline" was added to the GCC compiler just for this
> purpose by request from the linux community [1].

Wow, that's really important important information. Shouldn't the fact
that this is leveraging a new feature that we asked for specifically get
called out somewhere?

Who asked for it? Are they on cc? Do they agree that this feature fills
the gap they wanted filled?

> My patch follows the
> example of other similar macros (e.g. arch/x86/include/alternative.h)
> and adds the same cure to asms that will undoubtedly result in a
> single instruction [*].  The benefit is much more precise length
> estimation, so compiler heuristic is able to correctly estimate the
> benefit of inlining, not being skewed by excessive use of
> __always_inline directive. OTOH, it is hard to back up compiler
> decisions by objective data, as inlining decisions depend on several
> factors besides function size (e.g. how hot/cold is function), so a
> simple comparison of kernel sizes does not present the full picture.

Yes, the world is complicated. But, honestly, one data point is a
billion times better than zero. Right now, we're at zero.

>> It's quite possible that someone did the "asm" on purpose because
>> over-estimating the size was a good thing.
> 
> I doubt this would be the case, and I would consider the code that
> depends on this detail defective. The code that results in one asm
> instruction should be accounted as such, no matter what internal
> details are exposed in the instruction asm template.

Yeah, but defective or not, if this causes a regression, it's either not
getting applied to gets reverted.

All that I'm asking here is that someone look at the kernel after the
patch gets applied and sanity check it. Absolutely basic scientific
method stuff. Make a hypothesis about what it will do:

	1. Inline these locking functions
	2. Make the kernel go faster for _something_

and if it doesn't match the hypothesis, then try and figure out why. You
don't have to do every config or every compiler. Just do one config and
one modern compiler.

Right now, this patch is saying:

	1. gcc appears to have done something that might be suboptimal
	2. gcc has a new feature that might make it less suboptimal
	3. here's a patch that should optimize things
	...

but then it leaves us hanging.  There's a lot of "mights" and "shoulds"
in there, but nothing that shows that this actually does anything
positive in practice.

Maybe I'm just a dummy and this is just an obvious improvement that I
can't grasp. If so, sorry for being so dense, but I'm going to need a
little more education before this gets applied.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux