On Thu, Feb 13, 2025 at 11:52 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > > On 2/13/25 14:13, Uros Bizjak wrote: > > On Thu, Feb 13, 2025 at 9:48 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote: > >> On 2/13/25 11:14, Uros Bizjak wrote: > >>> According to [1], the usage of asm pseudo directives in the asm template > >>> can confuse the compiler to wrongly estimate the size of the generated > >>> code. ALTERNATIVE macro expands to several asm pseudo directives, so > >>> its usage in {,try_}cmpxchg{64,128} causes instruction length estimate > >>> to fail by an order of magnitude (the compiler estimates the length of > >>> an asm to be more than 20 instructions). > >> > >> Just curious, but how did you come up with the "20 instructions" number? > > > > Currently, a patched GCC compiler is needed (please see > > asm_insn_count() and asm_str_count() functions in gcc/final.cc on how > > the asm length is calculated) to report the length. For historic > > reasons, the length of asm is not printed in asm dumps, but recently a > > GCC PR was filled with a request to change this). > > So, that's also good info to add. You can even do it in the changelog > with little more space than the existing changelog: > > ... fail by an order of magnitude (a hacked-up gcc shows that it > estimates the length of an asm to be more than 20 instructions). > > ... > >> Is any of this measurable? Is there any objective data to support that > >> this change is a good one? > > > > Actually, "asm inline" was added to the GCC compiler just for this > > purpose by request from the linux community [1]. > > Wow, that's really important important information. Shouldn't the fact > that this is leveraging a new feature that we asked for specifically get > called out somewhere? > > Who asked for it? Are they on cc? Do they agree that this feature fills > the gap they wanted filled? asm_inline is already used in some 40-50 places throughout the tree, but there still remain some places that could benefit from it. > > My patch follows the > > example of other similar macros (e.g. arch/x86/include/alternative.h) > > and adds the same cure to asms that will undoubtedly result in a > > single instruction [*]. The benefit is much more precise length > > estimation, so compiler heuristic is able to correctly estimate the > > benefit of inlining, not being skewed by excessive use of > > __always_inline directive. OTOH, it is hard to back up compiler > > decisions by objective data, as inlining decisions depend on several > > factors besides function size (e.g. how hot/cold is function), so a > > simple comparison of kernel sizes does not present the full picture. > > Yes, the world is complicated. But, honestly, one data point is a > billion times better than zero. Right now, we're at zero. > > >> It's quite possible that someone did the "asm" on purpose because > >> over-estimating the size was a good thing. > > > > I doubt this would be the case, and I would consider the code that > > depends on this detail defective. The code that results in one asm > > instruction should be accounted as such, no matter what internal > > details are exposed in the instruction asm template. > > Yeah, but defective or not, if this causes a regression, it's either not > getting applied to gets reverted. > > All that I'm asking here is that someone look at the kernel after the > patch gets applied and sanity check it. Absolutely basic scientific > method stuff. Make a hypothesis about what it will do: > > 1. Inline these locking functions > 2. Make the kernel go faster for _something_ > > and if it doesn't match the hypothesis, then try and figure out why. You > don't have to do every config or every compiler. Just do one config and > one modern compiler. > > Right now, this patch is saying: > > 1. gcc appears to have done something that might be suboptimal > 2. gcc has a new feature that might make it less suboptimal > 3. here's a patch that should optimize things > ... > > but then it leaves us hanging. There's a lot of "mights" and "shoulds" > in there, but nothing that shows that this actually does anything > positive in practice. Let me harvest some data and report the findings in a V2 ChangeLog. However, these particular macros are rarely used, so I don't expect some big changes in the generated asm code. > Maybe I'm just a dummy and this is just an obvious improvement that I > can't grasp. If so, sorry for being so dense, but I'm going to need a > little more education before this gets applied. Thanks, Uros.