Re: Horrendous "runtime constant" hack - current patch x86-64 only

On June 5, 2024 7:14:13 PM PDT, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>Ok, this attached patch is so absolutely disgusting that it is almost
>a work of art.
>
>I spent some time last week doing arm64 profiles, and on the loads I
>tested, I saw my old enemy __d_lookup_rcu(). The hash table lookup
>ends up being expensive. Not a huge surprise.
>
>That said, the expense of the hash table lookup is only partially the
>memory accesses of the hashtable itself.  A noticeable part of the
>cost is in looking up the address of the hash table.
>
>That annoys me. It has annoyed me before. It's a "runtime constant".
>In fact, it's two runtime constants: the address of the hash table,
>and the shift count that turns the dentry name hash into the index
>into the hash table (approximates a mask).
>
>It's disgusting having the profile point to the "load constant from memory".
>
>Peter Anvin at some point had some rather complex patch to do
>"constant alternatives". I couldn't find it, but I didn't search very
>hard because I remembered it being pretty significant in size, and I
>went "how hard can it be".
>
>Now, I did the profiling on arm64, but then when it came to rewriting
>instructions I went back to x86-64 just because while I'm trying to
>get better at reading arm64 asm, I don't want to deal with the pain of
>huge constants (and a very slow boot for testing).
>
>I'm posting this disgusting patch here because I need to take a break
>from this insanity, and maybe somebody else is interested.
>
>And yes, this needs to be behind some "CONFIG_RUNTIME_CONSTANTS"
>config variable, with fallback to the same old code.
>
>And yes, that static_shift_right_32() thing is odd. It takes and
>returns an 'unsigned long', but then operates on the low 32 bits of
>it, and clears the upper 32 bits (on 64-bit architectures). That's
>purely because this is what x86-64 code generation wants to turn that
>whole op into just a single instruction.
>
>The static_const_init() sizes are also hardcoded, "knowing" what the layout is.
>
>So this is all just a truly disgusting tech demo, but it generates
>very pretty code in d_lookup_rcu().
>
>Tested in the sense that it works for me in one particular
>configuration using clang. The code from gcc looks fine to me too, but
>that's from just quick "let's check".
>
>Actually extending this to arm64 (and possibly other architectures)
>would need some more cleanups and abstracting this all more. I didn't
>look if other core kernel code might want to use this, I was literally
>just concentrating on making __d_lookup_rcu() look pretty (and you
>need to get rid of debug build options for it to do that)
>
>               Linus

Yeah, I never finished it because getting some of the corner cases done turned out to need temporary registers during initialization (before alternatives), which ended up leading to literally the ugliest assembly code I have ever written (and you know some of the crap I have done).

The temporary registers were needed to be able to do things like shift counts on x86.
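
For anyone following along, here is a rough sketch in plain C of what the two "runtime constants" and the shift helper compute, as described in the quoted message. It is not the patch: this is roughly the "same old code" fallback, where both values are loaded from memory on every lookup, which is exactly the cost the patch tries to remove by rewriting instruction immediates at boot. The names `hash_table`, `hash_shift`, `runtime_const_init()`, `shift_right_32()` and `hash_bucket()` are made up for illustration and are not taken from the kernel source.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the two values that are computed once at
 * boot and never change afterwards: the hash table address and the
 * shift count that turns a name hash into a table index.
 */
static void **hash_table;
static unsigned int hash_shift;

/* Illustrative init hook; the caller must pass a shift below 32. */
static void runtime_const_init(void **table, unsigned int shift)
{
	hash_table = table;
	hash_shift = shift;
}

/*
 * Semantics described for static_shift_right_32(): takes and returns
 * an unsigned long, shifts only the low 32 bits, and leaves the upper
 * 32 bits (on 64-bit architectures) cleared.
 */
static unsigned long shift_right_32(unsigned long val, unsigned int shift)
{
	return (unsigned long)((uint32_t)val >> shift);
}

/* Fallback-style lookup: two loads from memory before the real work. */
static void **hash_bucket(unsigned long hash)
{
	return hash_table + shift_right_32(hash, hash_shift);
}
```

With a 16-entry table the shift would be 28, so a hash whose low 32 bits are all ones selects the last bucket (index 15). The runtime-constant version would compute the same thing with the table address and the shift count encoded directly in the instructions.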




