On Fri, Aug 25, 2023 at 6:08 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Fri, 25 Aug 2023 at 17:52, Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote: > > > > So 2 concerns where "I'll do it in inline asm" can pessimize codegen: > > 1. You alluded to this, but what happens when one of these functions > > is called with a constant? > > This is why our headers have a lot of __builtin_constant_p()'s in them.. > > In this particular case, see the x86 asm/bitops.h code: > > #define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) : > variable_ffs(x)) > > but this is actually quite a common pattern, and it's often not about > something like __builtin_ffs() at all. I was a reviewer on commit fdb6649ab7c1 ("x86/asm/bitops: Use __builtin_ctzl() to evaluate constant expressions"); I'm familiar with the pattern. > > See all the other __builtin_constant_p()'s that we have in that same > file because we basically just use different code sequences for > constants. > > And that file isn't even unusual. We use it quite a lot when we care > about code generation for some particular case. More so my point was x86 bitops is missing commit 2fcff790dcb4 ("powerpc: Use builtin functions for fls()/__fls()/fls64()") treatment. I've sent https://lore.kernel.org/llvm/20230828-x86_fls-v1-1-e6a31b9f79c3@xxxxxxxxxx/. > > > 2. by providing the definition of a symbol typically provided by libc > > (and then not building with -ffreestanding) pessimizes libcall > > optimization. > > .. and this is partly why we often avoid libgcc things, and do certain > things by hand. (Sorry if the following rant is prior knowledge, it's mostly for reference for others cc'ed who might not know this) Careful, `-ffreestanding` and libgcc are two orthogonal things (at least in my mind). -ffreestanding is to libc as --rtlib= is to the compiler runtime (which is distinct from the libc) `-ffreestanding` is more about "does the runtime environment somehow provide libc symbols." libgcc (or llvm's equivalent "compiler-rt") is not responsible for providing symbols from libc. `--rtlib=` controls what compiler runtime will be used, but in my experience, today's compilers don't make codegen decisions on that value. These are mostly runtime helpers for "idk how to do <complicated math thing, such as double word division>" or "maybe you didn't want that inline." What's brittle about making codegen decisions with regards to these flags though is that these dependencies grow over time, and yet it's not possible today (AFAIK) to specify what's the minimum target to support. For instance, IIRC glibc recently added support for one of the kernel's string.h routines, maybe strlcpy or something. https://sourceware.org/git/?p=glibc.git;a=commit;h=454a20c8756c9c1d55419153255fc7692b3d2199 When is it safe for the compiler to start transforming calls to other functions into calls to strlcpy? (Guess: year 2033, because:) What happens when dynamically linking against older versions of glib that do not provide that symbol? > > The classic rule is "Don't do 64-bit divisions using the C '/' operator". > > So in the kernel you have to use do_div() and friends, because the > library versions are often disgusting and might not know that 64/32 is > much much cheaper and is what you want. And thus the same problem exists for the kernel wrt --rtlib that I alluded to above for strlcpy. By providing a partial implementation of a compiler runtime (--rtlib=), the compiler will frequently emit libcalls to symbols for which the kernel hasn't provided. You can avoid open coded double word division in the kernel all you want but: 1. again you're probably pessimizing division by constant remainder by using div64_u64. GCC is *really* good at replacing these when the divisor is constant; IME better than clang. 2. even avoiding open coded division, the compiler can still insert division; loop-elision can replace loops outright if the trip count is adjusted by a determinable value. (see 3220022038b9). By providing a partial compiler runtime, and then using every -mno- flag in the book, you tie the compiler's hands wrt what it can emit vs libcall. There's not even a way to express to today's compiler that "we have a compiler runtime, it's just straight up missing things." Personally, I think a clang-tidy check for open coded division is perhaps a better way to enforce the kernel's posture than providing half a compiler runtime then doing gymnastics in the code to work around the emission of libcalls to __divdi3() or__aeabi_uldivmod() (mostly on 32b platforms). A linkage failure is nice, but only occurs on 32b targets and it doesn't disambiguate between the case of developer open coded division vs compiler inserted division. > > And quite often we simply use other names - but then we also do *not* > build with -freestanding, because -freestanding has at least > traditionally meant that the compiler won't optimize the simple and > obvious cases (typically things like "memcpy with a constant size"). Personal opinion: we very much do NOT want to use -ffreestanding for those libcall optimizations. I discussed this recently with ARCH=loongarch folks: commit 3f301dc292eb ("LoongArch: Replace -ffreestanding with finer-grained -fno-builtin's") It is my intention to remove -ffreestanding from ARCH=i386. https://github.com/ClangBuiltLinux/linux/issues/1583 I had to first fix a bug in LLVM though https://reviews.llvm.org/D125285 So rather than remove it outright, we might need to retain it for builds with older releases of clang for now. Though as you allude to down thread, perhaps some things that were the case in linux 1.0 / gcc 1.40 no longer hold. Which is why adding such flags to kernel makefiles really ought to be accompanied by a comment in sources linking to an issue tracker report, so that we might clean these up one day. > > So we mix and match and pick the best option. Gross, and like *could you not?* I suspect it's more so the case of a developer not realising it's perhaps a compiler bug, or not reporting such bug, and trying flags they're heard of once until something links. Any use of -ffreestanding for any arch had better have a comment to a compiler vendor's bug tracker laying out why that's necessary for that arch and not others. Many kernel developers are allergic to filing formal compiler bugs in places where compiler vendors are looking, IME. > > The kernel really doesn't care about architecture portability, because > honestly, something like "ffs()" is entirely *trivial* to get right, > compared to the real differences between architectures (eg VM and IO > differences etc). > > Linus -- Thanks, ~Nick Desaulniers