On Mon, Oct 10, 2022, at 8:40 PM, Nick Desaulniers wrote:
> On Fri, Oct 7, 2022 at 3:54 PM Nick Desaulniers <ndesaulniers@xxxxxxxxxx> wrote:

> $ CROSS_COMPILE=aarch64-linux-gnu- ARCH=arm64 make -j128 defconfig fs/select.o
> $ llvm-objdump -Dr --disassemble-symbols=core_sys_select fs/select.o |
> grep do_select
> 1a48: 2e fb ff 97 bl 0x700 <do_select>
>
> Same for 32b ARM.
> arm-linux-gnueabi-gcc (Debian 10.2.1-6) 10.2.1 20210110
>
> $ CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm make -j128 defconfig fs/select.o
> $ llvm-objdump -Dr --disassemble-symbols=core_sys_select fs/select.o |
> grep do_select
> 1620: 07 fc ff eb bl #-4068 <do_select>
>
> Is there a set of configs or different compiler version for which
> that's not the case? Perhaps. But it doesn't look like marking
> do_select noinline_for_stack changes the default behavior for GCC
> builds, which is good.

I checked all arm32 defconfigs and all supported gcc versions for
arm32; they all behave the same.

> So it looks like it's just clang being aggressive with inlining since
> it doesn't have -fconserve-stack. I think
> https://lore.kernel.org/lkml/20221007201140.1744961-1-ndesaulniers@xxxxxxxxxx/
> is still on the right track, though I'd remove the 32b only guard for
> v2.

I think it's again the difference between top-down and bottom-up
inlining.

> Paul also mentioned that -finline-max-stacksize is a thing, at least
> for clang.
> https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-finline-max-stacksize
> Though this only landed recently
> https://reviews.llvm.org/rG8564e2fea559 and won't ship until clang-16.
> That feels like a large hammer for core_sys_select/do_select; I think
> we can use a fine scalpel. But it might be interesting to use that
> with KASAN.

It's an interesting question whether it would help or hurt with
KASAN_STACK: Normally the idea is that KASAN_STACK intentionally
makes stack slots inside of a function non-overlapping, similar to
the use-after-scope sanitizer that we no longer use because it
caused too many stack overflows.

Making the compiler inline less should help reduce the actual stack
consumption (not just the reported usage) because it makes called
functions reuse the same stack slots, but it also makes KASAN_STACK
less effective for the same reason.

If -finline-max-stacksize is the equivalent of gcc's -fconserve-stack,
we could of course just always enable that.

      Arnd
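
For illustration, here is a minimal userspace sketch of the pattern
discussed above, assuming GCC or Clang. The names outer_sum() and
inner_sum() are hypothetical stand-ins for core_sys_select() and
do_select(); only the noinline_for_stack annotation corresponds to the
kernel, where it is defined as plain noinline. With the callee kept out
of line, its buffer lives in its own frame and is released on return
instead of being merged into the caller's frame.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in; the kernel defines noinline_for_stack as noinline. */
#define noinline_for_stack __attribute__((noinline))

/* Hypothetical callee with a large local buffer (stand-in for do_select()). */
static noinline_for_stack long inner_sum(const long *vals, size_t n)
{
	long scratch[64];	/* lives in inner_sum's own stack frame */
	long sum = 0;
	size_t i;

	if (n > 64)
		n = 64;
	memcpy(scratch, vals, n * sizeof(*vals));
	for (i = 0; i < n; i++)
		sum += scratch[i];
	return sum;
}

/* Hypothetical caller with its own buffer (stand-in for core_sys_select()). */
long outer_sum(long base, size_t n)
{
	long stack_vals[32];	/* caller's on-stack buffer */
	size_t i;

	if (n > 32)
		n = 32;
	for (i = 0; i < n; i++)
		stack_vals[i] = base + (long)i;
	return inner_sum(stack_vals, n);
}
```

Whether the annotation changes anything for a given build can be
checked the same way as in the quoted objdump output: an out-of-line
callee shows up as a bl to its symbol in the caller's disassembly.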