On Fri, Oct 7, 2022, at 9:04 PM, Nick Desaulniers wrote:
> On Fri, Oct 7, 2022 at 1:28 AM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>> On Fri, Oct 7, 2022, at 12:21 AM, Nick Desaulniers wrote:
>> > On Thu, Mar 07, 2019 at 10:01:36AM +0100, Arnd Bergmann wrote:
>>
>> - the kernel is built with -fsanitize=local-bounds, dropping this
>>   option reduces the stack allocation for this function by around
>>   100 bytes, which would be the easiest change for you to build
>>   those kernels again without any source changes, but it may also
>>   be possible to change the core_sys_select function in a way that
>>   avoids the insertion of runtime bounds checks.
>
> Thanks for taking a look Arnd; ++beers_owed;
>
> I'm not sure we'd want to disable CONFIG_UBSAN_LOCAL_BOUNDS=y for this
> particular configuration of the kernel over this, or remove
> -fsanitize=local-bounds for this translation unit (even if we did so
> specifically for 32b targets). FWICT, the parameter n of function
> core_sys_select() is used to index into the stack-allocated stack_fds,
> which is what -fsanitize=local-bounds is inserting runtime guards for.

Right, and what I was thinking is that the existing runtime check
'if (size > sizeof(stack_fds) / 6)' could be rewritten so that clang
sees the bounds correctly, since the check it inserts tests for the
exact same limit, right? It might be too hard to figure out, or it
might have other side-effects.

>> - If I mark 'do_select' as noinline_for_stack, the reported frame
>>   size is decreased a lot and is suddenly independent of
>>   -fsanitize=local-bounds:
>>
>> fs/select.c:625:5: error: stack frame size (336) exceeds limit (100) in 'core_sys_select' [-Werror,-Wframe-larger-than]
>> int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
>> fs/select.c:479:21: error: stack frame size (684) exceeds limit (100) in 'do_select' [-Werror,-Wframe-larger-than]
>> static noinline int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
>
> I think this approach makes the most sense to me; the caller
> core_sys_select() has a large stack allocation `stack_fds`, and so
> does the callee do_select() with `table`. Add in inlining and long
> live ranges and it makes sense that stack spills are going to tip us
> over the threshold set by -Wframe-larger-than.
>
> Whether you make do_select() `noinline_for_stack` conditional on
> additional configs like CC_IS_CLANG or CONFIG_UBSAN_LOCAL_BOUNDS is
> perhaps also worth considering.
>
> How would you feel about a patch that:
> 1. reverts commit ad312f95d41c ("fs/select: avoid clang stack usage
>    warning")
> 2. marks do_select noinline_for_stack
>
> ?

That is probably ok, but it does need proper testing to ensure that
there are no performance regressions. Do you know if gcc inlines the
function by default? If not, we probably don't need to make it
conditional.
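For concreteness, step 2 would presumably be nothing more than the
following (untested sketch; noinline_for_stack is simply an alias for
noinline in include/linux/compiler_types.h, spelled this way to
document the intent):

--- a/fs/select.c
+++ b/fs/select.c
-static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
+static noinline_for_stack
+int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)

That would keep stack_fds in core_sys_select() and do_select()'s
poll_wqueues table from ever being folded into a single frame, and it
leaves gcc unaffected if gcc already chooses not to inline here.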
> I assume the point of "small string optimization" going on with
> `stack_fds` in core_sys_select() is that the potential overhead for
> kmalloc is much much higher than the cost of not inlining do_select()
> into core_sys_select(). The above approach does solve this .config's
> instance, and seems slightly less brittle to me.
>
>> However, I don't even see how this makes sense at all, given that
>> the actual frame size should be at least SELECT_STACK_ALLOC!
>
> I think the math checks out:
>
> #define FRONTEND_STACK_ALLOC    256
> #define SELECT_STACK_ALLOC      FRONTEND_STACK_ALLOC
> long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
>
> sizeof(long) == 4; // i386 ilp32
> sizeof(stack_fds) == sizeof(long) * 256 / sizeof(long) == 256

Ah right, I misread what the code actually does here.

>> - The behavior of -ftrivial-auto-var-init= is a bit odd here: with
>>   =zero or =pattern, the stack usage is just below the limit (1020),
>>   without the option it is up to 1044. It looks like your .config
>>   picks =zero, which was dropped in the latest clang version, so it
>>   falls back to not
>
> Huh? What do you mean by "was dropped?"
>
> The config I sent has:
> CONFIG_CC_HAS_AUTO_VAR_INIT_PATTERN=y
> CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_BARE=y
> CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO=y
> # CONFIG_INIT_STACK_NONE is not set
> CONFIG_INIT_STACK_ALL_ZERO=y

When I use this config on my kernel tree (currently on top of
next-20220929 for unrelated work) and build with clang-16,
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO is disabled, so it falls back to
CONFIG_INIT_STACK_NONE instead of the unavailable
CONFIG_INIT_STACK_ALL_ZERO.

> Disabling INIT_STACK_ALL_ZERO from the config provided doesn't elide
> the diagnostic.
> Enabling INIT_STACK_ALL_PATTERN does... explicit stack allocations in
> IR haven't changed. In the generated assembly we're pushing 3x 4B
> GPRs, subtracting 8B from the stack pointer, then another 1008B.
> (Filed: https://github.com/llvm/llvm-project/issues/58237)
> So that's 3 * 4B + 8B + 1008B == 1028. But CONFIG_FRAME_WARN is set
> to 1024. I wonder if this diagnostic is not as precise as it could
> be, or my math is wrong?
>
> It looks like for arrays INIT_STACK_ALL_PATTERN uses memset to fill
> the array with 0xFF rather than 0x00 used by INIT_STACK_ALL_ZERO. Not
> sure why that would make a difference, but curious that it does.

The warning sizes I see are:

zero:    1028
pattern: 1020
none:    1044

So you are right: even with =zero I get the warning, even though it's
16 bytes less than the =none case I was looking at first.

     Arnd
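P.S. Regarding "was dropped": presumably what clang-16 removed is only
the extra
-enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
opt-in flag, which older Kconfig probes for
CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO still pass, so the probe fails there
even though plain -ftrivial-auto-var-init=zero keeps working.

For playing with the numbers outside of a kernel build, a reduced
userspace approximation like the one below (hypothetical test case,
not taken from fs/select.c, so the exact sizes won't match a kernel
build) shows the same kind of spread between the three modes:

/*
 * repro.c - compare reported frame sizes, e.g.:
 *   clang -m32 -O2 -Wframe-larger-than=100 -c repro.c
 *   clang -m32 -O2 -Wframe-larger-than=100 \
 *         -ftrivial-auto-var-init=pattern -c repro.c
 *   clang -m32 -O2 -Wframe-larger-than=100 \
 *         -ftrivial-auto-var-init=zero -c repro.c
 * (clang-15 and earlier also need the opt-in flag mentioned above
 * for the =zero mode.)
 */
#include <string.h>

#define SELECT_STACK_ALLOC 256

/* Opaque sink so the compiler cannot elide the arrays. */
int consume(void *bits, unsigned long size);

int fake_core_sys_select(const void *user, unsigned long size)
{
        /* Mimic the large on-stack buffers in core_sys_select()
         * and do_select(); both get initialized under
         * -ftrivial-auto-var-init=. */
        long stack_fds[SELECT_STACK_ALLOC / sizeof(long)];
        char table[832];

        if (size > sizeof(stack_fds))
                return -1;
        memcpy(stack_fds, user, size);
        memcpy(table, stack_fds, sizeof(stack_fds));
        return consume(stack_fds, size) + consume(table, sizeof(table));
}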