* Len Brown <lenb@xxxxxxxxxx> wrote: > On Wed, Mar 17, 2021 at 6:45 AM Ingo Molnar <mingo@xxxxxxxxxx> wrote: > > > > > > * Ingo Molnar <mingo@xxxxxxxxxx> wrote: > > > > > > > > * Chang S. Bae <chang.seok.bae@xxxxxxxxx> wrote: > > > > > > > During signal entry, the kernel pushes data onto the normal userspace > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state, > > > > which has grown over time as new features and larger registers have been > > > > added to the architecture. > > > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ > > > > constant indicates to userspace how much data the kernel expects to push on > > > > the user stack, [2][3]. > > > > > > > > However, this constant is much too small and does not reflect recent > > > > additions to the architecture. For instance, when AVX-512 states are in > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB. > > > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can > > > > cause user stack overflow when delivering a signal. > > > > > > > uapi: Define the aux vector AT_MINSIGSTKSZ > > > > x86/signal: Introduce helpers to get the maximum signal frame size > > > > x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ > > > > selftest/sigaltstack: Use the AT_MINSIGSTKSZ aux vector if available > > > > x86/signal: Detect and prevent an alternate signal stack overflow > > > > selftest/x86/signal: Include test cases for validating sigaltstack > > > > > > So this looks really complicated, is this justified? > > > > > > Why not just internally round up sigaltstack size if it's too small? > > > This would be more robust, as it would fix applications that use > > > MINSIGSTKSZ but don't use the new AT_MINSIGSTKSZ facility. > > > > > > I.e. does AT_MINSIGSTKSZ have any other uses than avoiding the > > > segfault if MINSIGSTKSZ is used to create a small signal stack? > > > > I.e. if the kernel sees a too small ->ss_size in sigaltstack() it > > would ignore ->ss_sp and mmap() a new sigaltstack instead and use that > > for the signal handler stack. > > > > This would automatically make MINSIGSTKSZ - and other too small sizes > > work today, and in the future. > > > > But the question is, is there user-space usage of sigaltstacks that > > relies on controlling or reading the contents of the stack? > > > > longjmp using programs perhaps? > > For the legacy binary that requests a too-small sigaltstack, there are > several choices: > > We could detect the too-small stack at sigaltstack(2) invocation and > return an error. > This results in two deal-killing problems: > First, some applications don't check the return value, so the check > would be fruitless. > Second, those that check and error-out may be programs that never > actually take the signal, and so we'd be causing a dusty binary to > exit, when it didn't exit on another system, or another kernel. > > Or we could detect the too small stack at signal registration time. > This has the same two deal-killers as above. > > Then there is the approach in this patch-set, which detects an > imminent stack overflow at run time. > It has neither of the two problems above, and the benefit that we now > prevent data corruption > that could have been happening on some systems already today. The > down side is that the dusty binary > that does request the too-small stack can now die at run time. > > So your idea of recognizing the problem and conjuring up a > sufficient stack is compelling, since it would likely "just work", > no matter how dumb the program. But where would the the sufficient > stack come from -- is this a new kernel buffer, or is there a way to > abscond some user memory? I would expect a signal handler to look > at the data on its stack and nobody else will look at that stack. > But this is already an unreasonable program for allocating a special > signal stack in the first place :-/ So yes, one could imagine the > signal handler could longjump instead of gracefully completing, and > if this specially allocated signal stack isn't where the user > planned, that could be trouble. We could mmap() (implicitly) new anonymous memory - but I can see why this is probably more trouble than worth... > Another idea we discussed was to detect the potential overflow at > run-time, and instead of killing the process, just push the signal > onto the regular user stack. this might actually work, but it is > sort of devious; and it would not work in the case where the user > overflowed their regular stack already, which may be the most > (only?) compelling reason that they allocated and declared a special > sigaltstack in the first place... Yeah, this doesn't sound deterministic enough. Ok, thanks for the detailed answers - I withdraw my objections, let's proceed with the approach you are proposing? Thanks, Ingo