On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
>> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
>> >
>> > They are all intended to be used for runtime debugging, so I'd imagine so.
>>
>> Then I strongly suggest putting a nonzero value here. As you write
>> that "with every release of LLVM, both of these sanitizers eat up more and more
>> of the stack", don't you want to have at least some canary to detect
>> when "more and more" is guaranteed to run into problems?
>
> FRAME_WARN is a poor canary. First, it does not necessarily indicate
> that a build is faulty (a single bloated stack frame won't crash the
> system).

I agree it's flawed, but it does catch a lot of bugs, both in the
driver and the compiler. What we should probably have is some
better runtime debugging in addition to FRAME_WARN, but it's
better than nothing.

One idea that I've suggested in the past is to add a soft
stack limit that is lower than THREAD_SIZE, using VMAP_STACK
with a custom stack start and a read-only page at the end
to catch a thread exceeding the soft limit and print a backtrace
before marking the page writable.

> Second, devs are unlikely to fix a function because its stack frame is
> too big under some exotic tool+compiler combination.

I've probably sent hundreds of fixes for these in the past. Most
of the time there is an actual driver bug, and almost always
the driver maintainers are responsive and treat the report with
the appropriate urgency: even if only some configurations
actually push it over the limit, the general case is some
data structure that is hundreds of bytes long and was not
actually meant to be on the stack.

The gcc bug reports also usually get addressed quickly, though
we've had problems with clang not making progress on known
bugs for years. It sounds like Nick has made some important
progress on clang very recently, so we should be able to
raise the minimum clang version for kasan and kcsan once
there is a known good release.

> So the remaining option would be to just increase the frame size every
> time a new function surpasses the limit.

That is clearly not an option, though we could try to
add Kconfig dependencies that avoid the known bad combinations,
such as annotating the AMD GPU driver as

    depends on (CC_IS_GCC || CLANG_VERSION >= 180000) || !(KASAN || KCSAN)

      Arnd