On Mon, May 8, 2017 at 11:56 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote: > > * Kees Cook <keescook@xxxxxxxxxxxx> wrote: > >> > There's the option of using GCC plugins now that the infrastructure was >> > upstreamed from grsecurity. It can be used as part of the regular build >> > process and as long as the analysis is pretty simple it shouldn't hurt compile >> > time much. >> >> Well, and that the situation may arise due to memory corruption, not from >> poorly-matched set_fs() calls, which static analysis won't help solve. We need >> to catch this bad kernel state because it is a very bad state to run in. [attempting some thread-merging] > Ok, so that's CVE-2010-4258, where an oops with KERNEL_DS set was used to escalate > privileges, due to the kernel's oops handler not cleaning up the KERNEL_DS. The > exploit used another bug, a crash in a network protocol handler, to execute the > oops handler with KERNEL_DS set. Right, I didn't mean to suggest that vulnerability would be fixed by this solution. I was trying to show how there can be some pretty complex interaction with exceptions/interrupts/etc that would make pure static analysis still miss things. > If memory corruption corrupted the task state into having addr_limit set to > KERNEL_DS then there's already a fair chance that it's game over: it could also > have set *uid to 0, or changed a sensitive PF_ flag, or a number of other > things... > > Furthermore, think about it: there's literally an infinite amount of corrupted > task states that could be a security problem and that could be checked after every > system call. Do we want to check every one of them? Right, but this "slippery slope" argument isn't the best way to reject security changes. Let me take a step back and describe the threat, and where we should likely spend time: The primary threat with addr_limit getting changed is that a narrowly-scoped attack (traditionally stack exhaustion or adjacent-stack large-index writes) could be leveraged into opening the entire kernel to writes (by allowing all syscalls with a copy_to_user() call to suddenly be able to write to kernel memory). So, really, the flaw is having addr_limit at all. Removing set_fs() should, I think, allow this to become a const (or at least should get us a lot closer). The main path to corrupting addr_limit has been via stack corruption. On architectures with CONFIG_THREAD_INFO_IN_TASK, this risk is greatly reduced already, but it's not universally available yet. (And as long as we're talking about stack attacks, CONFIG_VMAP_STACK makes cross-stack overflows go away, and cross-stack indexing harder, but that's not really about addr_limit since currently nothing with VMAP_STACK doesn't already have THREAD_INFO_IN_TASK.) So, left with a still exploitable target in memory that allows such an expansion of attack method, I still think it's worth keeping this patch series, but if we can drop set_fs() I could probably be convinced the benefit of the series doesn't exceed the cost on THREAD_INFO_IN_TASK-architectures (x86, arm64, s390). But that means at least currently keeping it on arm, for example. If we can make addr_limit const, well, we don't need the series at all. -Kees -- Kees Cook Pixel Security -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html