On Thu, May 23, 2019 at 06:43:46PM +0100, Catalin Marinas wrote: > On Thu, May 23, 2019 at 09:38:19AM -0700, Kees Cook wrote: > > What on this front would you be comfortable with? Given it's a new > > feature isn't it sufficient to have a CONFIG (and/or boot option)? > > I'd rather avoid re-building kernels. A boot option would do, unless we > see value in a per-process (inherited) personality or prctl. The I think I've convinced myself that the normal<->TBI ABI control should be a boot parameter. More below... > > What about testing tools that intentionally insert high bits for syscalls > > and are _expecting_ them to fail? It seems the TBI series will break them? > > In that case, do we need to opt into TBI as well? > > If there are such tools, then we may need a per-process control. It's > basically an ABI change. syzkaller already attempts to randomly inject non-canonical and 0xFFFF....FFFF addresses for user pointers in syscalls in an effort to find bugs like CVE-2017-5123 where waitid() via unchecked put_user() was able to write directly to kernel memory[1]. It seems that using TBI by default and not allowing a switch back to "normal" ABI without a reboot actually means that userspace cannot inject kernel pointers into syscalls any more, since they'll get universally stripped now. Is my understanding correct, here? i.e. exploiting CVE-2017-5123 would be impossible under TBI? If so, then I think we should commit to the TBI ABI and have a boot flag to disable it, but NOT have a process flag, as that would allow attackers to bypass the masking. The only flag should be "TBI or MTE". If so, can I get top byte masking for other architectures too? Like, just to strip high bits off userspace addresses? ;) (Oh, in looking I see this is implemented with sign-extension... why not just a mask? So it'll either be valid userspace address or forced into the non-canonical range?) [1] https://salls.github.io/Linux-Kernel-CVE-2017-5123/ > > Alright, the tl;dr appears to be: > > - you want more assurances that we can find __user stripping in the > > kernel more easily. (But this seems like a parallel problem.) > > Yes, and that we found all (most) cases now. The reason I don't see it > as a parallel problem is that, as maintainer, I promise an ABI to user > and I'd rather stick to it. I don't want, for example, ncurses to stop > working because of some ioctl() rejecting tagged pointers. But this is what I don't understand: it would need to be ncurses _using TBI_, that would stop working (having started to work before, but then regress due to a newly added one-off bug). Regular ncurses will be fine because it's not using TBI. So The Golden Rule isn't violated, and by definition, it's a specific regression caused by some bug (since TBI would have had to have worked _before_ in the situation to be considered a regression now). Which describes the normal path for kernel development... add feature, find corner cases where it doesn't work, fix them, encounter new regressions, fix those, repeat forever. > If it's just the occasional one-off bug I'm fine to deal with it. But > no-one convinced me yet that this is the case. You believe there still to be some systemic cases that haven't been found yet? And even if so -- isn't it better to work on that incrementally? > As for the generic driver code (filesystems or other subsystems), > without some clear direction for developers, together with static > checking/sparse, on how user pointers are cast to longs (one example), > it would become my responsibility to identify and fix them up with any > kernel release. This series is not providing such guidance, just adding > untagged_addr() in some places that we think matter. What about adding a nice bit of .rst documentation that describes the situation and shows how to use untagged_addr(). This is the kind of kernel-wide change that "everyone" needs to know about, and shouldn't be the arch maintainer's sole responsibility to fix. > > - we might need to opt in to TBI with a prctl() > > Yes, although still up for discussion. I think I've talked myself out of it. I say boot param only! :) So what do you say to these next steps: - change untagged_addr() to use a static branch that is controlled with a boot parameter. - add, say, Documentation/core-api/user-addresses.rst to describe proper care and handling of user space pointers with untagged_addr(), with examples based on all the cases seen so far in this series. - continue work to improve static analysis. Thanks for wading through this with me! :) -- Kees Cook