* Thomas Garnier <thgarnie@xxxxxxxxxx> wrote: > This patch ensures a syscall does not return to user-mode with a kernel > address limit. If that happened, a process can corrupt kernel-mode > memory and elevate privileges. Don't start changelogs with 'This patch' - it's obvious that we are talking about this patch. Writing: Ensure that a syscall does not return to user-mode with a kernel address limit. If that happens, a process can corrupt kernel-mode memory and elevate privileges. also note the spelling fix I did. (There's another spelling error elsewhere in this changelog as well.) Please read changelogs! > For example, it would mitigation this bug: > > - https://bugs.chromium.org/p/project-zero/issues/detail?id=990 > > The CONFIG_ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE option is also > added so each architecture can optimize this change. As I pointed it out in my previous reply this Kconfig name is awfully long - but it should have been obvious when this changelog was written ... > Signed-off-by: Thomas Garnier <thgarnie@xxxxxxxxxx> > Tested-by: Kees Cook <keescook@xxxxxxxxxxxx> > --- > Based on next-20170410 > --- > arch/s390/Kconfig | 1 + > include/linux/syscalls.h | 26 +++++++++++++++++++++++++- > init/Kconfig | 6 ++++++ > kernel/sys.c | 13 +++++++++++++ > 4 files changed, 45 insertions(+), 1 deletion(-) > > diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig > index d25435d94b6e..489a0cc6e46b 100644 > --- a/arch/s390/Kconfig > +++ b/arch/s390/Kconfig > @@ -103,6 +103,7 @@ config S390 > select ARCH_INLINE_WRITE_UNLOCK_BH > select ARCH_INLINE_WRITE_UNLOCK_IRQ > select ARCH_INLINE_WRITE_UNLOCK_IRQRESTORE > + select ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE > select ARCH_SAVE_PAGE_KEYS if HIBERNATION > select ARCH_SUPPORTS_ATOMIC_RMW > select ARCH_SUPPORTS_DEFERRED_STRUCT_PAGE_INIT > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index 980c3c9b06f8..801a7a74fe28 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -191,6 +191,27 @@ extern struct trace_event_functions exit_syscall_print_funcs; > SYSCALL_METADATA(sname, x, __VA_ARGS__) \ > __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) > > + > +/* > + * Called before coming back to user-mode. Returning to user-mode with an > + * address limit different than USER_DS can allow to overwrite kernel memory. > + */ > +static inline void verify_pre_usermode_state(void) { > + BUG_ON(!segment_eq(get_fs(), USER_DS)); > +} Non-standard coding style. > + > +#ifndef CONFIG_ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE > +#define __CHECK_USER_CALLER() \ > + bool user_caller = segment_eq(get_fs(), USER_DS) > +#define __VERIFY_PRE_USERMODE_STATE() \ > + if (user_caller) verify_pre_usermode_state() > +#else > +#define __CHECK_USER_CALLER() > +#define __VERIFY_PRE_USERMODE_STATE() > +asmlinkage void address_limit_check_failed(void); > +#endif > + > + > #define __PROTECT(...) asmlinkage_protect(__VA_ARGS__) > #define __SYSCALL_DEFINEx(x, name, ...) \ > asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ > @@ -199,7 +220,10 @@ extern struct trace_event_functions exit_syscall_print_funcs; > asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ > asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ > { \ > - long ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \ > + long ret; \ > + __CHECK_USER_CALLER(); \ > + ret = SYSC##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \ > + __VERIFY_PRE_USERMODE_STATE(); \ > __MAP(x,__SC_TEST,__VA_ARGS__); \ > __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \ > return ret; \ BTW., the '__VERIFY_PRE_USERMODE_STATE()' name is highly misleading: the 'pre' prefix suggests that this is done before a system call - while it's done afterwards. The solution is to not try to specify the exact call placement in the name, just describe the functionality (and harmonize along the common prefix). > +config ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE > + bool > + help > + Disable the generic pre-usermode state verification. Allow each > + architecture to optimize how and when the verification is done. > + Please name the Kconfig symbols something like this: CONFIG_ADDR_LIMIT_CHECK CONFIG_ADDR_LIMIT_CHECK_ARCH or so, which tells us whether the check is done by the architecture code, without breaking the col80 limit with a single Kconfig name. BTW: > +#ifdef CONFIG_ARCH_NO_SYSCALL_VERIFY_PRE_USERMODE_STATE > +/* > + * This function is called when an architecture specific implementation detected > + * an invalid address limit. The generic user-mode state checker will finish on > + * the appropriate BUG_ON. > + */ > +asmlinkage void address_limit_check_failed(void) > +{ > + verify_pre_usermode_state(); > + panic("address_limit_check_failed called with a valid user-mode state"); It's very unconstructive to unconditionally panic the system, just because some kernel code leaked the address limit! Do a warn-once printout and kill the current task (i.e. don't continue execution), but don't crash everything else! Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html