Re: [RFC PATCH v2 1/1] exec: seal system mappings

* jeffxu@xxxxxxxxxxxx <jeffxu@xxxxxxxxxxxx> [241014 17:50]:
> From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> 
> Seal vdso, vvar, sigpage, uprobes and vsyscall.
> 
> Those mappings are readonly or executable only, sealing can protect
> them from ever changing during the life time of the process. For
> complete descriptions of memory sealing, please see mseal.rst [1].
> 
> System mappings such as vdso, vvar, and sigpage (for arm) are
> generated by the kernel during program initialization. These mappings
> are designated as non-writable, and sealing them will prevent them
> from ever becoming writeable.
> 
> Unlike the aforementioned mappings, the uprobe mapping is not
> established during program startup. However, its lifetime is the same
> as the process's lifetime [2], thus sealable.
> 
> The vdso, vvar, sigpage, and uprobe mappings all invoke the
> _install_special_mapping() function. As no other mappings utilize this
> function, it is logical to incorporate sealing logic within
> _install_special_mapping(). This approach avoids the necessity of
> modifying code across various architecture-specific implementations.
> 
> The vsyscall mapping, which has its own initialization function, is
> sealed in the XONLY case, it seems to be the most common and secure
> case of using vsyscall.
> 
> It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> alter the mapping of vdso, vvar, and sigpage during restore
> operations. Consequently, this feature cannot be universally enabled
> across all systems. To address this, a kernel configuration option has
> been introduced to enable or disable this functionality. Note, uprobe
> is always sealed and not controlled by this kernel configuration.

Considering that uprobes are always sealed regardless of boot or kernel
config, the descriptions below are all inaccurate.

It is also very easy to overlook that you are changing the default vm
flags of uprobe, especially considering the text implies that they are
not altered if you select "never".
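
To make the mismatch concrete, here is a minimal userspace model of the
flag handling as I read this patch. It is not kernel code: the bit
values are stand-ins for illustration, and apply_policy(),
policy_controlled_flags() and uprobe_flags() are hypothetical names
that only mirror update_seal_exec_system_mappings() and its callers.

```c
#include <stdint.h>

/* Stand-in bit values for illustration; not copied from kernel headers. */
#define VM_EXEC     0x00000004ULL
#define VM_MAYEXEC  0x00000040ULL
#define VM_IO       0x00004000ULL
#define VM_DONTCOPY 0x00020000ULL
#define VM_SEALED   (1ULL << 63)

/* Hypothetical model of the Kconfig/cmdline choice. */
enum seal_policy { SEAL_NEVER, SEAL_ALWAYS };

/* Models update_seal_exec_system_mappings(): seal only under 'always'. */
static uint64_t apply_policy(uint64_t vm_flags, enum seal_policy policy)
{
	if (policy == SEAL_ALWAYS)
		vm_flags |= VM_SEALED;
	return vm_flags;
}

/* A caller that passes unsealed flags, so the policy decides. */
static uint64_t policy_controlled_flags(enum seal_policy policy)
{
	return apply_policy(VM_EXEC, policy);
}

/* The uprobe caller after this patch: VM_SEALED is hard-coded. */
static uint64_t uprobe_flags(enum seal_policy policy)
{
	return apply_policy(VM_EXEC | VM_MAYEXEC | VM_DONTCOPY | VM_IO |
			    VM_SEALED, policy);
}
```

In this model, uprobe_flags(SEAL_NEVER) still carries VM_SEALED while
policy_controlled_flags(SEAL_NEVER) does not, which is the behaviour the
documentation text does not describe.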

> 
> [1] Documentation/userspace-api/mseal.rst
> [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@xxxxxxxxxxxxxx/
> 
> Signed-off-by: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> ---
>  .../admin-guide/kernel-parameters.txt         | 10 ++++
>  arch/x86/entry/vsyscall/vsyscall_64.c         |  9 +++-
>  fs/exec.c                                     | 53 +++++++++++++++++++
>  include/linux/fs.h                            |  1 +
>  kernel/events/uprobes.c                       |  2 +-
>  mm/mmap.c                                     |  1 +
>  security/Kconfig                              | 26 +++++++++
>  7 files changed, 99 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index e7bfe1bde49e..02e5eb23d76f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1538,6 +1538,16 @@
>  			Permit 'security.evm' to be updated regardless of
>  			current integrity status.
>  
> +	exec.seal_system_mappings = [KNL]
> +			Format: { never | always }
> +			Seal system mappings: vdso, vvar, sigpage, uprobes,
> +			vsyscall.
> +			This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_*
> +			- 'never':  never seal system mappings.

Not true, uprobes are sealed when 'never' is selected.

> +			- 'always': always seal system mappings.
> +			If not specified or invalid, default is the KCONFIG value.
> +			This option has no effect if CONFIG_64BIT=n
> +
>  	early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier
>  			stages so cover more early boot allocations.
>  			Please note that as side effect some optimizations
> diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
> index 2fb7d53cf333..20a3000550d2 100644
> --- a/arch/x86/entry/vsyscall/vsyscall_64.c
> +++ b/arch/x86/entry/vsyscall/vsyscall_64.c
> @@ -32,6 +32,7 @@
>  #include <linux/mm_types.h>
>  #include <linux/syscalls.h>
>  #include <linux/ratelimit.h>
> +#include <linux/fs.h>
>  
>  #include <asm/vsyscall.h>
>  #include <asm/unistd.h>
> @@ -366,8 +367,12 @@ void __init map_vsyscall(void)
>  		set_vsyscall_pgtable_user_bits(swapper_pg_dir);
>  	}
>  
> -	if (vsyscall_mode == XONLY)
> -		vm_flags_init(&gate_vma, VM_EXEC);
> +	if (vsyscall_mode == XONLY) {
> +		unsigned long vm_flags = VM_EXEC;
> +
> +		update_seal_exec_system_mappings(&vm_flags);
> +		vm_flags_init(&gate_vma, vm_flags);
> +	}
>  
>  	BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
>  		     (unsigned long)VSYSCALL_ADDR);
> diff --git a/fs/exec.c b/fs/exec.c
> index 77364806b48d..5030879cda47 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c

Does it make sense for this to live in exec?  Couldn't you put it in the
mm/mseal.c file?  It's vma flags for mappings and you've put it in
fs/exec?

> @@ -68,6 +68,7 @@
>  #include <linux/user_events.h>
>  #include <linux/rseq.h>
>  #include <linux/ksm.h>
> +#include <linux/fs_parser.h>
>  
>  #include <linux/uaccess.h>
>  #include <asm/mmu_context.h>
> @@ -2159,3 +2160,55 @@ fs_initcall(init_fs_exec_sysctls);
>  #ifdef CONFIG_EXEC_KUNIT_TEST
>  #include "tests/exec_kunit.c"
>  #endif
> +
> +#ifdef CONFIG_64BIT
> +/*
> + * Kernel cmdline overwrite for CONFIG_SEAL_SYSTEM_MAPPINGS_X
> + */
> +enum seal_system_mappings_type {
> +	SEAL_SYSTEM_MAPPINGS_NEVER,
> +	SEAL_SYSTEM_MAPPINGS_ALWAYS
> +};
> +
> +static enum seal_system_mappings_type seal_system_mappings __ro_after_init =
> +	IS_ENABLED(CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS) ? SEAL_SYSTEM_MAPPINGS_ALWAYS :
> +	SEAL_SYSTEM_MAPPINGS_NEVER;
> +
> +static const struct constant_table value_table_sys_mapping[] __initconst = {
> +	{ "never", SEAL_SYSTEM_MAPPINGS_NEVER},
> +	{ "always", SEAL_SYSTEM_MAPPINGS_ALWAYS},
> +	{ }
> +};
> +
> +static int __init early_seal_system_mappings_override(char *buf)
> +{
> +	if (!buf)
> +		return -EINVAL;
> +
> +	seal_system_mappings = lookup_constant(value_table_sys_mapping,
> +			buf, seal_system_mappings);
> +
> +	return 0;
> +}
> +
> +early_param("exec.seal_system_mappings", early_seal_system_mappings_override);
> +
> +static bool seal_system_mappings_enabled(void)
> +{
> +	if (seal_system_mappings == SEAL_SYSTEM_MAPPINGS_ALWAYS)
> +		return true;
> +
> +	return false;
> +}

This function seems unnecessary; its only caller is another 3-4 line
function.

> +
> +void update_seal_exec_system_mappings(unsigned long *vm_flags)
> +{
> +	if (!(*vm_flags & VM_SEALED) && seal_system_mappings_enabled())

Why !(*vm_flags & VM_SEALED) here?

> +		*vm_flags |= VM_SEALED;
> +
> +}

Instead of passing a pointer around and checking enabled, why don't you
have a function that just returns VM_SEALED or 0, and OR that into the
flags?  This seems very heavy for what it does.  Why did you do it
this way?

The name is also very long and a bit odd.  The helper could be used for
other purposes, yet the name ends in _system_mappings, and it says seal
where the feature is mseal (or vm_seal).  Would mseal_flag() work?

> +#else
> +void update_seal_exec_system_mappings(unsigned long *vm_flags)
> +{
> +}
> +#endif /* CONFIG_64BIT */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 42444ec95c9b..6e44aca4b24b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h

Again, I don't understand why fs.h is the place for mseal definitions?

> @@ -3079,6 +3079,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
>  extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *);
>  extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *);
>  extern struct file * open_exec(const char *);
> +extern void update_seal_exec_system_mappings(unsigned long *vm_flags);

We are dropping extern where possible now.

>   
>  /* fs/dcache.c -- generic fs support functions */
>  extern bool is_subdir(struct dentry *, struct dentry *);
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index c47a0bf25e58..e9876fae8887 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1506,7 +1506,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
>  	}
>  
>  	vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
> -				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
> +				VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED,
>  				&xol_mapping);

Changing all uprobes seems like something that deserves more than a
note at the end of the change log, even if you think it won't have any
impact.  The note is even hidden at the end of a paragraph.

I would go as far as splitting this patch out as its own so that the
subject line specifies that all uprobes will be VM_SEALED now.

Maybe it's fine but maybe it isn't and you've buried it so that it will
be missed by virtually everyone.


>  	if (IS_ERR(vma)) {
>  		ret = PTR_ERR(vma);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 57fd5ab2abe7..d4717e34a60d 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2133,6 +2133,7 @@ struct vm_area_struct *_install_special_mapping(
>  	unsigned long addr, unsigned long len,
>  	unsigned long vm_flags, const struct vm_special_mapping *spec)
>  {
> +	update_seal_exec_system_mappings(&vm_flags);
>  	return __install_special_mapping(mm, addr, len, vm_flags, (void *)spec,
>  					&special_mapping_vmops);

If you were to return a flag, you could change the vm_flags argument
here to vm_flags | mseal_flag() and drop the extra call.

>  }
> diff --git a/security/Kconfig b/security/Kconfig
> index 28e685f53bd1..4ec8045339c3 100644
> --- a/security/Kconfig
> +++ b/security/Kconfig
> @@ -51,6 +51,32 @@ config PROC_MEM_NO_FORCE
>  
>  endchoice
>  
> +choice
> +	prompt "Seal system mappings"
> +	default SEAL_SYSTEM_MAPPINGS_NEVER
> +	help
> +	  Seal system mappings such as vdso, vvar, sigpage, uprobes and
> +	  vsyscall.
> +	  Note: kernel command line exec.seal_system_mappings overwrites this.

Not true, uprobes are always sealed.

> +
> +config SEAL_SYSTEM_MAPPINGS_NEVER
> +	bool "Traditional behavior - not sealed"

Not true, uprobes are sealed.

> +	help
> +	  Do not seal system mappings.
> +	  This is default.
> +
> +config SEAL_SYSTEM_MAPPINGS_ALWAYS
> +	bool "Always seal system mappings"
> +	depends on 64BIT
> +	depends on !CHECKPOINT_RESTORE
> +	help
> +	  Seal system mappings such as vdso, vvar, sigpage, uprobes and
> +	  vsyscall.
> +	  Note: CHECKPOINT_RESTORE might relocate vdso mapping during restore,
> +	  and remap will fail if the mapping is sealed, therefore
> +	  !CHECKPOINT_RESTORE is added as dependency.
> +endchoice
> +
>  config SECURITY
>  	bool "Enable different security models"
>  	depends on SYSFS
> -- 
> 2.47.0.rc1.288.g06298d1525-goog
> 



