On Wed, Oct 16, 2024 at 6:10 PM Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> wrote: > > > + exec.seal_system_mappings = [KNL] > > + Format: { never | always } > > + Seal system mappings: vdso, vvar, sigpage, uprobes, > > + vsyscall. > > + This overwrites KCONFIG CONFIG_SEAL_SYSTEM_MAPPINGS_* > > + - 'never': never seal system mappings. > > Not true, uprobes are sealed when 'never' is selected. > Thanks. I forgot to uprobes from the description in Kconfig and kernel-parameters.txt, will update. > > + - 'always': always seal system mappings. > > + If not specified or invalid, default is the KCONFIG value. > > + This option has no effect if CONFIG_64BIT=n > > + > > early_page_ext [KNL,EARLY] Enforces page_ext initialization to earlier > > stages so cover more early boot allocations. > > Please note that as side effect some optimizations > > diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c > > index 2fb7d53cf333..20a3000550d2 100644 > > --- a/arch/x86/entry/vsyscall/vsyscall_64.c > > +++ b/arch/x86/entry/vsyscall/vsyscall_64.c > > @@ -32,6 +32,7 @@ > > #include <linux/mm_types.h> > > #include <linux/syscalls.h> > > #include <linux/ratelimit.h> > > +#include <linux/fs.h> > > > > #include <asm/vsyscall.h> > > #include <asm/unistd.h> > > @@ -366,8 +367,12 @@ void __init map_vsyscall(void) > > set_vsyscall_pgtable_user_bits(swapper_pg_dir); > > } > > > > - if (vsyscall_mode == XONLY) > > - vm_flags_init(&gate_vma, VM_EXEC); > > + if (vsyscall_mode == XONLY) { > > + unsigned long vm_flags = VM_EXEC; > > + > > + update_seal_exec_system_mappings(&vm_flags); > > + vm_flags_init(&gate_vma, vm_flags); > > + } > > > > BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) != > > (unsigned long)VSYSCALL_ADDR); > > diff --git a/fs/exec.c b/fs/exec.c > > index 77364806b48d..5030879cda47 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > Does it make sense for this to live in exec? Couldn't you put it in the > mm/mseal.c file? It's vma flags for mappings and you've put it in > fs/exec? > If you are referring to utilities related to kernel cmdline, they should be in this file. > > @@ -68,6 +68,7 @@ > > #include <linux/user_events.h> > > #include <linux/rseq.h> > > #include <linux/ksm.h> > > +#include <linux/fs_parser.h> > > > > #include <linux/uaccess.h> > > #include <asm/mmu_context.h> > > @@ -2159,3 +2160,55 @@ fs_initcall(init_fs_exec_sysctls); > > #ifdef CONFIG_EXEC_KUNIT_TEST > > #include "tests/exec_kunit.c" > > #endif > > + > > +#ifdef CONFIG_64BIT > > +/* > > + * Kernel cmdline overwrite for CONFIG_SEAL_SYSTEM_MAPPINGS_X > > + */ > > +enum seal_system_mappings_type { > > + SEAL_SYSTEM_MAPPINGS_NEVER, > > + SEAL_SYSTEM_MAPPINGS_ALWAYS > > +}; > > + > > +static enum seal_system_mappings_type seal_system_mappings __ro_after_init = > > + IS_ENABLED(CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS) ? SEAL_SYSTEM_MAPPINGS_ALWAYS : > > + SEAL_SYSTEM_MAPPINGS_NEVER; > > + > > +static const struct constant_table value_table_sys_mapping[] __initconst = { > > + { "never", SEAL_SYSTEM_MAPPINGS_NEVER}, > > + { "always", SEAL_SYSTEM_MAPPINGS_ALWAYS}, > > + { } > > +}; > > + > > +static int __init early_seal_system_mappings_override(char *buf) > > +{ > > + if (!buf) > > + return -EINVAL; > > + > > + seal_system_mappings = lookup_constant(value_table_sys_mapping, > > + buf, seal_system_mappings); > > + > > + return 0; > > +} > > + > > +early_param("exec.seal_system_mappings", early_seal_system_mappings_override); > > + > > +static bool seal_system_mappings_enabled(void) > > +{ > > + if (seal_system_mappings == SEAL_SYSTEM_MAPPINGS_ALWAYS) > > + return true; > > + > > + return false; > > +} > > This function seems unnecessary, it is called from another 3-4 line > function only. > It is more readable this way. > > + > > +void update_seal_exec_system_mappings(unsigned long *vm_flags) > > +{ > > + if (!(*vm_flags & VM_SEALED) && seal_system_mappings_enabled()) > > Why !(*vm_flags & VM_SEALED) here? > If vm_flags is already sealed, then there is no need to check seal_system_mappings_enabled. > > + *vm_flags |= VM_SEALED; > > + > > +} > > Instead of passing a pointer around and checking enabled, why don't you > have a function that just returns the VM_SEALED or 0 and just or it into > the flags? This seems very heavy for what it does, why did you do it > this way? > Why is that heavy ? passing a pointer for updating variables is natural. > The name is also very long and a bit odd, it could be used for other > reasons, but you have _system_mappings on the end, and you use seal but > it's mseal (or vm_seal)? Would mseal_flag() work? > It could be longer :-) it means update_sealing_flag_for_executable_system_mappings. mseal_flag is too short and not descriptive. > > +#else > > +void update_seal_exec_system_mappings(unsigned long *vm_flags) > > +{ > > +} > > +#endif /* CONFIG_64BIT */ > > diff --git a/include/linux/fs.h b/include/linux/fs.h > > index 42444ec95c9b..6e44aca4b24b 100644 > > --- a/include/linux/fs.h > > +++ b/include/linux/fs.h > > Again, I don't understand why fs.h is the place for mseal definitions? > include/linux/fs.h contains other exec related function signatures. So it is better to keep them at the same header. > > @@ -3079,6 +3079,7 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos); > > extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *); > > extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *); > > extern struct file * open_exec(const char *); > > +extern void update_seal_exec_system_mappings(unsigned long *vm_flags); > > We are dropping extern where possible now. > extern can be dropped, it appears not causing link error. > > > > /* fs/dcache.c -- generic fs support functions */ > > extern bool is_subdir(struct dentry *, struct dentry *); > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > > index c47a0bf25e58..e9876fae8887 100644 > > --- a/kernel/events/uprobes.c > > +++ b/kernel/events/uprobes.c > > @@ -1506,7 +1506,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area) > > } > > > > vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE, > > - VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, > > + VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO|VM_SEALED, > > &xol_mapping); > > Changing all uprobes seems like something that should probably be > mentioned more than just the note at the end of the change log, even if > you think it won't have any impact. The note is even hidden at the end > of a paragraph. > > I would go as far as splitting this patch out as its own so that the > subject line specifies that all uprobes will be VM_SEALED now. > > Maybe it's fine but maybe it isn't and you've buried it so that it will > be missed by virtually everyone. > I will add "It is sealed from creation." in the above "uprobe" section. > > > if (IS_ERR(vma)) { > > ret = PTR_ERR(vma); > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 57fd5ab2abe7..d4717e34a60d 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -2133,6 +2133,7 @@ struct vm_area_struct *_install_special_mapping( > > unsigned long addr, unsigned long len, > > unsigned long vm_flags, const struct vm_special_mapping *spec) > > { > > + update_seal_exec_system_mappings(&vm_flags); > > return __install_special_mapping(mm, addr, len, vm_flags, (void *)spec, > > &special_mapping_vmops); > > If you were to return a flag, you could change the vm_flags argument to > vm_flags | mseal_flag() > passing pointer seems to be the most efficient way. Thanks -Jeff