* Liam R. Howlett <Liam.Howlett@xxxxxxxxxx> [250213 19:14]: > * Jeff Xu <jeffxu@xxxxxxxxxxxx> [250213 17:00]: > > On Thu, Feb 13, 2025 at 12:54 PM Liam R. Howlett > > <Liam.Howlett@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > VM_SEALED isn't defined in 32-bit systems, and mseal.c isn't part of > > > > > > the build. This is intentional. Any 32-bit code trying to use the > > > > > > sealing function or the VM_SEALED flag will immediately fail > > > > > > compilation. This makes it easier to identify incorrect usage. > > > > > > > > > > So you are against using the #define because the VM_SYSTEM_SEAL will be > > > > > defined, even though it will be VM_NONE? This is no worse than a > > > > > function that returns 0, and it aligns better with what we have today in > > > > > that it can be put in the list of other flags. > > > > > > > > When I was reading through all of this and considering the history of > > > > its development goals, it strikes me that a function is easier for the > > > > future if/when this can be made a boot-time setting. > > > > > > > > > > Reworking this change to function as a boot-time parameter, or whatever, > > > would not be a significant amount of work, if/when the time comes. > > > Since we don't know what the future holds, it it unnecessary to engineer > > > in a potential change for a future version when the change is easy > > > enough to make later and keep the code cleaner now. > > > > > Sure, I will put the function in mm.h for this patch. We can find a > > proper place when it is time to implement the boot-time parameter > > change. > > > > The call stack for sealing system mapping is something like below: > > > > install_special_mapping (mm/mmap.c) > > map_vdso (arch/x86/entry/vdso/vma.c) > > load_elf_binary (fs/binfmt_elf.c) > > load_misc_binary (fs/binfmt_misc.c) > > bprm_execve (fs/exec.c) > > do_execveat_common > > __x64_sys_execve > > do_syscall_64 > > > > IMO, there's a clear divide between the API implementer and the API user. > > mm and mm.h are the providers, offering the core mm functionality > > through APIs/data structures like install_special_mapping(). > > > > The exe layer (bprm_execve, map_vdso, etc) is the consumer of the > > install_special_mapping. > > The logic related to checking if sealing system mapping is enabled > > belongs to the exe layer. > > Since this is an all or nothing enabling, there is no reason to have > each caller check the same thing and do the same action. You should put > the logic into the provider - they all end up doing the same thing. > > Also, this is a compile time option so it doesn't even need to be > checked on execution - just apply it in the first place, at the source. > Your static inline function was already doing this...? > > I'm confused as to what you are arguing here because it goes against > what you had and what I suggested. The alternative you are suggesting > is more code, more instructions, and the best outcome is the same > result. I think I understand what you are saying now: the interface to install_special_mapping() needs to take the vma flag, as it does today. What I don't understand is that what you proposed and what I proposed both do that. What I'm saying is that, since system mappings are enabled, we can already know implicitly by having VM_SYSTEM_SEAL either VM_NONE or VM_SEAL. Turning this: @@ -264,11 +266,12 @@ static int map_vdso(const struct vdso_image *image, unsigned long addr) /* * MAYWRITE to allow gdb to COW and set breakpoints */ + vm_flags = VM_READ|VM_EXEC|VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC; + vm_flags |= mseal_system_mappings(); vma = _install_special_mapping(mm, text_start, image->size, - VM_READ|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + vm_flags, &vdso_mapping); to this: /* * MAYWRITE to allow gdb to COW and set breakpoints */ vma = _install_special_mapping(mm, text_start, image->size, VM_READ|VM_EXEC| - VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, + VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC| + VM_SYSTEM_SEAL, &vdso_mapping); No unsigned long vm_flags needed. It's easier to read and I don't think it's any more hidden than the vm_flags |= function call option. > > > > > > > > > > If mm maintainers prefer a #define for now, that's fine of course. The > > > > value of VM_SYSTEM_SEAL can be VM_NONE on 32-bit. > > > > > > Thanks. I think having a flag with VM_NONE on 32-bit is just as sane as > > > a "flags |= system_seal()" call that unconditionally returns 0 on > > > 32-bit. > > > > > Consider the case below in src/third_party/kernel/v6.6/fs/proc/task_mmu.c, > > > > #ifdef CONFIG_64BIT > > [ilog2(VM_SEALED)] = "sl", > > #endif > > > > If #ifdef CONFIG_64BIT is missing, it won't be detected during compile time. > > > > Setting VM_SEALED to VM_NONE could simplify the coding in some cases > > (get/set case), but it might make other cases error prone. > > > > I would prefer to not have VM_SEALED for 32 bit. > > But what I posted leaves VM_SEALED undefined for 32 bit, it defines > VM_SYSTEM_SEALED which can be placed, for instance, into > _install_special_mapping() arguments directly. Reducing the change to > just adding a new flag to the list. And it's applied to all system > mappings based on if it's enabled on compile or not. > > Also: > include/linux/mm.h:#define VM_NONE 0x00000000 > so, I'm not sure what evaluation you are concerned about? It would be > defined as 0. > > Thanks, > Liam > >