* jeffxu@xxxxxxxxxxxx <jeffxu@xxxxxxxxxxxx> [241014 17:50]: > From: Jeff Xu <jeffxu@xxxxxxxxxxxx> > > Seal vdso, vvar, sigpage, uprobes and vsyscall. > > Those mappings are readonly or executable only, sealing can protect > them from ever changing during the life time of the process. For > complete descriptions of memory sealing, please see mseal.rst [1]. > > System mappings such as vdso, vvar, and sigpage (for arm) are > generated by the kernel during program initialization. These mappings > are designated as non-writable, and sealing them will prevent them > from ever becoming writeable. ^ or ever removed. This is a pretty big deal. Platforms are trying to make it so that vdso is the fast path, but if they are removed then things stop using them and it's not a problem. This description is robbing them of the information they need to know that, and it's not in your change log either. I had said before that you need to be clear about the inability to remove the mappings that are sealed, as the description above heavily implies that it is only stopping them from becoming writeable. > > Unlike the aforementioned mappings, the uprobe mapping is not > established during program startup. However, its lifetime is the same > as the process's lifetime [2], thus sealable. > > The vdso, vvar, sigpage, and uprobe mappings all invoke the > _install_special_mapping() function. As no other mappings utilize this > function, it is logical to incorporate sealing logic within > _install_special_mapping(). This approach avoids the necessity of > modifying code across various architecture-specific implementations. > > The vsyscall mapping, which has its own initialization function, is > sealed in the XONLY case, it seems to be the most common and secure > case of using vsyscall. > > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may > alter the mapping of vdso, vvar, and sigpage during restore > operations. Consequently, this feature cannot be universally enabled > across all systems. To address this, a kernel configuration option has > been introduced to enable or disable this functionality. Note, uprobe > is always sealed and not controlled by this kernel configuration. > > I tested CONFIG_SEAL_SYSTEM_MAPPINGS_ALWAYS with ChromeOS, > which doesn’t use CHECKPOINT_RESTORE. > > [1] Documentation/userspace-api/mseal.rst > [2] https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@xxxxxxxxxxxxxx/ > > History: > V2: > Seal uprobe always (Oleg Nesterov) > Update comments and description (Randy Dunlap, Liam R.Howlett, Oleg Nesterov) The only update to the comment I see is the pointer to mseal.rst for a complete description? > Rebase to linux_main > > V1: > https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@xxxxxxxxxx/ > > Jeff Xu (1): > exec: seal system mappings > > .../admin-guide/kernel-parameters.txt | 10 ++++ > arch/x86/entry/vsyscall/vsyscall_64.c | 9 +++- > fs/exec.c | 53 +++++++++++++++++++ > include/linux/fs.h | 1 + > kernel/events/uprobes.c | 2 +- > mm/mmap.c | 1 + > security/Kconfig | 26 +++++++++ > 7 files changed, 99 insertions(+), 3 deletions(-) > > -- > 2.47.0.rc1.288.g06298d1525-goog >