On Tue, Mar 4, 2025 at 9:51 PM Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> wrote:
>
> On Wed, Mar 05, 2025 at 02:17:04AM +0000, jeffxu@xxxxxxxxxxxx wrote:
> > From: Jeff Xu <jeffxu@xxxxxxxxxxxx>
> >
> > This is V9 version, addressing comments from V8, without code logic
> > change.
> >
> > -------------------------------------------------------------------
> > As discussed during mseal() upstream process [1], mseal() protects
> > the VMAs of a given virtual memory range against modifications, such
> > as the read/write (RW) and no-execute (NX) bits. For complete
> > descriptions of memory sealing, please see mseal.rst [2].
> >
> > The mseal() is useful to mitigate memory corruption issues where a
> > corrupted pointer is passed to a memory management system. For
> > example, such an attacker primitive can break control-flow integrity
> > guarantees since read-only memory that is supposed to be trusted can
> > become writable or .text pages can get remapped.
> >
> > The system mappings are read-only, so memory sealing can protect
> > them from ever becoming writable or being unmapped/remapped with
> > different attributes.
> >
> > System mappings such as vdso, vvar, vvar_vclock,
> > vectors (arm compat-mode), sigpage (arm compat-mode),
> > are created by the kernel during program initialization, and could
> > be sealed after creation.
> >
> > Unlike the aforementioned mappings, the uprobe mapping is not
> > established during program startup. However, its lifetime is the same
> > as the process's lifetime [3]. It could be sealed from creation.
> >
> > The vsyscall on x86-64 uses a special address (0xffffffffff600000),
> > which is outside the mm managed range. This means mprotect, munmap, and
> > mremap won't work on the vsyscall. Since sealing doesn't enhance
> > the vsyscall's security, it is skipped in this patch. If we ever seal
> > the vsyscall, it is probably only for decorative purpose, i.e. showing
> > the 'sl' flag in the /proc/pid/smaps.
> > For this patch, it is ignored.
> >
> > It is important to note that the CHECKPOINT_RESTORE feature (CRIU) may
> > alter the system mappings during restore operations. UML (User Mode Linux),
> > gVisor, and rr are also known to change the vdso/vvar mappings.
> > Consequently, this feature cannot be universally enabled across all
> > systems. As such, CONFIG_MSEAL_SYSTEM_MAPPINGS is disabled by default.
> >
> > To support mseal of system mappings, architectures must define
> > CONFIG_ARCH_SUPPORTS_MSEAL_SYSTEM_MAPPINGS and update their special
> > mappings calls to pass mseal flag. Additionally, architectures must
> > confirm they do not unmap/remap system mappings during the process
> > lifetime. The existence of this flag for an architecture implies that
> > it does not require the remapping of these system mappings during
> > process lifetime, so sealing these mappings is safe from a kernel
> > perspective.
> >
> > This version covers x86-64 and arm64 architecture as minimum viable feature.
> >
> > While no specific CPU hardware features are required to enable this
> > feature on an architecture, memory sealing requires a 64-bit kernel. Other
> > architectures can choose whether or not to adopt this feature. Currently,
> > I'm not aware of any instances in the kernel code that actively
> > munmap/mremap a system mapping without a request from userspace. The PPC
> > does call munmap when _install_special_mapping fails for vdso; however,
> > it's uncertain if this will ever fail for PPC - this needs to be
> > investigated by PPC in the future [4]. The UML kernel can add this support
> > when KUnit tests require it [5].
> >
> > In this version, we've improved the handling of system mapping sealing from
> > previous versions, instead of modifying the _install_special_mapping
> > function itself, which would affect all architectures, we now call
> > _install_special_mapping with a sealing flag only within the specific
> > architecture that requires it.
> > This targeted approach offers two key
> > advantages: 1) It limits the code change's impact to the necessary
> > architectures, and 2) It aligns with the software architecture by keeping
> > the core memory management within the mm layer, while delegating the
> > decision of sealing system mappings to the individual architecture, which
> > is particularly relevant since 32-bit architectures never require sealing.
> >
> > Prior to this patch series, we explored sealing special mappings from
> > userspace using glibc's dynamic linker. This approach revealed several
> > issues:
> > - The PT_LOAD header may report an incorrect length for vdso, (smaller
> >   than its actual size). The dynamic linker, which relies on PT_LOAD
> >   information to determine mapping size, would then split and partially
> >   seal the vdso mapping. Since each architecture has its own vdso/vvar
> >   code, fixing this in the kernel would require going through each
> >   architecture. Our initial goal was to enable sealing readonly mappings,
> >   e.g. .text, across all architectures, sealing vdso from kernel since
> >   creation appears to be simpler than sealing vdso at glibc.
> > - The [vvar] mapping header only contains address information, not length
> >   information. Similar issues might exist for other special mappings.
> > - Mappings like uprobe are not covered by the dynamic linker,
> >   and there is no effective solution for them.
> >
> > This feature's security enhancements will benefit ChromeOS, Android,
> > and other high security systems.
> >
> > Testing:
> > This feature was tested on ChromeOS and Android for both x86-64 and ARM64.
> > - Enable sealing and verify vdso/vvar, sigpage, vector are sealed properly,
> >   i.e. "sl" shown in the smaps for those mappings, and mremap is blocked.
> > - Passing various automation tests (e.g. pre-checkin) on ChromeOS and
> >   Android to ensure the sealing doesn't affect the functionality of
> >   Chromebook and Android phone.
> >
> > I also tested the feature on Ubuntu on x86-64:
> > - With config disabled, vdso/vvar is not sealed,
> > - with config enabled, vdso/vvar is sealed, and booting up Ubuntu is OK,
> >   normal operations such as browsing the web, open/edit doc are OK.
> >
> > Link: https://lore.kernel.org/all/20240415163527.626541-1-jeffxu@xxxxxxxxxxxx/ [1]
> > Link: Documentation/userspace-api/mseal.rst [2]
> > Link: https://lore.kernel.org/all/CABi2SkU9BRUnqf70-nksuMCQ+yyiWjo3fM4XkRkL-NrCZxYAyg@xxxxxxxxxxxxxx/ [3]
> > Link: https://lore.kernel.org/all/CABi2SkV6JJwJeviDLsq9N4ONvQ=EFANsiWkgiEOjyT9TQSt+HA@xxxxxxxxxxxxxx/ [4]
> > Link: https://lore.kernel.org/all/202502251035.239B85A93@keescook/ [5]
> >
> > -------------------------------------------
> > History:
> >
> > V9:
> > - Add negative test in selftest (Kees Cook)
> > - fix typos in text (Kees Cook)
>
> You have a bad habit of missing stuff off these logs. Usually I don't
> comment, as it's trivial, but while we're here :)
>
> Please try to keep an accurate log of changes requested so you can populate
> these properly.
>
> Obviously this is not going to block anything. But for future reference...
>
> - Add selftest to main selftest Makefile (Lorenzo Stoakes)
>
> >
> > V8:
>
> Nit, but no lore link?

https://lore.kernel.org/all/20250303050921.3033083-1-jeffxu@xxxxxxxxxx/
Thanks for noticing this.

>
> > - Change ARCH_SUPPORTS_MSEAL_X to ARCH_SUPPORTS_MSEAL_X (Liam R. Howlett)
> > - Update comments in Kconfig and mseal.rst (Lorenzo Stoakes, Liam R. Howlett)
> > - Change patch header prefix to "mseal sysmap" (Lorenzo Stoakes)
> > - Remove "vm_flags =" (Kees Cook, Liam R. Howlett, Oleg Nesterov)
> > - Drop uml architecture (Lorenzo Stoakes, Kees Cook)
> > - Add a selftest to verify system mappings are sealed (Lorenzo Stoakes)
> >
> > V7:
> > https://lore.kernel.org/all/20250224225246.3712295-1-jeffxu@xxxxxxxxxx/
> > - Remove cover letter from the first patch (Liam R. Howlett)
> > - Change macro name to VM_SEALED_SYSMAP (Liam R.
> >   Howlett)
> > - logging and fclose() in selftest (Liam R. Howlett)
> >
> > V6:
> > https://lore.kernel.org/all/20250224174513.3600914-1-jeffxu@xxxxxxxxxx/
> > - mseal.rst: fix a typo (Randy Dunlap)
> > - security/Kconfig: add rr into note (Liam R. Howlett)
> > - remove mseal_system_mappings() and use macro instead (Liam R. Howlett)
> > - mseal.rst: add incompatible userland software (Lorenzo Stoakes)
> > - remove RFC from title (Kees Cook)
> >
> > V5:
> > https://lore.kernel.org/all/20250212032155.1276806-1-jeffxu@xxxxxxxxxx/
> > - Remove kernel cmd line (Lorenzo Stoakes)
> > - Add test info (Lorenzo Stoakes)
> > - Add threat model info (Lorenzo Stoakes)
> > - Fix x86 selftest: test_mremap_vdso
> > - Restrict code change to ARM64/x86-64/UM arch only.
> > - Add userprocess.h to include seal_system_mapping().
> > - Remove sealing vsyscall.
> > - Split the patch.
> >
> > V4:
> > https://lore.kernel.org/all/20241125202021.3684919-1-jeffxu@xxxxxxxxxx/
> > - ARCH_HAS_SEAL_SYSTEM_MAPPINGS (Lorenzo Stoakes)
> > - test info (Lorenzo Stoakes)
> > - Update mseal.rst (Liam R. Howlett)
> > - Update test_mremap_vdso.c (Liam R. Howlett)
> > - Misc. style, comments, doc update (Liam R. Howlett)
> >
> > V3:
> > https://lore.kernel.org/all/20241113191602.3541870-1-jeffxu@xxxxxxxxxx/
> > - Revert uprobe to v1 logic (Oleg Nesterov)
> > - use CONFIG_SEAL_SYSTEM_MAPPINGS instead of _ALWAYS/_NEVER (Kees Cook)
> > - Move kernel cmd line from fs/exec.c to mm/mseal.c and
> >   misc. (Liam R.
> >   Howlett)
> >
> > V2:
> > https://lore.kernel.org/all/20241014215022.68530-1-jeffxu@xxxxxxxxxx/
> > - Seal uprobe always (Oleg Nesterov)
> > - Update comments and description (Randy Dunlap, Liam R. Howlett, Oleg Nesterov)
> > - Rebase to linux_main
> >
> > V1:
> > - https://lore.kernel.org/all/20241004163155.3493183-1-jeffxu@xxxxxxxxxx/
> >
> > --------------------------------------------------
> >
> >
> >
> > Jeff Xu (7):
> >   mseal sysmap: kernel config and header change
> >   selftests: x86: test_mremap_vdso: skip if vdso is msealed
> >   mseal sysmap: enable x86-64
> >   mseal sysmap: enable arm64
> >   mseal sysmap: uprobe mapping
> >   mseal sysmap: update mseal.rst
> >   selftest: test system mappings are sealed.
> >
> >  Documentation/userspace-api/mseal.rst         |  20 +++
> >  arch/arm64/Kconfig                            |   1 +
> >  arch/arm64/kernel/vdso.c                      |  12 +-
> >  arch/x86/Kconfig                              |   1 +
> >  arch/x86/entry/vdso/vma.c                     |   7 +-
> >  include/linux/mm.h                            |  10 ++
> >  init/Kconfig                                  |  22 ++++
> >  kernel/events/uprobes.c                       |   3 +-
> >  security/Kconfig                              |  21 ++++
> >  tools/testing/selftests/Makefile              |   1 +
> >  .../mseal_system_mappings/.gitignore          |   2 +
> >  .../selftests/mseal_system_mappings/Makefile  |   6 +
> >  .../selftests/mseal_system_mappings/config    |   1 +
> >  .../mseal_system_mappings/sysmap_is_sealed.c  | 119 ++++++++++++++++++
> >  .../testing/selftests/x86/test_mremap_vdso.c  |  43 +++++++
> >  15 files changed, 261 insertions(+), 8 deletions(-)
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/.gitignore
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/Makefile
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/config
> >  create mode 100644 tools/testing/selftests/mseal_system_mappings/sysmap_is_sealed.c
> >
> > --
> > 2.48.1.711.g2feabab25a-goog
> >