This series aims at hardening struct cred using the kpkeys
infrastructure proposed in [1]. The idea is to enforce the immutability
of live credentials (task->{cred,real_cred}) by allocating them in
"protected" memory, which cannot be written to in the default pkey
configuration (kpkeys level). Code that genuinely requires writing to
live credentials, such as get_cred(), explicitly switches to a
privileged kpkeys level, enabling write access to the protected mapping.

The main challenge with this approach is to minimise the disruption to
existing code. Directly allocating credentials in protected memory would
force any code setting up credentials to switch kpkeys level. Instead,
we use the fact that commit_creds() "eats" the caller's reference,
meaning that the caller cannot use that reference after calling
commit_creds(). This allows us to move the credentials to a new location
in commit_creds(): prepare_creds() still allocates them in regular
memory, and commit_creds() moves them to protected memory (that is,
memory mapped with a non-default pkey). This ensures that *live*
credentials are protected, without affecting users of commit_creds().

The situation isn't as simple with override_creds(), as the caller may
(and often does) keep using the reference it passed. In this case, the
caller should explicitly call a new helper, protect_creds(), to move the
credentials to protected memory. This seems to be the most robust
approach, and the number of call sites to amend looks reasonable (patch
7 covers the most important ones). No failure should occur if a call
site is missed; the credentials will simply be unprotected.

In order to allocate credentials in protected memory, this series
introduces support for mapping slabs with a non-default pkey, using the
SLAB_SET_PKEY kmem_cache_create() flag (patch 3).
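
To make the prepare/commit flow above more concrete, here is a rough
userspace analogy. It is not code from this series: mprotect() merely
stands in for kpkeys level switches, a single read-only page stands in
for a slab mapped with a non-default pkey, and every name (toy_cred,
toy_commit(), ...) is hypothetical.

```c
/* Userspace analogy only; all toy_* names are hypothetical. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

struct toy_cred {
	unsigned int uid;
	unsigned int usage;	/* reference count */
};

/* One page standing in for a slab mapped with a non-default pkey. */
static struct toy_cred *pool;
static size_t pool_size, pool_used;

/* Map the pool read-only: the "default level" cannot write to it. */
int toy_pool_init(void)
{
	pool_size = (size_t)sysconf(_SC_PAGESIZE);
	pool = mmap(NULL, pool_size, PROT_READ,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pool == MAP_FAILED) {
		pool = NULL;
		return -1;
	}
	pool_used = 0;
	return 0;
}

/* Stand-in for a guard: grant or revoke write access to the pool. */
void toy_guard(int writable)
{
	int prot = writable ? PROT_READ | PROT_WRITE : PROT_READ;

	if (mprotect(pool, pool_size, prot))
		abort();
}

int toy_is_protected(const struct toy_cred *c)
{
	uintptr_t p = (uintptr_t)c, base = (uintptr_t)pool;

	return p >= base && p < base + pool_size;
}

/* Like prepare_creds(): new credentials start out in regular memory. */
struct toy_cred *toy_prepare(unsigned int uid)
{
	struct toy_cred *c = malloc(sizeof(*c));

	if (c) {
		c->uid = uid;
		c->usage = 1;
	}
	return c;
}

/*
 * Like commit_creds(): the caller's reference is consumed, so the object
 * can be moved to protected memory without the caller noticing.
 */
const struct toy_cred *toy_commit(struct toy_cred *new)
{
	struct toy_cred *live;

	if (pool_used == pool_size / sizeof(*pool))
		return NULL;	/* pool full; "new" is left untouched */

	live = &pool[pool_used++];
	toy_guard(1);
	*live = *new;
	toy_guard(0);
	free(new);
	return live;
}

/* Like get_cred(): writing to live credentials needs the guard. */
void toy_get(const struct toy_cred *c)
{
	toy_guard(1);
	((struct toy_cred *)c)->usage++;
	toy_guard(0);
}
```

In this model, a caller can freely modify the object returned by
toy_prepare(), but once it has been passed to toy_commit() any write
outside a toy_guard(1)/toy_guard(0) pair faults, which is the property
the series aims to provide for live credentials.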
The complexity is kept minimal by setting the pkey at the slab level; it
should also be possible to do this at the page level, but it isn't
immediately obvious where the pkey value could be stored in struct page
- especially since we've almost run out of GFP flags.

Most of the cover letter for the original kpkeys series [1] is relevant
to this series as well. In particular, the threat model is unchanged:
the aim is to mitigate data-only attacks, such as use-after-free. It is
therefore assumed that control flow is not corrupted, and that the
attacker does not achieve arbitrary code execution.

The most significant caveat in this RFC is RCU handling. Storing struct
cred in memory that is read-only by default would break RCU without
special handling, as it needs to write to cred->rcu (to zero out the
callback field, for instance). There is currently no efficient way for
RCU to know whether the object to be freed is protected or not, and
executing the whole of RCU at a higher kpkeys level would imply running
RCU callbacks at that level too, which isn't ideal (a callback could be
exploited to write to protected locations). The current approach (patch
4) therefore switches kpkeys level whenever any struct rcu_head is
written to. This is safe, but the overhead is likely to be unacceptably
large. Ideally, RCU would be able to tell if a struct rcu_head resides
in protected memory, maybe using a flag - it isn't clear where that flag
could be stored though.

Other performance-related caveats:

* In many cases, the use of guard objects to obtain write access to
  protected data is nested: a function holding a guard calls another
  that will also create a guard object. This seems difficult to avoid
  without heavy refactoring.
  With the assumption that writing to the pkey register is expensive
  (which is the case at least on arm64/POE), patch 1 mitigates the cost
  by skipping the setting/restoring of the register if the new value is
  equal to the current one, as is the case when guards are nested.

* Because a struct cred may be freed before ever being installed,
  put_cred_rcu() may be operating on an object that is located either in
  regular or protected memory. This is handled by looking up the slab
  containing the object and checking if its flags include SLAB_SET_PKEY.
  The overhead is hopefully acceptable on that path, but the approach is
  not particularly elegant.

* Similarly, put_cred(), get_cred() and other helpers may be called on
  unprotected objects. Those helpers however create a guard object
  unconditionally if they need to write to the credentials. It is
  unclear whether skipping the guard for unprotected objects would give
  a performance uplift, as this depends on the cost of checking if an
  object is protected or not.

* It is assumed that calling arch_kpkeys_enabled() is cheap, as multiple
  guards are conditional on that function. (This boils down to a static
  branch on arm64, which should indeed be cheap.)

This series applies on top of v6.14-rc1 + the kpkeys RFC v3 [1] + a
cleanup patch for SLAB flags [2].

A next step will be to estimate the performance impact of the
kpkeys-based features (page table and struct cred hardening); no
benchmarking has been performed at this stage.

Any comment or feedback will be highly appreciated, be it on the
high-level approach or implementation choices!

- Kevin

[1] https://lore.kernel.org/linux-hardening/20250203101839.1223008-1-kevin.brodsky@xxxxxxx/
[2] https://lore.kernel.org/lkml/20250124164858.756425-1-kevin.brodsky@xxxxxxx/

---
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: Mark Brown <broonie@xxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: David Howells <dhowells@xxxxxxxxxx>
Cc: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Cc: Jann Horn <jannh@xxxxxxxxxx>
Cc: Jeff Xu <jeffxu@xxxxxxxxxxxx>
Cc: Joey Gouly <joey.gouly@xxxxxxx>
Cc: Kees Cook <kees@xxxxxxxxxx>
Cc: Linus Walleij <linus.walleij@xxxxxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Pierre Langlois <pierre.langlois@xxxxxxx>
Cc: Quentin Perret <qperret@xxxxxxxxxx>
Cc: "Mike Rapoport (IBM)" <rppt@xxxxxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx
Cc: x86@xxxxxxxxxx

Kevin Brodsky (8):
  arm64: kpkeys: Avoid unnecessary writes to POR_EL1
  mm: kpkeys: Introduce unrestricted level
  slab: Introduce SLAB_SET_PKEY
  rcu: Allow processing kpkeys-protected data
  mm: kpkeys: Introduce cred pkey/level
  cred: Protect live struct cred with kpkeys
  fs: Protect creds installed by override_creds()
  mm: Add basic tests for kpkeys_hardened_cred

 arch/arm64/include/asm/kpkeys.h |  14 ++-
 fs/aio.c                        |   2 +-
 fs/fuse/passthrough.c           |   2 +-
 fs/nfs/nfs4idmap.c              |   2 +-
 fs/nfsd/auth.c                  |   2 +-
 fs/nfsd/nfs4recover.c           |   2 +-
 fs/nfsd/nfsfh.c                 |   2 +-
 fs/open.c                       |   2 +-
 fs/overlayfs/dir.c              |   2 +-
 fs/overlayfs/super.c            |   2 +-
 include/asm-generic/kpkeys.h    |   4 +
 include/linux/cred.h            |   6 ++
 include/linux/kpkeys.h          |  16 ++-
 include/linux/slab.h            |  21 ++++
 kernel/cred.c                   | 178 +++++++++++++++++++++++++++-----
 kernel/rcu/rcu_segcblist.c      |  10 +-
 kernel/rcu/tree.c               |   4 +-
 mm/Kconfig                      |   2 +
 mm/Makefile                     |   1 +
 mm/kpkeys_hardened_cred_test.c  |  42 ++++++++
 mm/slab.h                       |   7 +-
 mm/slab_common.c                |   2 +-
 mm/slub.c                       |  58 ++++++++++-
 security/Kconfig.hardening      |  24 +++++
 24 files changed, 361 insertions(+), 46 deletions(-)
 create mode 100644 mm/kpkeys_hardened_cred_test.c

--
2.47.0