On 4/24/24 23:40, Kees Cook wrote:
> Hi,
>
> Series change history:
>
> v3:
>  - clarify rationale and purpose in commit log
>  - rebase to -next (CONFIG_CODE_TAGGING)
>  - simplify calling styles and split out bucket plumbing more cleanly
>  - consolidate kmem_buckets_*() family introduction patches
> v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@xxxxxxxxxx/
> v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@xxxxxxxxxx/
>
> For the cover letter, I'm repeating the commit log for patch 4 here,
> which has additional clarifications and rationale since v2:
>
> Dedicated caches are available for fixed size allocations via
> kmem_cache_alloc(), but for dynamically sized allocations there is only
> the global kmalloc API's set of buckets available. This means it isn't
> possible to separate specific sets of dynamically sized allocations into
> a separate collection of caches.
>
> This leads to a use-after-free exploitation weakness in the Linux
> kernel since many heap memory spraying/grooming attacks depend on using
> userspace-controllable dynamically sized allocations to collide with
> fixed size allocations that end up in same cache.
>
> While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
> against these kinds of "type confusion" attacks, including for fixed
> same-size heap objects, we can create a complementary deterministic
> defense for dynamically sized allocations that are directly user
> controlled. Addressing these cases is limited in scope, so isolation
> these kinds of interfaces will not become an unbounded game of
> whack-a-mole. For example, pass through memdup_user(), making isolation
> there very effective.

What does "Addressing these cases is limited in scope, so isolation these
kinds of interfaces will not become an unbounded game of whack-a-mole."
mean exactly?

> In order to isolate user-controllable sized allocations from system
> allocations, introduce kmem_buckets_create(), which behaves like
> kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like
> kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for
> where caller tracking is needed. Introduce kmem_buckets_valloc() for
> cases where vmalloc callback is needed.
>
> Allows for confining allocations to a dedicated set of sized caches
> (which have the same layout as the kmalloc caches).
>
> This can also be used in the future to extend codetag allocation
> annotations to implement per-caller allocation cache isolation[1] even
> for dynamic allocations.

Per-caller allocation cache isolation looks like something that has
already been done in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe,
albeit in a randomized way. Why not piggy-back on the infrastructure
added by that commit, instead of adding a new API?

> Memory allocation pinning[2] is still needed to plug the Use-After-Free
> cross-allocator weakness, but that is an existing and separate issue
> which is complementary to this improvement. Development continues for
> that feature via the SLAB_VIRTUAL[3] series (which could also provide
> guard pages -- another complementary improvement).
>
> Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
> Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
> Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@xxxxxxxxxx/ [3]

To be honest, I think this series is close to useless without allocation
pinning. And even with pinning, it's still routinely bypassed in the
KernelCTF
(https://github.com/google/security-research/tree/master/pocs/linux/kernelctf).
Do you have some particular exploits in mind that would be completely
mitigated by your series? Moreover, I'm not aware of any ongoing
development of the SLAB_VIRTUAL series: the last sign of life on its
thread is from 7 months ago.
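For what it's worth, here is my reading of how the new API is meant to be
used, pieced together from the descriptions above. The exact signatures
are my guesses, modeled on kmem_cache_create()/kmalloc(), so treat this
as a sketch of the idea rather than the actual patch:

	/* One-time setup: create a dedicated set of kmalloc-style
	 * buckets instead of sharing the global kmalloc-* caches.
	 * (Flags argument assumed to mirror kmem_cache_create().)
	 */
	static kmem_buckets *foo_buckets;

	foo_buckets = kmem_buckets_create("foo", SLAB_ACCOUNT);

	/* A dynamically sized, userspace-controlled allocation now
	 * lands in the dedicated buckets rather than the global
	 * kmalloc caches:
	 */
	p = kmem_buckets_alloc(foo_buckets, user_len, GFP_KERNEL);

	/* Since the buckets share the kmalloc cache layout, freeing
	 * presumably stays plain kfree():
	 */
	kfree(p);

Please correct me if the intended usage differs.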
> After the core implementation are 2 patches that cover the most heavily
> abused "repeat offenders" used in exploits. Repeating those details here:
>
> The msg subsystem is a common target for exploiting[1][2][3][4][5][6][7]
> use-after-free type confusion flaws in the kernel for both read and
> write primitives. Avoid having a user-controlled size cache share the
> global kmalloc allocator by using a separate set of kmalloc buckets.
>
> Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
> Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
> Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
> Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
> Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
> Link: https://zplin.me/papers/ELOISE.pdf [6]
> Link: https://syst3mfailure.io/wall-of-perdition/ [7]
>
> Both memdup_user() and vmemdup_user() handle allocations that are
> regularly used for exploiting use-after-free type confusion flaws in
> the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
> respectively).
>
> Since both are designed for contents coming from userspace, it allows
> for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
> buckets so these allocations do not share caches with the global kmalloc
> buckets.
>
> Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
> Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
> Link: https://etenal.me/archives/1336 [3]
> Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]

What's the performance impact of this series? Did you run some benchmarks?

> Thanks!
>
> -Kees
>
> Kees Cook (6):
>   mm/slab: Introduce kmem_buckets typedef
>   mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
>   mm/slab: Introduce __kvmalloc_node() that can take kmem_buckets argument
>   mm/slab: Introduce kmem_buckets_create() and family
>   ipc, msg: Use dedicated slab buckets for alloc_msg()
>   mm/util: Use dedicated slab buckets for memdup_user()
>
>  include/linux/slab.h | 44 ++++++++++++++++--------
>  ipc/msgutil.c        | 13 +++++++-
>  lib/fortify_kunit.c  |  2 +-
>  lib/rhashtable.c     |  2 +-
>  mm/slab.h            |  6 ++--
>  mm/slab_common.c     | 79 +++++++++++++++++++++++++++++++++++++++++---
>  mm/slub.c            | 14 ++++----
>  mm/util.c            | 21 +++++++++---
>  8 files changed, 146 insertions(+), 35 deletions(-)
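Also, to check my understanding of the memdup_user() patch: the change is
presumably just swapping the global-bucket allocation for the dedicated
one, i.e. something along these lines (hand-written from the commit
message above, not copied from the actual diff, so details may differ):

	static kmem_buckets *user_buckets __ro_after_init;

	void *memdup_user(const void __user *src, size_t len)
	{
		void *p;

		/* Userspace-sized copies now come from their own
		 * buckets, so they can no longer be used to groom
		 * objects in the global kmalloc caches.
		 */
		p = kmem_buckets_alloc_track_caller(user_buckets, len,
						    GFP_USER | __GFP_NOWARN);
		if (!p)
			return ERR_PTR(-ENOMEM);

		if (copy_from_user(p, src, len)) {
			kfree(p);
			return ERR_PTR(-EFAULT);
		}

		return p;
	}

If that's the shape of it, then the benchmark question above mostly comes
down to the cost of the extra bucket indirection on the allocation path,
plus the memory overhead of each additional set of caches.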