Re: [PATCH v3 0/6] slab: Introduce dedicated bucket allocator

On 4/24/24 23:40, Kees Cook wrote:
> Hi,
> 
> Series change history:
> 
>  v3:
>   - clarify rationale and purpose in commit log
>   - rebase to -next (CONFIG_CODE_TAGGING)
>   - simplify calling styles and split out bucket plumbing more cleanly
>   - consolidate kmem_buckets_*() family introduction patches
>  v2: https://lore.kernel.org/lkml/20240305100933.it.923-kees@xxxxxxxxxx/
>  v1: https://lore.kernel.org/lkml/20240304184252.work.496-kees@xxxxxxxxxx/
> 
> For the cover letter, I'm repeating commit log for patch 4 here, which has
> additional clarifications and rationale since v2:
> 
>     Dedicated caches are available for fixed size allocations via
>     kmem_cache_alloc(), but for dynamically sized allocations there is only
>     the global kmalloc API's set of buckets available. This means it isn't
>     possible to separate specific sets of dynamically sized allocations into
>     a separate collection of caches.
>     
>     This leads to a use-after-free exploitation weakness in the Linux
>     kernel since many heap memory spraying/grooming attacks depend on using
>     userspace-controllable dynamically sized allocations to collide with
>     fixed size allocations that end up in same cache.
>     
>     While CONFIG_RANDOM_KMALLOC_CACHES provides a probabilistic defense
>     against these kinds of "type confusion" attacks, including for fixed
>     same-size heap objects, we can create a complementary deterministic
>     defense for dynamically sized allocations that are directly user
>     controlled. Addressing these cases is limited in scope, so isolation these
>     kinds of interfaces will not become an unbounded game of whack-a-mole. For
>     example, pass through memdup_user(), making isolation there very
>     effective.

What does "Addressing these cases is limited in scope, so isolation
these kinds of interfaces will not become an unbounded game of
whack-a-mole." mean exactly?

>     
>     In order to isolate user-controllable sized allocations from system
>     allocations, introduce kmem_buckets_create(), which behaves like
>     kmem_cache_create(). Introduce kmem_buckets_alloc(), which behaves like
>     kmem_cache_alloc(). Introduce kmem_buckets_alloc_track_caller() for
>     where caller tracking is needed. Introduce kmem_buckets_valloc() for
>     cases where vmalloc callback is needed.
>     
>     Allows for confining allocations to a dedicated set of sized caches
>     (which have the same layout as the kmalloc caches).
>     
>     This can also be used in the future to extend codetag allocation
>     annotations to implement per-caller allocation cache isolation[1] even
>     for dynamic allocations.

Per-caller allocation cache isolation looks like something that has
already been done in
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe
(albeit in a randomized way). Why not piggy-back on the infrastructure
added by that patch instead of adding a new API?

>     Memory allocation pinning[2] is still needed to plug the Use-After-Free
>     cross-allocator weakness, but that is an existing and separate issue
>     which is complementary to this improvement. Development continues for
>     that feature via the SLAB_VIRTUAL[3] series (which could also provide
>     guard pages -- another complementary improvement).
>     
>     Link: https://lore.kernel.org/lkml/202402211449.401382D2AF@keescook [1]
>     Link: https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html [2]
>     Link: https://lore.kernel.org/lkml/20230915105933.495735-1-matteorizzo@xxxxxxxxxx/ [3]

To be honest, I think this series is close to useless without allocation
pinning. And even with pinning, it's still routinely bypassed in the
KernelCTF
(https://github.com/google/security-research/tree/master/pocs/linux/kernelctf).

Do you have some particular exploits in mind that would be completely
mitigated by your series?

Moreover, I'm not aware of any ongoing development of the SLAB_VIRTUAL
series: the last sign of life on its thread is from 7 months ago.

> 
> After the core implementation are 2 patches that cover the most heavily
> abused "repeat offenders" used in exploits. Repeating those details here:
> 
>     The msg subsystem is a common target for exploiting[1][2][3][4][5][6][7]
>     use-after-free type confusion flaws in the kernel for both read and
>     write primitives. Avoid having a user-controlled size cache share the
>     global kmalloc allocator by using a separate set of kmalloc buckets.
>     
>     Link: https://blog.hacktivesecurity.com/index.php/2022/06/13/linux-kernel-exploit-development-1day-case-study/ [1]
>     Link: https://hardenedvault.net/blog/2022-11-13-msg_msg-recon-mitigation-ved/ [2]
>     Link: https://www.willsroot.io/2021/08/corctf-2021-fire-of-salvation-writeup.html [3]
>     Link: https://a13xp0p0v.github.io/2021/02/09/CVE-2021-26708.html [4]
>     Link: https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html [5]
>     Link: https://zplin.me/papers/ELOISE.pdf [6]
>     Link: https://syst3mfailure.io/wall-of-perdition/ [7]
> 
>     Both memdup_user() and vmemdup_user() handle allocations that are
>     regularly used for exploiting use-after-free type confusion flaws in
>     the kernel (e.g. prctl() PR_SET_VMA_ANON_NAME[1] and setxattr[2][3][4]
>     respectively).
>     
>     Since both are designed for contents coming from userspace, it allows
>     for userspace-controlled allocation sizes. Use a dedicated set of kmalloc
>     buckets so these allocations do not share caches with the global kmalloc
>     buckets.
>     
>     Link: https://starlabs.sg/blog/2023/07-prctl-anon_vma_name-an-amusing-heap-spray/ [1]
>     Link: https://duasynt.com/blog/linux-kernel-heap-spray [2]
>     Link: https://etenal.me/archives/1336 [3]
>     Link: https://github.com/a13xp0p0v/kernel-hack-drill/blob/master/drill_exploit_uaf.c [4]

What's the performance impact of this series? Did you run some benchmarks?

> 
> Thanks!
> 
> -Kees
> 
> 
> Kees Cook (6):
>   mm/slab: Introduce kmem_buckets typedef
>   mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
>   mm/slab: Introduce __kvmalloc_node() that can take kmem_buckets
>     argument
>   mm/slab: Introduce kmem_buckets_create() and family
>   ipc, msg: Use dedicated slab buckets for alloc_msg()
>   mm/util: Use dedicated slab buckets for memdup_user()
> 
>  include/linux/slab.h | 44 ++++++++++++++++--------
>  ipc/msgutil.c        | 13 +++++++-
>  lib/fortify_kunit.c  |  2 +-
>  lib/rhashtable.c     |  2 +-
>  mm/slab.h            |  6 ++--
>  mm/slab_common.c     | 79 +++++++++++++++++++++++++++++++++++++++++---
>  mm/slub.c            | 14 ++++----
>  mm/util.c            | 21 +++++++++---
>  8 files changed, 146 insertions(+), 35 deletions(-)
> 




