On Mon, Aug 15, 2022 at 07:39:46PM -0700, John Reiser wrote:
> On 8/13/22, Demi Marie Obenour wrote:
> > On 8/13/22, Kevin Kofler via devel wrote:
> > > martin luther wrote:
> > > > should we implement https://github.com/GrapheneOS/hardened_malloc/
> > > > it is hardened memory allocate it will increase the security of fedora
> > > > according to the graphene os team it can be ported to linux as well need
> > > > to look at it
> >
> > CCing Daniel Micay who wrote hardened_malloc.
> >
> > > There are several questions that come up: [[snip]]
>
> It seems to me that hardened_malloc could increase working set and RAM
> desired by something like 10% compared to glibc for some important workloads,
> such as Fedora re-builds. From page 22 of [1] (attached here; 203KB), the graph
> of number of requests versus requested size shows that blocks of size <= 128
> were requested tens to thousands of times more often than all the rest.

It has far less fragmentation than glibc malloc. It also has far lower
metadata overhead, since there are no headers on allocations and only a few
bits are consumed per small allocation. glibc has over 100% metadata overhead
for 16 byte allocations, while for hardened_malloc it's a very low percentage.
Of course, you need to compare with slab allocation quarantines and slab
allocation canaries disabled in hardened_malloc.

> For sizes from 0 through 128, the "Size classes" section of README.md of [2]
> documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%.
> That seems too high. Where are actual measurements for workloads such as
> Fedora re-builds?

Internal fragmentation means fragmentation caused by size class rounding.
There is no way to have size classes that aren't multiples of 16, since 16
byte alignment is required by the x86_64 and arm64 ABIs. glibc has over 100%
overhead for 16 byte allocations due to header metadata and other metadata.
It definitely isn't lighter for those compared to a modern slab allocator.
There's a 16 byte alignment requirement for malloc on x86_64 and arm64, so
there's no way to have any size classes between the initial multiples of 16.
Slab allocation canaries are an optional hardened_malloc feature adding 8 byte
random canaries to the end of allocations, which in many cases will increase
the size class if there isn't room within the padding. Slab allocation
quarantines are another optional feature, which requires dedicating
substantial memory to avoiding reuse of allocations. You should compare
without the optional features enabled as a baseline, because glibc doesn't
have any of those security features, and the baseline hardened_malloc design
is already far more secure.

> (Also note that the important special case of malloc(0), which is analogous
> to (gensym) of Lisp and is implemented internally as malloc(1), consumes
> 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc.
> The worst fragmentation happens for *every* call to malloc(0), which occurred
> about 800,000 times in the sample. Yikes!)

malloc(0) is not implemented as malloc(1) in hardened_malloc and does not use
any memory for the data, only the metadata, which is a small percentage of the
allocation size even for 16 byte allocations, since there is only slab
metadata for the entire slab and bitmaps to track which slots are used. There
are no allocation headers. Doing hundreds of thousands of malloc(0)
allocations uses very little memory in hardened_malloc. Each allocation
requires a bit in the bitmap, and each slab of 256x 16 byte allocations (a
4096 byte slab) has slab metadata. All the metadata is in a dedicated
metadata region.

I strongly recommend reading all the documentation thoroughly:

https://github.com/GrapheneOS/hardened_malloc/blob/main/README.md

hardened_malloc is oriented towards security and provides a bunch of
important security properties unavailable with glibc malloc.
It also has lower fragmentation, and with the optional security features
disabled, lower memory usage for large processes, especially over time.

If you enable the slab quarantines, that's going to use a lot of memory. If
you enable slab canaries, you give up some of the memory usage reduction from
not having per-allocation metadata headers. Neither of those features exists
in glibc malloc, jemalloc, etc., so it's not really fair to enable the
optional security features for hardened_malloc and compare with allocators
without them.

Slab allocation quarantines in particular inherently require a ton of memory
in order to delay reuse of allocations for as long as is feasible. This pairs
well with zero-on-free and the write-after-free check based on zero-on-free:
if any non-zero write occurs while an allocation is quarantined/freed, it
will be detected before the allocation is reused. As long as zero-on-free is
enabled, which it is even in the sample light configuration, all memory is
known to be zeroed at allocation time, which is how the write-after-free
check works.

All of these are supplementary optional features, NOT the core security
features. The core security features are the baseline design: no inline
metadata, entirely separate, fully statically reserved address space regions,
each with its own high entropy random base, for all allocation metadata (1
region) and each size class (each has a separate region), absolutely never
reusing address space between the regions, etc. This provides very
substantial security benefits over a completely non-hardened allocator with a
legacy, easy-to-exploit design such as glibc malloc, which has only a few
mostly non-working sanity checks.
There are other approaches which take a middle ground, but hardened_malloc is
focused on security first, with very low fragmentation (dramatically lower
than glibc) and also lower memory usage for large processes when slab
allocation quarantines are disabled, especially when slab canaries are also
disabled.

Try using hardened_malloc for something like the Matrix Synapse server with
the light configuration (slab allocation quarantine not used) and compare to
glibc malloc. It uses far less memory and, unlike glibc malloc, doesn't end
up 'leaking' tons of memory over time from fragmentation. Disable slab
canaries and try again: it will be even lower, although probably not
particularly noticeably. If you choose to use the very memory expensive slab
quarantine feature, which is not enabled in the standard light configuration,
that's your choice.

Also, hardened_malloc doesn't use a thread cache, for security reasons: it
would invalidate many of the security properties. If you compare the light
configuration to glibc malloc with tcache disabled, it will compare well, and
hardened_malloc can scale better when given enough arenas. If you want to
make the substantial security sacrifices required for a traditional thread
cache, then I don't think hardened_malloc makes sense, which is why it
doesn't include the option to do thread caching even though it'd be easy to
implement. It may one day include the option to do thread batched allocation,
but it isn't feasible to do it for deallocation without losing a ton of the
strong security properties.
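For anyone who wants to run the comparison described above, the shape of it
looks roughly like this. The build target and library path follow the
project README at the time of writing, and `your-server-command` is a
placeholder for whatever you're measuring (e.g. a Synapse start command);
verify both against your checkout before relying on them:

```shell
# Build the light configuration (no slab allocation quarantines).
git clone https://github.com/GrapheneOS/hardened_malloc.git
make -C hardened_malloc VARIANT=light

# Run the workload with hardened_malloc preloaded.
LD_PRELOAD=$PWD/hardened_malloc/out-light/libhardened_malloc-light.so \
    your-server-command

# For a fair glibc baseline, disable its thread cache via a tunable.
GLIBC_TUNABLES=glibc.malloc.tcache_count=0 your-server-command
```

Compare resident memory over time (e.g. via /proc/PID/smaps_rollup) rather
than a single snapshot, since the fragmentation 'leak' being discussed only
shows up as the process ages.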
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue