Re: hardened malloc is big and slow

Daniel Micay via devel <devel@xxxxxxxxxxxxxxxxxxxxxxx> · Tue, 6 Sep 2022 00:02:02 -0400

On Wed, Aug 31, 2022 at 05:59:42PM +0200, Pablo Mendez Hernandez wrote:
> Adding Daniel for awareness.

Why was the heavyweight rather than lightweight configuration used? Why
compare with all the expensive optional security features enabled? Even
the lightweight configuration has 2 of the optional security features
enabled: slab canaries and full zero-on-free. Both of those should be
disabled to measure the baseline performance. Using the heavyweight
configuration means having large slab allocation quarantines and not
just zero-on-free but checking that data is still zeroed on allocation
(which more than doubles the cost), slot randomization and multiple
other features. It just doesn't make sense to turn security up to 11
with optional features and then present that as if it's the performance
offered.
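As a rough illustration of why checking for zeroed data on allocation adds so much on top of zero-on-free, here is a minimal C sketch. It is not hardened_malloc's code: `SLOT_SIZE`, `slot_free` and `slot_is_still_zeroed` are hypothetical names standing in for a single slab slot. The point is only that zero-on-free is one full pass over the slot, and the allocation-time check is a second full pass over the same bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed slot size, standing in for one slab slot. */
#define SLOT_SIZE 64

/* Zero-on-free: one pass writing the whole slot when it is freed. */
static void slot_free(unsigned char *slot) {
    assert(slot != NULL);
    memset(slot, 0, SLOT_SIZE);
}

/* Check on allocation: a second pass reading the whole slot before it is
 * handed out again. Any non-zero byte means something wrote to the slot
 * while it was free. */
static bool slot_is_still_zeroed(const unsigned char *slot) {
    for (size_t i = 0; i < SLOT_SIZE; i++) {
        if (slot[i] != 0) {
            return false;
        }
    }
    return true;
}
```

A real allocator would abort when the check fails. The sketch returns a flag so the two passes can be exercised separately.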

I'm here to provide clarifications about my project and to counter
incorrect beliefs about it. I don't think it makes much sense for Fedora
to use it as a default allocator but the claims being made about memory
usage and performance are very wrong. I already responded and provided
both concise and detailed explanations. I don't know what these nonsense
measurements completely disregarding all that are meant to demonstrate.

It's a huge hassle for me to respond here because I have no interest in
this list and don't want to be subscribed to it. I didn't propose that
Fedora use it and don't think it makes sense for Fedora. At the same
time, I already explained in detail that glibc malloc is ALSO a very bad
choice. Linux distributions not willing to sacrifice much for security
would be better served by using jemalloc with small chunk sizes on
64-bit operating systems. ASLR is too low entropy on 32-bit to afford
the sacrifice of a few bits for chunk alignment though. jemalloc can be
configured with extra sanity checks enabled and with certain very
non-essential features disabled to provide a better balance of security
vs. performance. Its defaults are optimized for long running server
processes. It's very configurable, including by individual applications.

hardened_malloc builds both a lightweight and heavyweight library
itself. The lightweight library still has the optional slab allocation
canary and full zero-on-free features enabled. Both of those should be
disabled to truly measure the baseline cost. None of those optional
features is provided by glibc malloc. None of them is needed to get the
benefits of hardened_malloc's 100% out-of-line metadata, 100% invalid
free detection, entirely separate never reused address space regions for
all allocator metadata and each slab allocation size class (which covers
up to 128k by default), virtual memory quarantines + random guards for
large allocations, etc. etc.

The optional security features are optional because they're expensive.
That's the point of building both a sample lightweight and heavyweight
configuration by default. The lightweight configuration is essentially the
recommended configuration if you aren't willing to make more significant
sacrifices for security. It's not the highest performance configuration
it offers, just a reasonable compromise.

Slab allocation canaries slightly increase memory usage. Slab allocation
quarantines (disabled in the lightweight configuration, which is built by
default) greatly increase memory usage, especially with the default
configuration. The whole point of quarantines is that they delay reuse
of the memory and since these are slab allocations within slabs the
memory gets held onto.
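To make the memory cost of delayed reuse concrete, here is a minimal C sketch of a FIFO quarantine. It is a generic illustration, not hardened_malloc's implementation: `QUARANTINE_LEN`, `struct quarantine`, `quarantine_exchange` and `quarantined_free` are all hypothetical names, and the length of 4 is chosen only for readability. Each freed pointer is parked in a ring and only released once enough later frees have pushed it out, so up to `QUARANTINE_LEN` freed allocations stay held at any time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define QUARANTINE_LEN 4 /* hypothetical length, not a hardened_malloc default */

struct quarantine {
    void *slots[QUARANTINE_LEN];
    size_t next; /* index of the oldest entry, displaced by the next free */
    size_t held; /* freed allocations currently held back from reuse */
};

/* Park ptr in the quarantine and return the oldest entry it displaces, or
 * NULL while the ring is still filling. The caller releases the result. */
static void *quarantine_exchange(struct quarantine *q, void *ptr) {
    assert(ptr != NULL);
    void *oldest = q->slots[q->next];
    q->slots[q->next] = ptr;
    q->next = (q->next + 1) % QUARANTINE_LEN;
    if (oldest == NULL) {
        q->held++;
    }
    return oldest;
}

/* Replacement for free(): the memory behind ptr is not reusable until
 * QUARANTINE_LEN further frees have gone through the quarantine. */
static void quarantined_free(struct quarantine *q, void *ptr) {
    if (ptr == NULL) {
        return;
    }
    free(quarantine_exchange(q, ptr)); /* free(NULL) is a no-op */
}
```

With a zero-initialized `struct quarantine`, the first `QUARANTINE_LEN` calls release nothing at all, which is where the extra memory usage comes from.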

If you wanted to measure the baseline performance, then you'd do as I
suggested and measure with all the optional features disabled (disable
at least those 2 features included in the lightweight configuration) and
compare that to both glibc malloc and glibc malloc with tcache disabled.
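A comparison like this needs one harness that is run unchanged under each allocator. The following C sketch is one possible shape for it, under stated assumptions: `BATCH`, `bench_malloc_free` and the `BENCH_MAIN` guard are hypothetical, and a single-threaded allocate/free loop is far too narrow to characterize any allocator, so treat it as scaffolding and not as a meaningful benchmark.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BATCH 1024 /* hypothetical number of live allocations per round */

/* Allocate and free `rounds` batches of `size`-byte blocks and return the
 * elapsed monotonic time in seconds. */
static double bench_malloc_free(size_t rounds, size_t size) {
    static void *batch[BATCH];
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t r = 0; r < rounds; r++) {
        for (size_t i = 0; i < BATCH; i++) {
            batch[i] = malloc(size);
            assert(batch[i] != NULL);
            /* Touch the block so the allocation cannot be optimized away. */
            *(volatile char *)batch[i] = 1;
        }
        for (size_t i = 0; i < BATCH; i++) {
            free(batch[i]);
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

#ifdef BENCH_MAIN
int main(void) {
    printf("%.3f s\n", bench_malloc_free(10000, 64));
    return 0;
}
#endif
```

Built with `-DBENCH_MAIN`, the same binary can be run three ways: as-is for glibc malloc, with tcache disabled (assuming your glibc supports the `glibc.malloc.tcache_count` tunable, by setting `GLIBC_TUNABLES=glibc.malloc.tcache_count=0`), and with a hardened_malloc build that has the optional features turned off loaded through `LD_PRELOAD`. The library path and build options depend on your build, so check the project's README before relying on any specific name.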

I explained previously that hardened_malloc could provide an array-based
thread cache as an opt-in feature, but currently it isn't done because
it inherently reduces security. With a thread cache there is no more
100% reliable detection of all invalid frees, and a lot more security
properties are lost. It also hardly makes sense to have optional
features like quarantines and slot randomization underneath unless the
thread caches are doing the same thing.
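To show where the loss comes from, here is a minimal C sketch of an array-based thread cache layered over malloc/free. It is a generic illustration and not a design from hardened_malloc: `CACHE_SLOTS`, `CACHE_SIZE`, `struct thread_cache`, `cached_alloc` and `cached_free` are hypothetical, and it handles a single size class only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 8 /* hypothetical per-thread capacity */
#define CACHE_SIZE 64 /* single hypothetical size class */

struct thread_cache {
    void *slots[CACHE_SLOTS];
    size_t count;
};

/* One cache per thread, so the fast paths need no locking. */
static _Thread_local struct thread_cache cache;

static void *cached_alloc(void) {
    assert(cache.count <= CACHE_SLOTS);
    if (cache.count > 0) {
        return cache.slots[--cache.count]; /* fast path: no allocator call */
    }
    return malloc(CACHE_SIZE);
}

static void cached_free(void *ptr) {
    if (ptr == NULL) {
        return;
    }
    if (cache.count < CACHE_SLOTS) {
        /* Fast path: the pointer is stored without any validation. The
         * underlying allocator never sees this free, so it cannot detect
         * a double free or an invalid pointer here. */
        cache.slots[cache.count++] = ptr;
        return;
    }
    free(ptr); /* cache full: fall through to the real allocator */
}
```

In this sketch, calling `cached_free` twice on the same pointer stores it twice, and the next two `cached_alloc` calls hand the same memory to two callers. The allocator underneath never gets a chance to catch it, and any quarantine or slot randomization it applies is bypassed for everything served from the array.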

As I said previously, if you compare hardened_malloc with optional
features disabled to glibc malloc with tcache disabled, it performs as
well and has much lower fragmentation and lower metadata overhead. If
you stick a small array-based thread cache onto hardened_malloc, then it
can perform as well as glibc with much larger freelist-based thread
caches since it has a different approach to scaling with jemalloc-style
arenas.