On Wed, Aug 31, 2022 at 05:59:42PM +0200, Pablo Mendez Hernandez wrote:
> Adding Daniel for awareness.

Why was the heavyweight rather than lightweight configuration used? Why compare with all the expensive optional security features enabled? Even the lightweight configuration has 2 of the optional security features enabled: slab canaries and full zero-on-free. Both of those should be disabled to measure the baseline performance.

Using the heavyweight configuration means having large slab allocation quarantines, not just zero-on-free but also checking that data is still zeroed on allocation (which more than doubles the cost), slot randomization and multiple other features. It just doesn't make sense to turn security up to 11 with optional features and then present that as if it's the performance offered.

I'm here to provide clarifications about my project and to counter incorrect beliefs about it. I don't think it makes much sense for Fedora to use it as a default allocator, but the claims being made about memory usage and performance are very wrong.

I already responded and provided both concise and detailed explanations. I don't know what these nonsense measurements, which completely disregard all of that, are meant to demonstrate.

It's a huge hassle for me to respond here because I have no interest in this list and don't want to be subscribed to it. I didn't propose that Fedora use it and don't think it makes sense for Fedora.

At the same time, I already explained in detail that glibc malloc is ALSO a very bad choice. Linux distributions not willing to sacrifice much for security would be better served by using jemalloc with small chunk sizes on 64-bit operating systems. ASLR is too low entropy on 32-bit to afford the sacrifice of a few bits for chunk alignment though. jemalloc can be configured with extra sanity checks enabled and with certain very non-essential features disabled to provide a better balance of security vs. performance.
The defaults are optimized for long-running server processes. It's very configurable, including by individual applications.

hardened_malloc builds both a lightweight and heavyweight library itself. The lightweight library still has the optional slab allocation canary and full zero-on-free features enabled. Both of those should be disabled to truly measure the baseline cost. None of those optional features is provided by glibc malloc. None of them is needed to get the benefits of hardened_malloc's 100% out-of-line metadata, 100% invalid free detection, entirely separate never reused address space regions for all allocator metadata and each slab allocation size class (which covers up to 128k by default), virtual memory quarantines + random guards for large allocations, etc.

The optional security features are optional because they're expensive. That's the point of building both a sample lightweight and heavyweight configuration by default. The lightweight configuration is essentially the recommended configuration if you aren't willing to make more significant sacrifices for security. It's not the highest performance configuration it offers, just a reasonable compromise.

Slab allocation canaries slightly increase memory usage. Slab allocation quarantines (disabled in the lightweight configuration, which is built by default) greatly increase memory usage, especially with the default configuration. The whole point of quarantines is that they delay reuse of the memory, and since these are slab allocations within slabs, the memory gets held onto.

If you wanted to measure the baseline performance, then you'd do as I suggested: measure with all the optional features disabled (at minimum, those 2 optional features) and compare that to both glibc malloc and glibc malloc with tcache disabled.
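A rough sketch of the three-way comparison described above, for anyone who wants to reproduce it. It is only an illustration: the library path is a hypothetical placeholder for a hardened_malloc build with the optional features disabled, the helper names are made up here, and it assumes Linux with Python 3.9 or later. The only allocator-specific knobs it relies on are LD_PRELOAD and the glibc.malloc.tcache_count tunable.

```python
import os
import subprocess
import time

# Hypothetical path: wherever a hardened_malloc build with the optional
# features disabled was installed. Not a path defined by the project.
HARDENED_MALLOC_PATH = "/usr/local/lib/libhardened_malloc.so"


def allocator_environments(preload_path=HARDENED_MALLOC_PATH):
    """Map a label to the extra environment variables for each configuration."""
    return {
        "glibc malloc": {},
        # Setting the tcache_count tunable to 0 stops glibc from caching
        # chunks per thread.
        "glibc malloc, tcache disabled": {
            "GLIBC_TUNABLES": "glibc.malloc.tcache_count=0"
        },
        "hardened_malloc": {"LD_PRELOAD": preload_path},
    }


def measure(command, extra_env):
    """Run command once; return (wall-clock seconds, peak RSS of that child)."""
    env = dict(os.environ, **extra_env)
    start = time.perf_counter()
    child = subprocess.Popen(command, env=env)
    # wait4 reports resource usage for this child only, unlike
    # getrusage(RUSAGE_CHILDREN), which accumulates across all children.
    _, status, usage = os.wait4(child.pid, 0)
    elapsed = time.perf_counter() - start
    child.returncode = os.waitstatus_to_exitcode(status)
    if child.returncode != 0:
        raise RuntimeError(f"{command!r} exited with status {child.returncode}")
    return elapsed, usage.ru_maxrss  # ru_maxrss is in kilobytes on Linux


def compare(command, environments):
    """Measure command under every configuration and return the results."""
    return {label: measure(command, env) for label, env in environments.items()}
```

Calling compare() with the benchmark's command line and allocator_environments() returns one (seconds, peak RSS) pair per configuration. A single run per configuration is noisy, so a real comparison would repeat each measurement several times.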
I explained previously that hardened_malloc could provide an array-based thread cache as an opt-in feature, but currently it isn't done because it inherently reduces security. That means no more 100% reliable detection of all invalid frees, and a lot of other security properties are lost. It also hardly makes sense to have optional features like quarantines and slot randomization underneath unless the thread caches are doing the same thing.

As I said previously, if you compare hardened_malloc with optional features disabled to glibc malloc with tcache disabled, it performs as well and has much lower fragmentation and lower metadata overhead. If you stick a small array-based thread cache onto hardened_malloc, then it can perform as well as glibc with much larger freelist-based thread caches since it has a different approach to scaling with jemalloc-style arenas.
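To illustrate why a thread cache weakens invalid free detection, here is a toy model. It is not code from hardened_malloc or glibc: both classes and their names are invented for this sketch, and slots stand in for addresses. The central allocator validates every free against out-of-line metadata, while the cache's fast path skips that check.

```python
class SlabAllocator:
    """Toy central allocator with out-of-line metadata: a set of live slots."""

    def __init__(self, n_slots):
        self.free_slots = list(range(n_slots))
        self.live = set()

    def alloc(self):
        slot = self.free_slots.pop()
        self.live.add(slot)
        return slot

    def free(self, slot):
        # Every free is checked against the metadata, so a double free or
        # a free of a never-allocated slot is always caught here.
        if slot not in self.live:
            raise ValueError(f"invalid free of slot {slot}")
        self.live.remove(slot)
        self.free_slots.append(slot)


class ThreadCache:
    """Toy bounded array cache placed in front of the central allocator."""

    def __init__(self, backend, capacity=4):
        self.backend = backend
        self.capacity = capacity
        self.slots = []

    def alloc(self):
        if self.slots:
            return self.slots.pop()  # fast path: no central lookup
        return self.backend.alloc()

    def free(self, slot):
        if len(self.slots) < self.capacity:
            self.slots.append(slot)  # fast path: no validation at all
        else:
            self.backend.free(slot)


# Without the cache, the second free is rejected immediately.
direct = SlabAllocator(8)
a = direct.alloc()
direct.free(a)
try:
    direct.free(a)
except ValueError as error:
    print("detected:", error)

# With the cache, the double free goes unnoticed and the same slot is
# then handed out twice.
cached = ThreadCache(SlabAllocator(8))
b = cached.alloc()
cached.free(b)
cached.free(b)
print(cached.alloc() == cached.alloc())
```

In this model the cached path could add its own checks, but each check costs part of the speed the cache was meant to buy, which is the trade-off described above.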