On Mon, Aug 15, 2022 at 07:39:46PM -0700, John Reiser wrote:
> On 8/13/22, Demi Marie Obenour wrote:
> > On 8/13/22, Kevin Kofler via devel wrote:
> > > martin luther wrote:
> > > > should we implement https://github.com/GrapheneOS/hardened_malloc/
> > > > it is hardened memory allocate it will increase the security of fedora
> > > > according to the graphene os team it can be ported to linux as well need
> > > > to look at it
> >
> > CCing Daniel Micay who wrote hardened_malloc.
> >
> > > There are several questions that come up: [[snip]]
>
> It seems to me that hardened_malloc could increase working set and RAM
> desired by something like 10% compared to glibc for some important workloads,
> such as Fedora re-builds. From page 22 of [1] (attached here; 203KB), the graph
> of number of requests versus requested size shows that blocks of size <= 128
> were requested tens to thousands of times more often than all the rest.

It has far less fragmentation than glibc malloc. It also has far lower
metadata overhead, since there are no headers on allocations and only a few
bits are consumed per small allocation. glibc has over 100% metadata overhead
for 16 byte allocations, while for hardened_malloc it's a very low percentage.
Of course, you need to compare with slab allocation quarantines and slab
allocation canaries disabled in hardened_malloc.

> For sizes from 0 through 128, the "Size classes" section of README.md of [2]
> documents worst-case internal fragmentation (in "slabs") of 93.75% to 11.72%.
> That seems too high. Where are actual measurements for workloads such as
> Fedora re-builds?

Internal fragmentation means fragmentation caused by size class rounding.
There is no way to have size classes that aren't multiples of 16, since 16
byte alignment is required by the x86_64 and arm64 ABIs. glibc has over 100%
overhead for 16 byte allocations due to header metadata and other metadata.
It definitely isn't lighter for those compared to a modern slab allocator.
There's a 16 byte alignment requirement for malloc on x86_64 and arm64, so
there's no way to have any size classes between the initial multiples of 16.
Slab allocation canaries are an optional hardened_malloc feature adding 8 byte
random canaries to the end of allocations, which in many cases will increase
the size class if there isn't room within the padding. Slab allocation
quarantines are another optional feature, which requires dedicating
substantial memory to avoiding reuse of allocations. You should compare
without the optional features enabled as a baseline, because glibc doesn't
have any of those security features, and the baseline hardened_malloc design
is already far more secure.

> (Also note that the important special case of malloc(0), which is analogous
> to (gensym) of Lisp and is implemented internally as malloc(1), consumes
> 16 bytes and has a fragmentation of 93.75% for both glibc and hardened_malloc.
> The worst fragmentation happens for *every* call to malloc(0), which occurred
> about 800,000 times in the sample. Yikes!)

malloc(0) is not implemented as malloc(1) in hardened_malloc and does not use
any memory for the data, only the metadata, which is a small percentage of the
allocation size even for 16 byte allocations, since there is only slab
metadata for the entire slab and bitmaps to track which slots are used. There
are no allocation headers. Doing hundreds of thousands of malloc(0)
allocations uses very little memory in hardened_malloc. Each allocation
requires a bit in the bitmap, and each slab of 256x 16 byte allocations (a
4096 byte slab) has slab metadata. All the metadata is in a dedicated
metadata region.

I strongly recommend reading all the documentation thoroughly:

https://github.com/GrapheneOS/hardened_malloc/blob/main/README.md

hardened_malloc is oriented towards security and provides a bunch of
important security properties unavailable with glibc malloc.
It also has lower fragmentation, and with the optional security features
disabled, lower memory usage for large processes, especially over time.

If you enable the slab quarantines, that's going to use a lot of memory. If
you enable slab canaries, you give up some of the memory usage reduction from
not having per-allocation metadata headers. Neither of those features exists
in glibc malloc, jemalloc, etc., so it's not really fair to enable the
optional security features for hardened_malloc and compare with allocators
without them.

Slab allocation quarantines in particular inherently require a ton of memory
in order to delay reuse of allocations for as long as is feasible. This pairs
well with zero-on-free and the write-after-free check based on zero-on-free:
if any non-zero write occurs while an allocation is quarantined/freed, it
will be detected before the allocation is reused. As long as zero-on-free is
enabled, which it is even in the sample light configuration, all memory is
known to be zeroed at allocation time, which is how the write-after-free
check works.

All of these are supplementary optional features, NOT the core security
features. The core security features are the baseline design: no inline
metadata, entirely separate, fully statically reserved address space regions,
each with its own high entropy random base, for all allocation metadata (1
region) and each size class (each has a separate region), absolutely never
reusing address space between the regions, etc. This provides very
substantial security benefits over a completely non-hardened allocator with a
legacy, easy-to-exploit design such as glibc malloc, which has only a few
mostly non-working sanity checks.
There are other approaches which take a middle ground, but hardened_malloc is
focused on security first, with very low fragmentation (dramatically lower
than glibc) and also lower memory usage for large processes when slab
allocation quarantines are disabled, especially when slab canaries are also
disabled.

Try using hardened_malloc for something like the Matrix Synapse server with
the light configuration (slab allocation quarantine not used) and compare to
glibc malloc. It uses far less memory and, unlike glibc malloc, doesn't end
up 'leaking' tons of memory over time from fragmentation. Disable slab
canaries and try again: it will be even lower, although probably not
particularly noticeably. If you choose to use the very memory expensive slab
quarantine feature, which is not enabled in the standard light configuration,
that's your choice.

Also, hardened_malloc doesn't use a thread cache, for security reasons: it
would invalidate many of the security properties. If you compare the light
configuration to glibc malloc with tcache disabled, it will compare well, and
hardened_malloc can scale better when given enough arenas. If you want to
make the substantial security sacrifices required for a traditional thread
cache, then I don't think hardened_malloc makes sense, which is why it
doesn't include the option to do thread caching even though it'd be easy to
implement. It may one day include the option to do thread batched allocation,
but it isn't feasible to do it for deallocation without losing a ton of the
strong security properties.
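For anyone who wants to run the comparison described above, the shape of it
looks roughly like this. The build target and library path follow the
project README at the time of writing, and `your-server-command` is a
placeholder for whatever you're measuring (e.g. a Synapse start command);
verify both against your checkout before relying on them:

```shell
# Build the light configuration (no slab allocation quarantines).
git clone https://github.com/GrapheneOS/hardened_malloc.git
make -C hardened_malloc VARIANT=light

# Run the workload with hardened_malloc preloaded.
LD_PRELOAD=$PWD/hardened_malloc/out-light/libhardened_malloc-light.so \
    your-server-command

# For a fair glibc baseline, disable its thread cache via a tunable.
GLIBC_TUNABLES=glibc.malloc.tcache_count=0 your-server-command
```

Compare resident memory over time (e.g. via /proc/PID/smaps_rollup) rather
than a single snapshot, since the fragmentation 'leak' being discussed only
shows up as the process ages.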
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue