On Sat, Jul 15, 2017 at 10:23:14PM -0400, Dennis Zhou wrote: > From: "Dennis Zhou (Facebook)" <dennisszhou@xxxxxxxxx> > > The percpu memory allocator is experiencing scalability issues when > allocating and freeing large numbers of counters as in BPF. > Additionally, there is a corner case where iteration is triggered over > all chunks if the contig_hint is the right size, but wrong alignment. > > Implementation: > This patch removes the area map allocator in favor of a bitmap allocator > backed by metadata blocks. The primary goal is to provide consistency > in performance and memory footprint with a focus on small allocations > (< 64 bytes). The bitmap removes the heavy memmove from the freeing > critical path and provides a consistent memory footprint. The metadata > blocks provide a bound on the amount of scanning required by maintaining > a set of hints. > > The chunks previously were managed by free_size, a value maintained in > bytes. Now, the chunks are managed in terms of bits, which is just a > scaled value of free_size down by PCPU_MIN_ALLOC_SIZE. > > There is one caveat with this implementation. In an effort to make > freeing fast, the only time metadata is updated on the free path is if a > whole block becomes free or the freed area spans across metadata blocks. > This causes the chunk’s contig_hint to be potentially smaller than what > it could allocate by up to a block. If the chunk’s contig_hint is > smaller than a block, a check occurs and the hint is kept accurate. > Metadata is always kept accurate on allocation and therefore the > situation where a chunk has a larger contig_hint than available will > never occur. > > Evaluation: > I have primarily done testing against a simple workload of allocation of > 1 million objects of varying size. Deallocation was done by in order, > alternating, and in reverse. These numbers were collected after rebasing > ontop of a80099a152. I present the worst-case numbers here: > > Area Map Allocator: > > Object Size | Alloc Time (ms) | Free Time (ms) > ---------------------------------------------- > 4B | 335 | 4960 > 16B | 485 | 1150 > 64B | 445 | 280 > 128B | 505 | 177 > 1024B | 3385 | 140 > > Bitmap Allocator: > > Object Size | Alloc Time (ms) | Free Time (ms) > ---------------------------------------------- > 4B | 725 | 70 > 16B | 760 | 70 > 64B | 855 | 80 > 128B | 910 | 90 > 1024B | 3770 | 260 > > This data demonstrates the inability for the area map allocator to > handle less than ideal situations. In the best case of reverse > deallocation, the area map allocator was able to perform within range > of the bitmap allocator. In the worst case situation, freeing took > nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator > dramatically improves the consistency of the free path. The small > allocations performed nearly identical regardless of the freeing > pattern. > > While it does add to the allocation latency, the allocation scenario > here is optimal for the area map allocator. The second problem of > additional scanning can result in the area map allocator completing in > 52 minutes. The same workload takes only 14 seconds to complete for the > bitmap allocator. This was produced under a more contrived scenario of > allocating 1 milion 4-byte objects with 8-byte alignment. > This was a bear to review, I feel like it could be split into a few smaller pieces. You are changing hinting, allocating/freeing, and how you find chunks. Those seem like good logical divisions to me. Overall the design seems sound and I didn't spot any major problems. Once you've split them up I'll do another thorough comb through and then add my reviewed by. Thanks, Josef -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>