From: Jesper Dangaard Brouer

> The network stack have some use-cases that puts some extreme demands
> on the memory allocator. One use-case, 10Gbit/s wirespeed at smallest
> packet size[1], requires handling a packet every 67.2 ns (nanosec).
>
> Micro benchmarking[2] the SLUB allocator (with skb size 256bytes
> elements), show "fast-path" instant reuse only costs 19 ns, but a
> closer to network usage pattern show the cost rise to 45 ns.
>
> This patchset introduce a quick mempool (qmempool), which when used
> in-front of the SKB (sk_buff) kmem_cache, saves 12 ns on "fast-path"
> drop in iptables "raw" table, but more importantly saves 40 ns with
> IP-forwarding, which were hitting the slower SLUB use-case.
>
> One of the building blocks for achieving this speedup is a cmpxchg
> based Lock-Free queue that supports bulking, named alf_queue for
> Array-based Lock-Free queue. By bulking elements (pointers) from the
> queue, the cost of the cmpxchg (approx 8 ns) is amortized over several
> elements.

It seems to me that these improvements could be added to the underlying
allocator itself. Nesting allocators doesn't really seem right to me.

David
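
For reference, the numbers quoted above can be checked with a little arithmetic. The sketch below is illustrative only and is not code from the patchset. It assumes the 67.2 ns figure comes from a minimum 64-byte Ethernet frame plus 20 bytes of preamble and inter-frame gap on the wire. The function names wire_time_ns and amortized_ns are made up for this example.

```c
#include <assert.h>

/*
 * Time one frame occupies the wire, in nanoseconds.
 * frame_bytes: Ethernet frame size including FCS (64 is the minimum).
 * The extra 20 bytes are preamble + SFD (8) and inter-frame gap (12).
 * 1 Gbit/s equals 1 bit per nanosecond, so bits / Gbit/s yields ns.
 */
double wire_time_ns(unsigned frame_bytes, double gbit_per_s)
{
	assert(gbit_per_s > 0.0);
	return (frame_bytes + 20) * 8 / gbit_per_s;
}

/*
 * Per-element share of one cmpxchg when `bulk` elements are moved
 * by a single queue operation (the amortization described above).
 */
double amortized_ns(double cmpxchg_ns, unsigned bulk)
{
	assert(bulk > 0);
	return cmpxchg_ns / bulk;
}
```

Under these assumptions wire_time_ns(64, 10.0) gives the 67.2 ns per-packet budget, and amortized_ns(8.0, 8) shows that a bulk of 8 brings the approx 8 ns cmpxchg down to about 1 ns per element.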