RE: Ceph Hackathon: More Memory Allocator Testing

> -----Original Message-----
> From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Allen Samuels
> Sent: Wednesday, August 19, 2015 8:20 PM

> It was a surprising result that the memory allocator is making such a large
> difference in performance. All of the recent work in fiddling with TCmalloc's
> and Jemalloc's various knobs and switches has been excellent, a great
> example of group collaboration. But I think it's only a partial optimization of
> the underlying problem. The real take-away from this activity is that the code
> base is doing a LOT of memory allocation/deallocation which is consuming
> substantial CPU time -- regardless of how much we optimize the memory
> allocator, you can't get away from the fact that it macroscopically MATTERS.
> The better long-term solution is to reduce reliance on the general-purpose
> memory allocator and to implement strategies that are more specific to our
> usage model.

That's what some of us are trying to do right now. See, for example, https://github.com/ceph/ceph/pull/5534 - one of the first patches in this patchset increased Ceph performance on small I/O by around 3%; depending on the kind of messenger used, the gain showed up as decreased CPU usage, increased bandwidth, or both.
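
For illustration, here's a minimal sketch (purely hypothetical, not the actual contents of that pull request) of the kind of usage-specific strategy described above -- a thread-safe free list that recycles fixed-size objects instead of paying for a general-purpose malloc/free on every message:

  // Hypothetical sketch of a per-class object pool. Recycled objects are
  // kept on a free list, so steady-state traffic does no heap allocation.
  #include <mutex>
  #include <vector>

  template <typename T>
  class ObjectPool {
    std::mutex lock;
    std::vector<T*> free_list;  // objects waiting to be reused
  public:
    T* get() {
      std::lock_guard<std::mutex> g(lock);
      if (free_list.empty())
        return new T();          // pool empty: fall back to the heap
      T* obj = free_list.back();
      free_list.pop_back();
      return obj;
    }
    void put(T* obj) {           // caller must reset the object's state
      std::lock_guard<std::mutex> g(lock);
      free_list.push_back(obj);
    }
    ~ObjectPool() {
      for (T* obj : free_list)
        delete obj;
    }
  };

Of course, a single shared lock could itself become a bottleneck with buffers migrating between threads (see below); per-thread caches -- which is exactly what tcmalloc implements internally -- would be the obvious next step.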

> What really needs to happen initially is to instrument the
> allocation/deallocation. Most likely we'll find that 80+% of the work is coming
> from just a few object classes and it will be easy to create custom allocation
> strategies for those usages.

I've done this in the past; most allocations and deallocations come from the bufferlist code itself (for example, in the ::rebuild() method). Other than that, it's scattered around the entire Ceph code base. Even simple, small message objects (like heartbeats) are constantly allocated and freed. It's especially tricky because buffers are constantly moved between threads, so I'd guess the current Ceph code is close to a worst-case scenario for memory allocators.
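
For the record, a quick-and-dirty way to get such numbers without touching the code base at all is to replace the global operator new/delete with counting wrappers. A sketch (for a real run you'd want per-call-site stacks, e.g. from a heap profiler, rather than two global counters):

  // Sketch: count every allocation/deallocation going through operator new.
  #include <atomic>
  #include <cstdio>
  #include <cstdlib>
  #include <new>

  static std::atomic<unsigned long> g_allocs{0};
  static std::atomic<unsigned long> g_frees{0};

  void* operator new(std::size_t size) {
    g_allocs.fetch_add(1, std::memory_order_relaxed);
    if (void* p = std::malloc(size))
      return p;
    throw std::bad_alloc();
  }

  void operator delete(void* p) noexcept {
    if (p) {
      g_frees.fetch_add(1, std::memory_order_relaxed);
      std::free(p);
    }
  }

  int main() {
    int* x = new int(42);  // counted
    delete x;              // counted
    std::printf("allocs=%lu frees=%lu\n", g_allocs.load(), g_frees.load());
  }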

> This will lead to even higher performance that's
> much less sensitive to easy-to-misconfigure environmental factors and the
> entire "tcmalloc/jemalloc -- oops, it uses more memory" discussion will go
> away.

That memory issue probably won't go away: most high-performance memory allocators do their best not to return freed memory to the OS too soon, so even if the application frees some memory, the OS still sees it as used.
On the bright side, in the worst-case scenario (physical RAM exhausted), swapping wouldn't be as big an issue here, since the OS tracks which memory pages are actually touched and would swap out the pages that aren't -- including pages held by the memory allocator but no longer used by the application.
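
For what it's worth, gperftools' tcmalloc does let the application push retained pages back explicitly (they get madvise()'d away, so the reported RSS drops); a sketch:

  // Sketch: ask tcmalloc (gperftools) to return its free pages to the kernel.
  // Build with -ltcmalloc.
  #include <gperftools/malloc_extension.h>

  void release_retained_memory() {
    // Free pages held in tcmalloc's page heap are released via madvise(),
    // so the OS stops counting them against the process.
    MallocExtension::instance()->ReleaseFreeMemory();
  }

Calling that periodically (or tuning tcmalloc's release rate) trades some reallocation cost for a smaller footprint, but it doesn't change the underlying allocation traffic.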

With best regards / Pozdrawiam
Piotr Dałek



