Re: mempool update

On 10/10/2016 02:05 PM, Sage Weil wrote:
Hi Allen, everyone,

I spent the end of last week and today messing around some more with the
mempool framework.  My primary complaints were that the debug mode with
containers wasn't stable (because containers would come and go and race
with an enumeration of the list) and (more importantly) that the
per-container tracking wasn't actually what I wanted to see.  I wanted to
know "how much memory is consume in the entire pool by container
declaration x," and not how many items/bytes are in each individual
container instance.

This led me to restructure the way the containers are handled by defining
a template allocator *type* for the contained type (or
container-specific wrapper, like std::_Rb_tree_node) that had a static
member containing the actual allocator instance.  That means that for
each type we declare to be in a container there has to be a matching .cc
definition of the allocator structure (slightly more awkward), but it
totally eliminated the per-container-instance allocator state and made the
debug mode very light-weight (we enumerate the *types*, not instances).
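
A rough sketch of the idea (placeholder names, not the actual mempool.h
interface): the allocator carries no per-instance state at all; each contained
type T gets one static stats block, and those per-type statics are what the
debug mode enumerates.

 // Sketch only: illustrative placeholder names, not the real mempool.h API.
 #include <atomic>
 #include <cstddef>
 #include <new>

 namespace mempool_sketch {

 struct stats_t {
   std::atomic<long> bytes{0};
   std::atomic<long> items{0};
 };

 // One allocator *type* per contained type T.  All state is static, so
 // container instances stay stateless and the debug dump enumerates types.
 template <typename T>
 struct pool_allocator {
   using value_type = T;

   static stats_t stats;   // one definition per T, supplied by a .cc file

   pool_allocator() = default;
   template <typename U>
   pool_allocator(const pool_allocator<U>&) {}  // used when containers rebind

   T* allocate(std::size_t n) {
     stats.bytes += static_cast<long>(n * sizeof(T));
     stats.items += static_cast<long>(n);
     return static_cast<T*>(::operator new(n * sizeof(T)));
   }
   void deallocate(T* p, std::size_t n) {
     stats.bytes -= static_cast<long>(n * sizeof(T));
     stats.items -= static_cast<long>(n);
     ::operator delete(p);
   }

   // no per-instance state, so all instances compare equal
   bool operator==(const pool_allocator&) const { return true; }
   bool operator!=(const pool_allocator&) const { return false; }
 };

 } // namespace mempool_sketch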

The new version works something like so:

 struct obj {
   MEMBER_OF_MEMPOOL();
   int a;
   int b;
   obj() : a(1), b(1) {}
   obj(int _a) : a(_a), b(2) {}
   obj(int _a,int _b) : a(_a), b(_b) {}
   friend inline bool operator<(const obj& l, const obj& r) {
     return l.a < r.a;
   }
 };

 // and in a .cc file,
 MEMPOOL_DEFINE_OBJECT_FACTORY(obj, obj, unittest_2);

(That duplicate obj arg is just because scoped types like BlueStore::Foo
can't be used to name the _factory_* static instance variable, so the user
has to pass in a second variable-nameable name.  Not sure if there is some
preprocessor magic that can automate that or not...)
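
To make the naming constraint concrete, here is a hypothetical spelling of
such a macro (reusing the placeholder pool_allocator/stats_t from the sketch
above, not the real macro): the type argument can be scoped, but the token
pasted into the static variable's identifier cannot.

 // Hypothetical macro sketch, reusing the placeholder types from above.
 // factory_registration would record (pool, type, &stats) somewhere a debug
 // dump can walk; recording is elided so the sketch stays self-contained.
 struct factory_registration {
   factory_registration(const char* /*pool*/, const char* /*type*/,
                        mempool_sketch::stats_t* /*stats*/) {}
 };

 #define MEMPOOL_DEFINE_OBJECT_FACTORY_SKETCH(T, name, pool)              \
   template <>                                                            \
   mempool_sketch::stats_t mempool_sketch::pool_allocator<T>::stats{};    \
   static factory_registration _factory_##name(                           \
       #pool, #T, &mempool_sketch::pool_allocator<T>::stats);

 // The type argument may be scoped, e.g. BlueStore::Onode, but pasting it
 // into an identifier ("_factory_BlueStore::Onode") is ill-formed, hence
 // the separate, variable-nameable second argument:
 //
 //   MEMPOOL_DEFINE_OBJECT_FACTORY_SKETCH(BlueStore::Onode, bluestore_onode,
 //                                        bluestore_meta_onode);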

If I want to make containers, I can do this:

 unittest_1::vector<int> v;

 // and in a .cc file,
 MEMPOOL_DEFINE_FACTORY(int, int, unittest_1);
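
Presumably the pool's container names can then be plain aliases over the std
containers with that allocator plugged in, along these lines (sketch,
continuing the placeholder names from above):

 #include <vector>

 namespace unittest_1_sketch {
   // The pool's "vector" is just std::vector with the pool allocator plugged
   // in; allocations are charged to pool_allocator<T>::stats.
   template <typename T>
   using vector = std::vector<T, mempool_sketch::pool_allocator<T>>;
 }

 // unittest_1_sketch::vector<int> v;  // needs the stats definition for int,
 //                                    // i.e. the factory macro in some .cc file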

Vector works out of the box because it doesn't have a container-internal
wrapper, but for set and map and list you have to do

 unittest_1::set<int> s;
 unittest_1::map<int,obj> m;
 unittest_1::list<int> l;

 MEMPOOL_DEFINE_SET_FACTORY(int, int, unittest_1);
 MEMPOOL_DEFINE_MAP_FACTORY(int, obj, int_obj, unittest_1);
 MEMPOOL_DEFINE_LIST_FACTORY(int, int, unittest_1);

for the std::_List_node etc. allocator instances.  Now, when you enable
debug mode and dump the stats, you can see information like this (for
bluestore):

    "bluestore_meta_onode": {
        "total_bytes": 174760,
        "total_items": 257,
        "by_type": [
            {
                "type": "BlueStore::Onode",
                "bytes": 174760,
                "items": 257
            }
        ]
    },
    "bluestore_meta_other": {
        "total_bytes": 7969096,
        "total_items": 76899,
        "by_type": [
            {
                "type": "std::_Rb_tree_node<std::pair<unsigned long const, bluestore_extent_ref_m
ap_t::record_t> >",
                "bytes": 1146048,
                "items": 23876
            },
            {
                "type": "BlueStore::ExtentMap::Shard",
                "bytes": 111384,
                "items": 1989
            },
            {
                "type": "BlueStore::SharedBlob",
                "bytes": 2494848,
                "items": 12994
            },
            {
                "type": "std::_Rb_tree_node<std::pair<int const, boost::intrusive_ptr<BlueStore::
Blob> > >",
                "bytes": 55056,
                "items": 1147
            },
            {
                "type": "BlueStore::Blob",
                "bytes": 3014608,
                "items": 12994
            },
            {
                "type": "BlueStore::Extent",
                "bytes": 1147152,
                "items": 23899
            },
            {
                "type": "std::_Rb_tree_node<std::pair<unsigned long const, std::unique_ptr<BlueSt
ore::Buffer, std::default_delete<BlueStore::Buffer> > > >",
                "bytes": 0,
                "items": 0
            },
            {
                "type": "BlueStore::Buffer",
                "bytes": 0,
                "items": 0
            }
        ]
    }


Beautiful! Let me see if I can gather some more information with this version.

which tells pretty much the full story about where the memory is going and
doesn't have the per-instance ugliness.  Notably, the container types in
the mempool namespace are just convenient names for std::map<>
etc--there's no longer a need to define the container types.
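
To make the earlier node-wrapper point concrete: the set/map aliases can be
written the same way, but the containers rebind the allocator to their
internal node type, so that is the type whose stats have to be defined
(sketch, still with the placeholder names; std::_Rb_tree_node is
libstdc++-specific):

 #include <functional>
 #include <map>
 #include <set>
 #include <utility>

 namespace unittest_1_sketch {
   // Still just convenient names for the std containers.
   template <typename K>
   using set = std::set<K, std::less<K>, mempool_sketch::pool_allocator<K>>;

   template <typename K, typename V>
   using map = std::map<K, V, std::less<K>,
                        mempool_sketch::pool_allocator<std::pair<const K, V>>>;
 }

 // std::set/std::map never allocate bare K or pair<const K, V>; they rebind
 // the allocator to their internal node type, e.g. (libstdc++)
 //
 //   mempool_sketch::pool_allocator<std::_Rb_tree_node<std::pair<const int, obj>>>
 //
 // so the stats that actually get charged belong to the *node* type.  That
 // is the specialization the SET/MAP/LIST factory macros have to define,
 // and it is why the _Rb_tree_node types show up verbatim in the dump above.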

Here's the latest iteration:

	https://github.com/liewegas/ceph/blob/wip-mempool/src/include/mempool.h

Note that this version also removes all of the slab allocator logic--it's
just the accounting infrastructure.  Mark's most recent run shows that the
bluestore mempool is set at 1 GB and the RSS is almost 4 GB, so either
there are lots of allocations we're not capturing yet or the heap is
fragmenting heavily.  I suspect we'll want to add the slab behavior back
in soon so that we have memory that is nicely packed, but I'd prefer to
keep that change orthogonal so that we understand the full behavior!

FWIW, there has been some discrepancy in the massif heap results vs RSS too, though perhaps less of one than what we are seeing here. I would hazard a guess that it's a bit of both.
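
A crude way to eyeball that gap (accounted bytes vs. RSS) is something like
the following Linux-only sketch; accounted_bytes stands in for whatever sums
the per-type stats, so nothing here depends on the real mempool interface.

 #include <fstream>
 #include <iostream>
 #include <string>

 // Crude sanity check: accounted pool bytes vs. process RSS (Linux only).
 void report_rss_vs_pool(long accounted_bytes) {
   std::ifstream status("/proc/self/status");
   std::string line;
   while (std::getline(status, line)) {
     if (line.compare(0, 6, "VmRSS:") == 0) {  // e.g. "VmRSS:  4012345 kB"
       std::cout << line << "\n";
       break;
     }
   }
   std::cout << "mempool accounted: " << (accounted_bytes >> 20) << " MB\n";
 }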


Thoughts?
sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



