> -----Original Message-----
> From: Duy Nguyen <pclouds@xxxxxxxxx>
> Sent: Wednesday, May 2, 2018 2:23 PM
> To: Jameson Miller <jamill@xxxxxxxxxxxxx>
> Cc: Stefan Beller <sbeller@xxxxxxxxxx>; Git Mailing List <git@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH 00/13] object store: alloc
>
> On Wed, May 2, 2018 at 8:07 PM, Jameson Miller <jamill@xxxxxxxxxxxxx>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Duy Nguyen <pclouds@xxxxxxxxx>
> >> Sent: Wednesday, May 2, 2018 1:02 PM
> >> To: Stefan Beller <sbeller@xxxxxxxxxx>
> >> Cc: Git Mailing List <git@xxxxxxxxxxxxxxx>; Jameson Miller
> >> <jamill@xxxxxxxxxxxxx>
> >> Subject: Re: [PATCH 00/13] object store: alloc
> >>
> >> On Tue, May 1, 2018 at 11:33 PM, Stefan Beller <sbeller@xxxxxxxxxx>
> >> wrote:
> >> > I also debated if it is worth converting alloc.c via this patch
> >> > series or if it might make more sense to use the new mem-pool by
> >> > Jameson[1].
> >> >
> >> > I vaguely wonder about the performance impact, as the object
> >> > allocation code seemed to be relevant in the past.
> >>
> >> If I remember correctly, alloc.c was added because malloc() has too
> >> high an overhead per allocation (and we create millions of them).
> >> As long as you keep allocation overhead low, it should be ok. Note
> >> that we allocate a lot more than the mem-pool's main target (cache
> >> entries, if I remember correctly). We may have a couple thousand
> >> cache entries, but we already deal with a couple million struct objects.
> >
> > The work to move cache entry allocation onto a memory pool was
> > motivated by the fact that we are looking at indexes with millions of
> > entries. If there is a scaling concern with the current version of
> > mem-pool, we would like to address it there as well. Or if there are
> > improvements that can be shared, that would be nice too.
>
> I think the two have quite different characteristics. alloc.c is driven by
> overhead: struct blob is only 24 bytes, about 1/3 of the repo is blobs, and
> each malloc has 16 bytes of overhead or so, if I remember correctly. A
> struct cache_entry is at minimum 88 bytes, so the relative overhead is not
> that big a deal (though reducing it is still very nice).
>
> mem-pool is about allocation speed, but I think that's not a concern for
> alloc.c because when we do a full rev walk, I think I/O is always the
> bottleneck (maybe object lookup as well). I don't see a good way to have
> one memory allocator that satisfies both, to be honest. If you could
> allocate fixed-size cache entries most of the time (e.g. larger entries
> would be allocated using malloc or something, and should be a small
> number), then perhaps we can unify. Or if mem-pool could have an option to
> allocate fixed-size pieces with no overhead, that would be great (sorry, I
> still have not read that mem-pool series yet).
> --
> Duy

Thank you for the extra details - the extra context was helpful, especially
the motivations for each of these areas.

I agree with your general analysis, with one additional point: the memory
pool does allocate (variable-sized) memory without any per-allocation
overhead, apart from possible alignment padding and the bookkeeping for the
larger "blocks" of memory from which the small allocations are carved - and
I don't think that bookkeeping is enough to have a meaningful overall
impact. The mem-pool only tracks a pointer to the next available byte of
memory and the end of the available memory range.
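To make that concrete, here is a minimal sketch of the bookkeeping I am
describing. This is illustrative only - it is not the actual mem-pool code
or API from the series, just the general shape of a bump allocator that
keeps those two pointers per block:

/*
 * Illustrative only: not the mem-pool code from the series, just the
 * shape of a bump allocator.  Each block keeps a "next free" pointer
 * and an "end" pointer; the common case is a bounds check plus a
 * pointer bump, with no per-allocation header.
 */
#include <stdlib.h>
#include <stdint.h>

struct pool_block {
	struct pool_block *next;	/* list of blocks owned by the pool */
	char *next_free;		/* first unused byte in this block */
	char *end;			/* one past the last usable byte */
	/* allocations are carved from the memory following this header */
};

struct pool {
	struct pool_block *head;	/* current (most recent) block */
	size_t block_size;		/* payload size for new blocks */
};

static void *pool_alloc(struct pool *p, size_t len)
{
	struct pool_block *b = p->head;
	void *ret;

	/* round up so every allocation stays pointer-aligned */
	len = (len + sizeof(uintptr_t) - 1) & ~(sizeof(uintptr_t) - 1);

	if (!b || len > (size_t)(b->end - b->next_free)) {
		/* current block exhausted: get a fresh one from malloc */
		size_t payload = len > p->block_size ? len : p->block_size;

		b = malloc(sizeof(*b) + payload);
		if (!b)
			return NULL;
		b->next_free = (char *)(b + 1);
		b->end = b->next_free + payload;
		b->next = p->head;
		p->head = b;
	}

	ret = b->next_free;
	b->next_free += len;	/* the entire cost of the common case */
	return ret;
}

static void pool_free(struct pool *p)
{
	/* no per-allocation free; the whole pool is released at once */
	struct pool_block *b = p->head;

	while (b) {
		struct pool_block *next = b->next;

		free(b);
		b = next;
	}
	p->head = NULL;
}

So in the common case an allocation is one comparison and one pointer
increment, which is why the only real per-allocation cost is the alignment
padding.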
It has a similar constraint in that individual allocations cannot be freed -
you have to free the whole block. It may be that the requirements are
different enough (or the gains large enough) to justify another dedicated
pooling allocator, but I think the current design of the memory pool would
satisfy both consumers, even if the memory overhead is a bigger motivation
for blob structs. I would be interested in your thoughts if you get the
opportunity to read the mem-pool series.
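To make the fixed-size case you mention concrete, here is roughly what that
usage could look like on top of the sketch above. Again, this is
hypothetical: "struct blob" here is a 24-byte stand-in, not the real
definition from blob.h, and pool_alloc()/pool_free() are the sketch
functions above, not the actual mem-pool API.

struct blob {
	unsigned parsed;
	unsigned char oid[20];	/* placeholder for the object id */
};

static void walk_all_blobs(void)
{
	struct pool blob_pool = { NULL, 64 * 1024 };
	int i;

	for (i = 0; i < 1000000; i++) {
		struct blob *b = pool_alloc(&blob_pool, sizeof(*b));

		if (!b)
			exit(1);	/* git proper would use die() */
		b->parsed = 0;
		/* ... fill in and use the object ... */
	}

	/*
	 * No individual free() calls and no ~16-byte malloc header on
	 * each ~24-byte object; the whole pool goes away in one call.
	 */
	pool_free(&blob_pool);
}

If the pool were told the fixed size up front it could even skip the
per-call alignment rounding, which I think gets at the "no overhead" option
you describe.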