> -----Original Message-----
> From: git-owner@xxxxxxxxxxxxxxx <git-owner@xxxxxxxxxxxxxxx> On Behalf Of
> Stefan Beller
> Sent: Thursday, May 3, 2018 4:59 PM
> To: Duy Nguyen <pclouds@xxxxxxxxx>
> Cc: Jameson Miller <jamill@xxxxxxxxxxxxx>; git@xxxxxxxxxxxxxxx;
> gitster@xxxxxxxxx; jonathantanmy@xxxxxxxxxx
> Subject: Re: [PATCH v2 0/5] Allocate cache entries from memory pool
>
> On Thu, May 3, 2018 at 12:17 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> >
> >> To me it is also a clear yes when it comes to combining these two
> >> memory pools.
> >
> > I also did not notice that jm/mem-pool already landed in master.
>
> Oh, thanks for telling! Now that I look at it, I am doubting it;
>
> The reason for my doubt is the potential quadratic behavior for new
> allocations: in mem_pool_alloc() we walk all mp_blocks to see if we can
> fit the requested allocation in one of the later blocks. So if we call
> mem_pool_alloc a million times, we get O(n) mp_blocks which we'd have to
> walk in each call.

With the current design, when a new mp_block is allocated, it is placed at
the head of the linked list. This means that the most recently allocated
mp_block is the first block that is searched. The *vast* majority of
allocations should be fulfilled from this first block; it is only when the
block is full that we search the other mp_blocks in the list.

If this is a concern, I think we have a couple of low-cost options to
mitigate it: maybe a flag to control whether we search past the first
mp_block for space, or logic to move blocks out of the search queue when
they are full or their available space falls below a threshold. If this is
of interest, I could contribute a patch to enable one of these behaviors.
(A rough sketch of the second option is included at the end of this mail.)

> However, in alloc.c we do know that a slab is full as soon as we take the
> next slab. That is the beauty of knowing 'len' at construction time of
> the allocator.
>
> So I guess I'll just re-use the mp_block and introduce another struct
> fixed_sized_mem_pool, which will not look into other mp_blocks but the
> current one.
>
> > Have you tried to measure it and alloc.c (both memory usage and
> > allocation speed)?
>
> No, I was about to, but then I started reading the code in an attempt to
> replace alloc.c with a mempool and saw the quadratic behavior.
>
> > Just take some big repo as an example and do count-objects -v to see
> > how many blobs/trees/commits it has, then allocate the same amount
> > with both alloc.c and mem-pool.c and measure both speed/mem.
> > I'm pretty sure you're right that mem-pool.c is a clear yes. I was
> > just being more conservative because we do (slightly) change the
> > allocator's behavior when we make the switch. But it's also very
> > likely that any performance difference will be insignificant.
> >
> > I'm asking this because if mem-pool.c is a clear winner, you can start
> > to update your series to use it now and kill alloc.c in the process.
>
> I'll implement the fixed_sized_mem_pool and take some measurements.
>
> >
> > PS. Is Jeff back yet?
>
> His last email on the public list was Apr 10th, stating that he'll be
> offline for "a few weeks"; in <20180406175349.GB32228@xxxxxxxxxxxxxxxxxxxxx>
> he said the vacation part is 3 weeks. So I think he is done with vacation
> and is just hiding to figure out a nice comeback. ;-)
>
> > I'm sure Junio is listening and all, but I'm afraid he's too busy being
> > a maintainer, so Jeff's opinion in this area is really valuable. He has
> > all the fun and weird use cases to play with at GitHub.
>
> ok. I'll cc him for these patches.
>
> Thanks,
> Stefan
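
To make the second option above concrete, here is a rough sketch of a pool
that retires a block from the search list once its free space drops below a
threshold, so mem_pool_alloc() only ever walks blocks that can still serve a
request. This is a simplified illustration, not the actual mem-pool.c layout:
the open_blocks/full_blocks split and MP_RETIRE_THRESHOLD are made-up names
for this sketch, and alignment handling is omitted for brevity.

#include <stddef.h>
#include <stdlib.h>

struct mp_block {
	struct mp_block *next_block;
	char *next_free;
	char *end;
};

struct mem_pool {
	struct mp_block *open_blocks;  /* blocks that may still have room */
	struct mp_block *full_blocks;  /* blocks retired from the search */
	size_t block_alloc;            /* default payload size for new blocks */
};

/* Blocks whose free space drops below this are retired from the search. */
#define MP_RETIRE_THRESHOLD 64

static struct mp_block *mem_pool_new_block(struct mem_pool *pool, size_t min_size)
{
	size_t payload = pool->block_alloc > min_size ? pool->block_alloc : min_size;
	struct mp_block *block = malloc(sizeof(*block) + payload);

	if (!block)
		return NULL;
	block->next_free = (char *)(block + 1);
	block->end = block->next_free + payload;
	/* Newest block goes to the head, so it is the first one searched. */
	block->next_block = pool->open_blocks;
	pool->open_blocks = block;
	return block;
}

void *mem_pool_alloc(struct mem_pool *pool, size_t len)
{
	struct mp_block **link = &pool->open_blocks;
	struct mp_block *block;
	char *ret;

	/* Walk only blocks that still have room; full ones were retired. */
	for (block = pool->open_blocks; block; block = block->next_block) {
		if (block->end - block->next_free >= (ptrdiff_t)len)
			break;
		link = &block->next_block;
	}
	if (!block) {
		block = mem_pool_new_block(pool, len);
		if (!block)
			return NULL;
		link = &pool->open_blocks;
	}

	ret = block->next_free;
	block->next_free += len;

	/* Once the block's remaining space falls below the threshold, move it
	 * to the full list so later allocations never walk it again. */
	if (block->end - block->next_free < MP_RETIRE_THRESHOLD) {
		*link = block->next_block;
		block->next_block = pool->full_blocks;
		pool->full_blocks = block;
	}
	return ret;
}

Whether the threshold is worth it probably depends on the measurements Duy
asked for; the flag-based variant (never look past the head block) would be
an even smaller change.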
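
And, in the same simplified style, a sketch of the fixed_sized_mem_pool idea
from the quoted text: because the object size is fixed at construction time,
the current block is known to be full the moment it cannot hold one more
object, so allocation never looks at older blocks. The struct and field
names here are again illustrative only, not a proposed interface.

#include <stdlib.h>

/* Same minimal block shape as in the sketch above. */
struct mp_block {
	struct mp_block *next_block;
	char *next_free;
	char *end;
};

struct fixed_sized_mem_pool {
	struct mp_block *current;  /* the only block ever considered */
	size_t obj_size;           /* fixed at construction time */
	size_t objs_per_block;
};

void *fixed_sized_mem_pool_alloc(struct fixed_sized_mem_pool *fp)
{
	struct mp_block *b = fp->current;
	char *ret;

	if (!b || (size_t)(b->end - b->next_free) < fp->obj_size) {
		/* The old block is full by construction; push a fresh one.
		 * Older blocks stay reachable for a later bulk free, but
		 * they are never searched again. */
		size_t payload = fp->obj_size * fp->objs_per_block;

		b = malloc(sizeof(*b) + payload);
		if (!b)
			return NULL;
		b->next_free = (char *)(b + 1);
		b->end = b->next_free + payload;
		b->next_block = fp->current;
		fp->current = b;
	}

	ret = b->next_free;
	b->next_free += fp->obj_size;
	return ret;
}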