Re: [PATCH] tree-diff: avoid alloca for large allocations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 07, 2016 at 05:36:59PM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > An alternative to this would be implement something like:
> >
> >   struct tree *tp, tp_fallback[2];
> >   if (nparent <= ARRAY_SIZE(tp_fallback))
> >           tp = tp_fallback;
> >   else
> > 	  ALLOC_ARRAY(tp, nparent);
> >   ...
> >   if (tp != tp_fallback)
> > 	  free(tp);
> >
> > That would let us drop our xalloca() portability code
> > entirely. But in my measurements, this seemed to perform
> > slightly worse than the xalloca solution.
> 
> It indeed is curious why this "obvious" alternative performed
> worse.

Yeah. I'd be happy if somebody wanted to prove me wrong here. It's
entirely possible I just did something stupid. I had wrapped up the
above in the FAST_ARRAY_ALLOC macro. It would waste the stack space when
we _did_ have to call malloc, but that should almost never happen in my
benchmark. It also allocates an extra slot on the stack for non-merge
commits. So I guess the wasted 24 bytes or whatever could have in
impact. It's also possible that the extra variable simply tickles the
optimizer in some way; I didn't look at the generated asm.

FWIW, here are the timings I had come up with for running "git log --raw
--no-abbrev --no-renames" on linux.git (the same benchmark from the
commit that originally added the alloca). The "time" output at the end
is best-of-five, with the "attempt" lines showing the wall-clock time
for each:

  [original, with xalloca]
  Attempt 1: 30.669
  Attempt 2: 30.782
  Attempt 3: 31.807
  Attempt 4: 31.152
  Attempt 5: 30.243

  real    0m30.243s
  user    0m30.112s
  sys     0m0.128s


  [xmalloc/free instead of xalloca]
  Attempt 1: 31.306
  Attempt 2: 31.814
  Attempt 3: 31.138
  Attempt 4: 31.787
  Attempt 5: 32.23

  real    0m31.138s
  user    0m30.976s
  sys     0m0.160s


  [local var with fallback to xmalloc]
  Attempt 1: 30.926
  Attempt 2: 31.466
  Attempt 3: 31.865
  Attempt 4: 31.345
  Attempt 5: 33.159

  real    0m30.926s
  user    0m30.804s
  sys     0m0.120s


So the improvement here over even a naive malloc/free is _really_ small.
You'll note that the best-of-five for xmalloc is actually smaller than
the worst-case with xalloca. But the mean is still 2% better. The mean
for the "fallback" case aren't really any better (which makes me wonder
if I was somehow accidentally calling malloc each time).

So I dunno. For 2%, I was tempted to simply say "screw it, let's just
forget this micro-optimization and call xmalloc". This patch is the
conservative choice in that it has the same performance profile as the
original. Or so I thought. I didn't record the old timings, but they
looked similar to the original. I just recreated them while writing this
email and got:

  [xalloca with fallback to xmalloc]
  Attempt 1: 31.356
  Attempt 2: 31.725
  Attempt 3: 31.454
  Attempt 4: 30.898
  Attempt 5: 31.937

  real    0m30.898s
  user    0m30.752s
  sys     0m0.144s

which really isn't much better than the local-var case. I wonder if just
having the conditional stack/heap is what kills us (either itself, or
because it affects the optimizer).

I dunno.


> > +#define FAST_ARRAY_ALLOC(x, nr) do { \
> > +	if ((nr) <= 2) \
> > +		(x) = xalloca((nr) * sizeof(*(x))); \
> > +	else \
> > +		ALLOC_ARRAY((x), nr); \
> > +} while(0)
> > +#define FAST_ARRAY_FREE(x, nr) do { \
> > +	if ((nr) > 2) \
> > +		free((x)); \
> > +} while(0)
> 
> An obvious and clean implementation of the idea.
> 
> The only slight worry I have is that nr must stay constant until the
> time the caller calls FREE(), but for the only three callsite pairs
> it is clear nparent will stay constant and this is local to the
> file.

Yep, I had the same worry. I think it's OK because the damage is limited
to tree-diff.c. I'm not sure I would promote this macro for general use.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]