Re: [PATCH] Make ALLOC_GROW to avoid computing alloc_nr twice.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Sep 10, 2007 at 09:12:13PM +0000, Carlos Rica wrote:
> This makes the macro also more readable.
> 
> Signed-off-by: Carlos Rica <jasampler@xxxxxxxxx>
> ---
> 
>    I'm not sure if this could be more efficient,
>    since it replaces the computing of one more
>    alloc_nr expression with one assignment more.
>    It is much easier to read this way, I think.
> 
>  cache.h |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 493983c..3e0842f 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -240,10 +240,9 @@ extern void verify_non_filename(const char *prefix, const char *name);
>  #define ALLOC_GROW(x, nr, alloc) \
>       do { \
>               if ((nr) > alloc) { \
> -                     if (alloc_nr(alloc) < (nr)) \
> +                     alloc = alloc_nr(alloc); \
> +                     if (alloc < (nr)) \
>                               alloc = (nr); \
> -                     else \
> -                             alloc = alloc_nr(alloc); \
>                       x = xrealloc((x), alloc * sizeof(*(x))); \
>               } \
>       } while(0)

  Well, I'm not very sure it matters a lot, because I would be surprised
that compilers like gcc aren't able to produce exactly the same code for
both. Have you checked if the assembly changes for any of those ?

  I've made some tests on my own, and if alloc is a "size_t" then gcc
generates exactly the same code. When alloc is a pointer to a size_t (I
mean you pass to ALLOC_GROW a third argument of the form *size_p which
happens in some places) the generated code is not the same, though for
the current code it reads:

  Current code                |  Your patch
------------------------------+-----------------------------
        cmpq    %rsi, %rdx    |         cmpq    %rsi, %rdx 
        ja      .L9           |         movq    %rsi, (%rcx) 
        movq    %rsi, (%rcx)  |         ja      .L8
.L6:                          | .L4:
        call    xrealloc      |         call    xrealloc


  This is the sole difference. Don't mind the labels numbers, as gcc
does not allocates them linearly (I mean there are holes, in fact there
is as many labels in each versions).

  So strictly speaking in that case, your patch is less good, because it
does one allocation it should not do.

  Also note that in the general case where alloc_nr (that is a macro
here) is a function, marking it with __attribute__((const)) will have
exactly the same effect. So I'd say that as far micro-performance
matter, your patch is worse :)





-- 
·O·  Pierre Habouzit
··O                                                madcoder@xxxxxxxxxx
OOO                                                http://www.madism.org

Attachment: pgptbRtPcXJtJ.pgp
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux