Re: [PATCH 2/2] refs/files: use heuristic to decide whether to repack with `--auto`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 03, 2024 at 11:23:12AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > I also noticed `log2i()`, but honestly the only reason why I didn't
> > reuse it is that I had no clue where to put it. There isn't any header
> > that would be a good fit for it, and creating a new "math.h" header for
> > a single function felt overblown to me. So I decided to just not bother.
> > I'm happy to adjust though if somebody has a suggestion for where to put
> > it.
> 
> Given the existing contents of wrapper.h (near the end of the file),
> I think wrapper.c would be a good place to do so.

Indeed. Quite a grab bag of functions, thanks for the hint!

> Shouldn't this essentially be a call to ffs() with the argument
> tweaked, by the way?

We are looking for the last set bit, not for the first set bit. We can
massage things a bit, but wouldn't that require us to be aware of the
platforms endianess?

In any case, GCC is clever enough to notice what we're doing:

    fastlog2(unsigned long):
            xor     eax, eax
            test    rdi, rdi
            je      .L5
            bsr     rax, rdi
    .L5:
            ret

The `log2i()` function looks a bit less efficient:

    log2i(unsigned long):
            test    rdi, rdi
            je      .L3
            bsr     rdi, rdi
            lea     eax, [rdi+1]
            cdqe
            ret
    .L3:
            xor     eax, eax
            ret

Clang isn't yet clever enough with v18.1, but is with trunk:

    log2i(unsigned long):
            bsr     rax, rdi
            inc     rax
            test    rdi, rdi
            cmove   rax, rdi
            ret

    fastlog2(unsigned long):
            test    rdi, rdi
            je      .LBB1_1
            shr     rdi
            bsr     rcx, rdi
            mov     eax, 127
            cmovne  rax, rcx
            xor     eax, -64
            add     eax, 65
            ret
    .LBB1_1:
            mov     eax, -1
            ret

So with the following definition we're optimizing both with GCC and
Clang:

    size_t fastlog2(size_t sz)
    {
        size_t l = 0;
        if (!sz)
            return 0;
        for (; sz; sz >>= 1)
            l++;
        return l;
    }

I'd thus say we can just pick that function instead of caring about
platform endianess with `ffs()`.

Patrick




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux