Re: [PATCH 4/4] pack-revindex: radix-sort the revindex

On Mon, Jul 08, 2013 at 01:50:41PM -0700, Brandon Casey wrote:

> > +static void sort_revindex(struct revindex_entry *entries, int n, off_t max)
> 
> If 'n' is the number of objects in the pack, shouldn't it be unsigned?

Yes. I inherited that bug from the rest of the revindex code.

> The data type for struct packed_git.num_objects is uint32_t.  Looks
> like create_pack_revindex uses the wrong datatype when it captures
> num_objects in the int num_ent and passes it to sort_revindex.  So, it
> looks like that function needs to be updated too.

Yep. And the binary search in find_pack_revindex, too (which even has an
integer overflow!). I'll add a patch to my series to fix it (and switch
my new code over to the unsigned types).
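
Roughly what I have in mind for the search side (just a sketch with
made-up names, not the actual patch; the entry layout is the one from
this series, and error handling is hand-waved):

  static int search_revindex(const struct revindex_entry *rix,
                             uint32_t num_objects, off_t ofs)
  {
          uint32_t lo = 0, hi = num_objects;

          while (lo < hi) {
                  /*
                   * lo + (hi - lo) / 2 cannot overflow, unlike
                   * (lo + hi) / 2 with large object counts.
                   */
                  uint32_t mi = lo + (hi - lo) / 2;

                  if (rix[mi].offset == ofs)
                          return mi;
                  else if (ofs < rix[mi].offset)
                          hi = mi;
                  else
                          lo = mi + 1;
          }
          return -1; /* not found */
  }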

> > +       while (max / (((off_t)1) << digits)) {
> 
> Is there any reason this shouldn't be simplified to just:
> 
>        while (max >> digits) {

No, yours is much more readable. In case you are wondering how I ended
up with that monstrosity: I originally did not keep "digits" as a number
of bits, but rather as a running value that I shifted by 16 bits each
time through the loop. I'll change it in the re-roll.

> I glanced briefly at the assembly and it appears that gcc does
> actually emit a divide instruction to accomplish this, which I think
> we can avoid by just rearranging the operation.

Yep, although it almost certainly doesn't matter. We hit that loop
condition check at most 5 times for a 64-bit integer. I'm more concerned
with readability.
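
Just to put a number on it, here is a trivial sketch (a made-up helper,
not something from the patch) that counts the 16-bit passes for a given
maximum offset, using the simplified condition:

  /*
   * How many 16-bit radix passes does an offset of "max" need?  Pack
   * offsets comfortably fit in well under 64 bits, so the shift count
   * stays within the width of off_t.
   */
  static int radix_passes(off_t max)
  {
          int digits = 0;

          while (max >> digits)
                  digits += 16;
          return digits / 16;
  }

For a 64-bit off_t that is at most four passes, hence the five
condition checks.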

> > +       if (a != entries) {
> > +               int i;
> > +               for (i = 0; i < n; i++)
> > +                       entries[i] = tmp[i];
> 
> I think I recall that somebody investigated whether a for loop like
> you have above was faster for copying structures than memcpy.  I
> forget whether it was conclusive.  Did you happen to compare them?

It was me, actually, but that comparison was between memcmp and an
open-coded loop, not memcpy. The conclusion was that memcmp is way
faster on glibc 2.13 and higher.

I think memcpy probably is going to be faster here, too (especially
with recent versions of glibc), given the size of the array; the memcmp
discussion was about 20-byte hashes, where the function call and setup
overhead were much more relevant.

But I don't think that copy was even hit at all in my tests. Since we go
back and forth between the original array and the tmp storage, there is
a "50%" chance of not needing the final copy back anyway. So for
packfiles up to 64K we do the copy (but those are not that interesting
to measure), and from 64K up to 4G we do not.
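
If we did want memcpy, it would only be that final copy anyway;
something like this sketch of the pass/swap structure (bucketing body
elided, variable names from memory rather than the patch):

  struct revindex_entry *from = entries, *to = tmp, *swap;
  int bits = 0;

  while (max >> bits) {
          /* ... one 16-bit counting-sort pass from "from" into "to" ... */
          swap = from;
          from = to;
          to = swap;
          bits += 16;
  }

  /*
   * Only an odd number of passes leaves the result in the temporary
   * array; after an even number it is already back in "entries".
   */
  if (from != entries)
          memcpy(entries, from, n * sizeof(*entries));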

Note that we also use struct assignment in the sort itself to drop
elements into their buckets. That could potentially use memcpy, though I
would expect the compiler to generate pretty decent instructions for
such a small struct.
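
For reference, the assignment in question is essentially just this
(with hypothetical names for the running bucket cursors):

  unsigned digit = (from[i].offset >> bits) & 0xffff;

  to[bucket_pos[digit]++] = from[i]; /* whole-struct assignment */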

-Peff