Re: [PATCH 03/16] pack-objects: use a faster hash table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jun 26, 2013 at 12:48 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> After this, the function returns.  The original did not add to the
> table the object name we are looking at, but the new code first adds
> it to the table with the unconditional kh_put_sha1() above.  Is a
> call to kh_del_sha1() missing here ...

No, this is not the case. That's the return case for when *the object
was found because it already existed in the hash table* (hence we
access it if we're excluding it, to tag it as excluded). We don't want
to remove it from the hash table because we're not the ones we
inserted it.

We only call `kh_del_sha1` in the cases where:

    1. The object wasn't found.
    2. We inserted its key on the hash table.
    3. We later learnt that we don't really want to pack this object.

>
>> @@ -921,38 +916,42 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
>>               return 0;
>>       }
>>
>> -     if (!exclude && local && has_loose_object_nonlocal(sha1))
>> +     if (!exclude && local && has_loose_object_nonlocal(sha1)) {
>> +             kh_del_sha1(packed_objects, ix);
>>               return 0;
>
> ... like this one, which seems to compensate for "ahh, after all we
> realize we do not want to add this one to the table"?
>
>> @@ -966,19 +965,30 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
>>               entry->in_pack_offset = found_offset;
>>       }
>>
>> -     if (object_ix_hashsz * 3 <= nr_objects * 4)
>> -             rehash_objects();
>> -     else
>> -             object_ix[-1 - ix] = nr_objects;
>> +     kh_value(packed_objects, ix) = entry;
>> +     kh_key(packed_objects, ix) = entry->idx.sha1;
>> +     objects[nr_objects++] = entry;
>>
>>       display_progress(progress_state, nr_objects);
>>
>> -     if (name && no_try_delta(name))
>> -             entry->no_try_delta = 1;
>> -
>>       return 1;
>>  }
>>
>> +static int add_object_entry(const unsigned char *sha1, enum object_type type,
>> +                         const char *name, int exclude)
>> +{
>> +     if (add_object_entry_1(sha1, type, name_hash(name), exclude, NULL, 0)) {
>> +             struct object_entry *entry = objects[nr_objects - 1];
>> +
>> +             if (name && no_try_delta(name))
>> +                     entry->no_try_delta = 1;
>> +
>> +             return 1;
>> +     }
>> +
>> +     return 0;
>> +}
>
> It is somewhat unclear what we are getting from the split of the
> main part of this function into *_1(), other than the *_1() function
> now has a very deep indentation inside "if (!found_pack)", which is
> always true because the caller always passes NULL to found_pack.
> Perhaps this is an unrelated refactoring that is needed for later
> steps and does not have anything to do with the use of new hash
> function?

Yes, apologies for not making this clear. By refactoring into `_1`,
you can see how `traverse_bitmap_commit_list` can use the `_1` version
directly as a callback, to insert objects straight into the packing
list without looking them up. This is very efficient because we can
pass the whole API straight from the bitmap code:

1. The SHA1: we find it by simply looking up the `nth` sha1 on the
pack index (if we are yielding bit `n`)
2. The object type: we find it because we have type indexes that let
us know the type of any given bit in the bitmap by and-ing it with the
index.
3. The hash for its name: we can look it up from the name hash cache
in the new bitmap format.
4. Exclude flag: we never exclude when working with bitmaps
5. found_pack: all the bitmapped objects come from the same pack!
6. found_offset: we find it by simply looking up the `nth` offset on
the pack index (if we are yielding bit `n`)

Boom! We filled the callback just from the data in a bitmap. Ain't that nice?

Let me amend the commit message.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]