Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> > +/*
> > + * Searches for a matching triplet. `va` is a pointer
> > + * to the wanted commit position value. `vb` points to
> > + * a triplet in lookup table. The first 4 bytes of each
> > + * triplet (pointed by `vb`) are compared with `*va`.
> > + */
> > +static int triplet_cmp(const void *va, const void *vb)
> > +{
> > +
> > +     uint32_t a = *(uint32_t *)va;
>
> The comment you added is definitely helpful, but I still think that this
> line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
> but a pointer to the start of a triplet, which just *happens* to have a
> 4-byte integer at the beginning of it.

Are you sure about this? As far as I know, the first parameter of such
comparing functions is always a pointer to the given key that we need
to search for and the second parameter points to each element of an
array.

I think "`va is a pointer to the wanted commit position value" is not
that descriptive. Maybe "`va` is a pointer to the given key" is
better. What do you think?

> > + * `bsearch_triplet` function searches for the raw triplet having
> > + * commit position same as `commit_pos` and fills `triplet`
> > + * object from the raw triplet. Returns 1 on success and 0
> > + * on failure.
> > + */
> > +static int bsearch_triplet(uint32_t *commit_pos,
> > +                        struct bitmap_index *bitmap_git,
> > +                        struct bitmap_lookup_table_triplet *triplet)
> > +{
> > +     unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> > +                                BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
> > +
> > +     if (!p)
> > +             return 0;
> > +     triplet->commit_pos = get_be32(p);
> > +     p += sizeof(uint32_t);
> > +     triplet->offset = get_be64(p);
> > +     p += sizeof(uint64_t);
> > +     triplet->xor_row = get_be32(p);
> > +     return 1;
> > +}
>
> This implementation jumped out as being quite similar to
> `lookup_table_get_triplet()`. Ultimately they both end up filling a
> triplet struct based on some position `p` within the bitmap. The main
> difference being that in `lookup_table_get_triplet()`, `p` comes from a
> numeric position which indexes into the table, while in
> `bsearch_triplet()` the position `p` is given to us by a call to
> `bsearch()`.
>
> I wonder if it would be worth extracting the common part of: given a
> pointer `p` and a triplet struct, read the triplet beginning at `p` into
> the struct.
>
> `lookup_table_get_triplet()` could compute `p` and then return the
> result of calling the new auxiliary function with that `p`. Similarly
> for `bsearch_triplet()`, it would call that auxiliary function with the
> pointer it got from calling `bsearch()`, or return `0` if no match was
> found.
>
> It's a minor point, but I think it would help us clean up the
> implementation a little bit.

Sure! That would be a great idea!

> > +             ALLOC_ARRAY(xor_items, xor_items_alloc);
>
> This ALLOC_ARRAY() looks great to me. I wonder if we could amortize the
> cost of allocating in this (somewhat) hot function by treating the
> `xor_items` array as a reusable static buffer where we reset
> xor_items_nr to 0 when entering this function.
>
> > +             while (xor_row != 0xffffffff) {
> > +                     struct object_id xor_oid;
> > +
> > +                     if (xor_items_nr + 1 >= bitmap_git->entry_count) {
> > +                             free(xor_items);
> > +                             error(_("corrupt bitmap lookup table: xor chain exceed entry count"));
>
> I think we can probably `die()` here, we're pretty much out of luck in
> this case.
> ...
> > +                             error(_("corrupt bitmap lookup table: commit index %u out of range"),
> > +                                     triplet.commit_pos);
>
> Same here.

I didn't use `die()` here because I thought returning NULL would be a
better idea. In that case, Git can still do its job by using the
traditional approach  - traversing  between objects.
`load_bitmap_entries_v1` also returns NULL if an error occurs. What do
you think?

> > +                     ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
> > +                     xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
> > +                                                                                        .offset = offset_xor};
>
> This style of initialization is somewhat uncommon for Git's codebase. It
> might be a little more natural to write something like:
>
>     xor_items[xor_items_nr].oid = xor_oid;
>     xor_items[xor_items_nr].offset = offset_xor;
>     xor_items_nr++;
>
> But the struct-copying for `xor_oid` is definitely uncommon for us. We
> should use the `oidcpy()` helper there instead. Or better yet, pass a
> pointer to `&xor_items[xor_items_nr].oid` as the second argument to
> `nth_bitmap_object_oid()` to avoid the copy altogether.

Ok, got it.

> > +                     bitmap_git->map_pos = offset_xor;
> > +                     if (bitmap_git->map_size - bitmap_git->map_pos < 6) {
>
> Should we extract `6` out to a named constant?

Ok, sure!

Thanks :)

>
> Thanks,
> Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux