On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > > +/* > > + * Searches for a matching triplet. `va` is a pointer > > + * to the wanted commit position value. `vb` points to > > + * a triplet in lookup table. The first 4 bytes of each > > + * triplet (pointed by `vb`) are compared with `*va`. > > + */ > > +static int triplet_cmp(const void *va, const void *vb) > > +{ > > + > > + uint32_t a = *(uint32_t *)va; > > The comment you added is definitely helpful, but I still think that this > line is a little magical. `*va` isn't really a pointer to a `uint32_t`, > but a pointer to the start of a triplet, which just *happens* to have a > 4-byte integer at the beginning of it. Are you sure about this? As far as I know, the first parameter of such comparing functions is always a pointer to the given key that we need to search for and the second parameter points to each element of an array. I think "`va is a pointer to the wanted commit position value" is not that descriptive. Maybe "`va` is a pointer to the given key" is better. What do you think? > > + * `bsearch_triplet` function searches for the raw triplet having > > + * commit position same as `commit_pos` and fills `triplet` > > + * object from the raw triplet. Returns 1 on success and 0 > > + * on failure. > > + */ > > +static int bsearch_triplet(uint32_t *commit_pos, > > + struct bitmap_index *bitmap_git, > > + struct bitmap_lookup_table_triplet *triplet) > > +{ > > + unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count, > > + BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp); > > + > > + if (!p) > > + return 0; > > + triplet->commit_pos = get_be32(p); > > + p += sizeof(uint32_t); > > + triplet->offset = get_be64(p); > > + p += sizeof(uint64_t); > > + triplet->xor_row = get_be32(p); > > + return 1; > > +} > > This implementation jumped out as being quite similar to > `lookup_table_get_triplet()`. Ultimately they both end up filling a > triplet struct based on some position `p` within the bitmap. The main > difference being that in `lookup_table_get_triplet()`, `p` comes from a > numeric position which indexes into the table, while in > `bsearch_triplet()` the position `p` is given to us by a call to > `bsearch()`. > > I wonder if it would be worth extracting the common part of: given a > pointer `p` and a triplet struct, read the triplet beginning at `p` into > the struct. > > `lookup_table_get_triplet()` could compute `p` and then return the > result of calling the new auxiliary function with that `p`. Similarly > for `bsearch_triplet()`, it would call that auxiliary function with the > pointer it got from calling `bsearch()`, or return `0` if no match was > found. > > It's a minor point, but I think it would help us clean up the > implementation a little bit. Sure! That would be a great idea! > > + ALLOC_ARRAY(xor_items, xor_items_alloc); > > This ALLOC_ARRAY() looks great to me. I wonder if we could amortize the > cost of allocating in this (somewhat) hot function by treating the > `xor_items` array as a reusable static buffer where we reset > xor_items_nr to 0 when entering this function. > > > + while (xor_row != 0xffffffff) { > > + struct object_id xor_oid; > > + > > + if (xor_items_nr + 1 >= bitmap_git->entry_count) { > > + free(xor_items); > > + error(_("corrupt bitmap lookup table: xor chain exceed entry count")); > > I think we can probably `die()` here, we're pretty much out of luck in > this case. > ... > > + error(_("corrupt bitmap lookup table: commit index %u out of range"), > > + triplet.commit_pos); > > Same here. I didn't use `die()` here because I thought returning NULL would be a better idea. In that case, Git can still do its job by using the traditional approach - traversing between objects. `load_bitmap_entries_v1` also returns NULL if an error occurs. What do you think? > > + ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc); > > + xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid, > > + .offset = offset_xor}; > > This style of initialization is somewhat uncommon for Git's codebase. It > might be a little more natural to write something like: > > xor_items[xor_items_nr].oid = xor_oid; > xor_items[xor_items_nr].offset = offset_xor; > xor_items_nr++; > > But the struct-copying for `xor_oid` is definitely uncommon for us. We > should use the `oidcpy()` helper there instead. Or better yet, pass a > pointer to `&xor_items[xor_items_nr].oid` as the second argument to > `nth_bitmap_object_oid()` to avoid the copy altogether. Ok, got it. > > + bitmap_git->map_pos = offset_xor; > > + if (bitmap_git->map_size - bitmap_git->map_pos < 6) { > > Should we extract `6` out to a named constant? Ok, sure! Thanks :) > > Thanks, > Taylor