Hi Abhradeep and Taylor, I very much enjoy following from a distance Abhradeep's work on this series and all the reviewing and mentoring. I don't grasp anywhere near all the details, but I've looked into this a bit: On Sat, 16 Jul 2022 at 00:37, Taylor Blau <me@xxxxxxxxxxxx> wrote: > > On Fri, Jul 15, 2022 at 10:08:17PM +0530, Abhradeep Chakraborty wrote: > > On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@xxxxxxxxxxxx> wrote: > > > > > > On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote: > > > > +/* > > > > + * Searches for a matching triplet. `va` is a pointer > > > > + * to the wanted commit position value. `vb` points to > > > > + * a triplet in lookup table. The first 4 bytes of each > > > > + * triplet (pointed by `vb`) are compared with `*va`. > > > > + */ > > > > +static int triplet_cmp(const void *va, const void *vb) > > > > +{ > > > > + > > > > + uint32_t a = *(uint32_t *)va; > > > > > > The comment you added is definitely helpful, but I still think that this > > > line is a little magical. `*va` isn't really a pointer to a `uint32_t`, > > > but a pointer to the start of a triplet, which just *happens* to have a > > > 4-byte integer at the beginning of it. Yeah, this all looks quite magical with the casting, and with the asymmetric handling of `va` and `vb`. > > Are you sure about this? As far as I know, the first parameter of such > > comparing functions is always a pointer to the given key that we need > > to search for and the second parameter points to each element of an > > array. Yes, that matches my understanding and the man-page for bsearch(3): "The compar routine is expected to have two arguments which point to the key object and to an array member, in that order, [...]" I think it would help to make this something like static int triplet_cmp(const void *key, const void *array_item) to really highlight this asymmetric nature of this function, or to make clear how the values flow through our call-chain through something like static int triplet_cmp(const void *commit_pos, const void *table_entry) Because we really do rely on this promise of bsearch(3) -- if we would instantiate a 'dummy' triplet carrying the key, we wouldn't need to (but we would instead need to have our `cmp` function constantly re-read the same value, including doing the byteswap). Would it make sense to let the `const void *key` directly carry the 32-bit value and hope that `sizeof(key) >= sizeof(uint32_t)`? That's probably too magical, "just" to save on dereferencing. One thing that could perhaps make things clearer is if `bsearch_triplet()` did take the position directly, rather than as a pointer: -static int bsearch_triplet(uint32_t *commit_pos, +static int bsearch_triplet(uint32_t commit_pos, struct bitmap_index *bitmap_git, struct bitmap_lookup_table_triplet *triplet) { - unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count, + unsigned char *p = bsearch(&commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count, BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp); Also, maybe s/bsearch_triplet/&_by_pos/ could clarify the intent of this function? > > I think "`va is a pointer to the wanted commit position value" is not > > that descriptive. Maybe "`va` is a pointer to the given key" is > > better. What do you think? > > Yes, the first argument to the comparison function used in bsearch() is s/first/second/ > a pointer to some element in the array. I just meant that that array is > the bitmap_git->table_lookup region, so each element isn't actually a > uint32_t array, but the whole thing is an array of (uint32_t, uint64_t, > uint32_t) triplets. > > What you wrote here is fine, and I don't even think that the comment > needs updating. If you did want to clarify, I think you could say > something along the lines of what you wrote above ("`va` is a pointer to > an array element") and add something along the lines of "where the array > is the lookup table region of the .bitmap". I mentioned a few ideas for clarifying things above. I do think it would be a good idea to differentiate the names of `va` and `vb` to make the fundamental asymmetry between them clearer. The rest of my comments are really just musings. I originally started looking at this because I wanted to see why the casting to a `uint32_t *` and dereferencing it was safe. The reason is, we're always handling the same pointer to a `uint32_t` on the stack, so alignment is guaranteed. Martin