Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Abhradeep and Taylor,

I very much enjoy following from a distance Abhradeep's work on this
series and all the reviewing and mentoring. I don't grasp anywhere near
all the details, but I've looked into this a bit:

On Sat, 16 Jul 2022 at 00:37, Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Fri, Jul 15, 2022 at 10:08:17PM +0530, Abhradeep Chakraborty wrote:
> > On Fri, Jul 15, 2022 at 8:16 AM Taylor Blau <me@xxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> > > > +/*
> > > > + * Searches for a matching triplet. `va` is a pointer
> > > > + * to the wanted commit position value. `vb` points to
> > > > + * a triplet in lookup table. The first 4 bytes of each
> > > > + * triplet (pointed by `vb`) are compared with `*va`.
> > > > + */
> > > > +static int triplet_cmp(const void *va, const void *vb)
> > > > +{
> > > > +
> > > > +     uint32_t a = *(uint32_t *)va;
> > >
> > > The comment you added is definitely helpful, but I still think that this
> > > line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
> > > but a pointer to the start of a triplet, which just *happens* to have a
> > > 4-byte integer at the beginning of it.

Yeah, this all looks quite magical with the casting, and with the
asymmetric handling of `va` and `vb`.

> > Are you sure about this? As far as I know, the first parameter of such
> > comparing functions is always a pointer to the given key that we need
> > to search for and the second parameter points to each element of an
> > array.

Yes, that matches my understanding and the man-page for bsearch(3):

  "The compar routine is expected to have two arguments which point to
  the key object and to an array member, in that order, [...]"

I think it would help to make this something like

  static int triplet_cmp(const void *key, const void *array_item)

to really highlight this asymmetric nature of this function, or to make
clear how the values flow through our call-chain through something like

  static int triplet_cmp(const void *commit_pos, const void *table_entry)

Because we really do rely on this promise of bsearch(3) -- if we would
instantiate a 'dummy' triplet carrying the key, we wouldn't need to (but
we would instead need to have our `cmp` function constantly re-read the
same value, including doing the byteswap).

Would it make sense to let the `const void *key` directly carry the
32-bit value and hope that `sizeof(key) >= sizeof(uint32_t)`? That's
probably too magical, "just" to save on dereferencing.

One thing that could perhaps make things clearer is if
`bsearch_triplet()` did take the position directly, rather than as a
pointer:

-static int bsearch_triplet(uint32_t *commit_pos,
+static int bsearch_triplet(uint32_t commit_pos,
                           struct bitmap_index *bitmap_git,
                           struct bitmap_lookup_table_triplet *triplet)
 {
-       unsigned char *p = bsearch(commit_pos,
bitmap_git->table_lookup, bitmap_git->entry_count,
+       unsigned char *p = bsearch(&commit_pos,
bitmap_git->table_lookup, bitmap_git->entry_count,
                                   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH,
triplet_cmp);


Also, maybe s/bsearch_triplet/&_by_pos/ could clarify the intent of this
function?

> > I think "`va is a pointer to the wanted commit position value" is not
> > that descriptive. Maybe "`va` is a pointer to the given key" is
> > better. What do you think?
>
> Yes, the first argument to the comparison function used in bsearch() is

s/first/second/

> a pointer to some element in the array. I just meant that that array is
> the bitmap_git->table_lookup region, so each element isn't actually a
> uint32_t array, but the whole thing is an array of (uint32_t, uint64_t,
> uint32_t) triplets.
>
> What you wrote here is fine, and I don't even think that the comment
> needs updating. If you did want to clarify, I think you could say
> something along the lines of what you wrote above ("`va` is a pointer to
> an array element") and add something along the lines of "where the array
> is the lookup table region of the .bitmap".

I mentioned a few ideas for clarifying things above. I do think it would
be a good idea to differentiate the names of `va` and `vb` to make the
fundamental asymmetry between them clearer. The rest of my comments are
really just musings.

I originally started looking at this because I wanted to see why the
casting to a `uint32_t *` and dereferencing it was safe. The reason is,
we're always handling the same pointer to a `uint32_t` on the stack, so
alignment is guaranteed.


Martin



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux