Re: [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 04, 2022 at 08:46:14AM +0000, Abhradeep Chakraborty via GitGitGadget wrote:
> +/*
> + * Searches for a matching triplet. `va` is a pointer
> + * to the wanted commit position value. `vb` points to
> + * a triplet in lookup table. The first 4 bytes of each
> + * triplet (pointed by `vb`) are compared with `*va`.
> + */
> +static int triplet_cmp(const void *va, const void *vb)
> +{
> +
> +	uint32_t a = *(uint32_t *)va;

The comment you added is definitely helpful, but I still think that this
line is a little magical. `*va` isn't really a pointer to a `uint32_t`,
but a pointer to the start of a triplet, which just *happens* to have a
4-byte integer at the beginning of it.

I don't think there's a way to improve this much more than we already
have, though. Populating a triplet struct to just dereference the first
field feels wasteful and slow. So I think what you have here makes sense
to me.

> +static uint32_t bsearch_pos(struct bitmap_index *bitmap_git,
> +			    struct object_id *oid,
> +			    uint32_t *result)
> +{
> +	int found;
> +
> +	if (bitmap_is_midx(bitmap_git))
> +		found = bsearch_midx(oid, bitmap_git->midx, result);
> +	else
> +		found = bsearch_pack(oid, bitmap_git->pack, result);
> +
> +	return found;
> +}
> +
> +/*
> + * `bsearch_triplet` function searches for the raw triplet having
> + * commit position same as `commit_pos` and fills `triplet`
> + * object from the raw triplet. Returns 1 on success and 0
> + * on failure.
> + */
> +static int bsearch_triplet(uint32_t *commit_pos,
> +			   struct bitmap_index *bitmap_git,
> +			   struct bitmap_lookup_table_triplet *triplet)
> +{
> +	unsigned char *p = bsearch(commit_pos, bitmap_git->table_lookup, bitmap_git->entry_count,
> +				   BITMAP_LOOKUP_TABLE_TRIPLET_WIDTH, triplet_cmp);
> +
> +	if (!p)
> +		return 0;
> +	triplet->commit_pos = get_be32(p);
> +	p += sizeof(uint32_t);
> +	triplet->offset = get_be64(p);
> +	p += sizeof(uint64_t);
> +	triplet->xor_row = get_be32(p);
> +	return 1;
> +}

This implementation jumped out as being quite similar to
`lookup_table_get_triplet()`. Ultimately they both end up filling a
triplet struct based on some position `p` within the bitmap. The main
difference being that in `lookup_table_get_triplet()`, `p` comes from a
numeric position which indexes into the table, while in
`bsearch_triplet()` the position `p` is given to us by a call to
`bsearch()`.

I wonder if it would be worth extracting the common part of: given a
pointer `p` and a triplet struct, read the triplet beginning at `p` into
the struct.

`lookup_table_get_triplet()` could compute `p` and then return the
result of calling the new auxiliary function with that `p`. Similarly
for `bsearch_triplet()`, it would call that auxiliary function with the
pointer it got from calling `bsearch()`, or return `0` if no match was
found.

It's a minor point, but I think it would help us clean up the
implementation a little bit.

> +
> +static struct stored_bitmap *lazy_bitmap_for_commit(struct bitmap_index *bitmap_git,
> +					  struct commit *commit)
> +{
> +	uint32_t commit_pos, xor_row;
> +	uint64_t offset;
> +	int flags;
> +	struct bitmap_lookup_table_triplet triplet;
> +	struct object_id *oid = &commit->object.oid;
> +	struct ewah_bitmap *bitmap;
> +	struct stored_bitmap *xor_bitmap = NULL;
> +
> +	int found = bsearch_pos(bitmap_git, oid, &commit_pos);
> +
> +	if (!found)
> +		return NULL;
> +
> +	if (!bsearch_triplet(&commit_pos, bitmap_git, &triplet))
> +		return NULL;
> +
> +	offset = triplet.offset;
> +	xor_row = triplet.xor_row;
> +
> +	if (xor_row != 0xffffffff) {
> +		int xor_flags;
> +		khiter_t hash_pos;
> +		uint64_t offset_xor;
> +		struct bitmap_lookup_table_xor_item *xor_items;
> +		struct bitmap_lookup_table_xor_item xor_item;
> +		size_t xor_items_nr = 0, xor_items_alloc = 64;
> +
> +		ALLOC_ARRAY(xor_items, xor_items_alloc);

This ALLOC_ARRAY() looks great to me. I wonder if we could amortize the
cost of allocating in this (somewhat) hot function by treating the
`xor_items` array as a reusable static buffer where we reset
xor_items_nr to 0 when entering this function.

> +		while (xor_row != 0xffffffff) {
> +			struct object_id xor_oid;
> +
> +			if (xor_items_nr + 1 >= bitmap_git->entry_count) {
> +				free(xor_items);
> +				error(_("corrupt bitmap lookup table: xor chain exceed entry count"));

I think we can probably `die()` here, we're pretty much out of luck in
this case.

> +				return NULL;
> +			}
> +
> +			if (lookup_table_get_triplet(bitmap_git, xor_row, &triplet) < 0)
> +				return NULL;
> +
> +			offset_xor = triplet.offset;
> +
> +			if (nth_bitmap_object_oid(bitmap_git, &xor_oid, triplet.commit_pos) < 0) {
> +				free(xor_items);
> +				error(_("corrupt bitmap lookup table: commit index %u out of range"),
> +					triplet.commit_pos);

Same here.

> +				return NULL;
> +			}
> +
> +			hash_pos = kh_get_oid_map(bitmap_git->bitmaps, xor_oid);
> +
> +			/*
> +			 * If desired bitmap is already stored, we don't need
> +			 * to iterate further. Because we know that bitmaps
> +			 * that are needed to be parsed to parse this bitmap
> +			 * has already been stored. So, assign this stored bitmap
> +			 * to the xor_bitmap.
> +			 */
> +			if (hash_pos < kh_end(bitmap_git->bitmaps) &&
> +			    (xor_bitmap = kh_value(bitmap_git->bitmaps, hash_pos)))
> +				break;
> +
> +			ALLOC_GROW(xor_items, xor_items_nr + 1, xor_items_alloc);
> +			xor_items[xor_items_nr++] = (struct bitmap_lookup_table_xor_item) {.oid = xor_oid,
> +											   .offset = offset_xor};

This style of initialization is somewhat uncommon for Git's codebase. It
might be a little more natural to write something like:

    xor_items[xor_items_nr].oid = xor_oid;
    xor_items[xor_items_nr].offset = offset_xor;
    xor_items_nr++;

But the struct-copying for `xor_oid` is definitely uncommon for us. We
should use the `oidcpy()` helper there instead. Or better yet, pass a
pointer to `&xor_items[xor_items_nr].oid` as the second argument to
`nth_bitmap_object_oid()` to avoid the copy altogether.

> +			xor_row = triplet.xor_row;
> +		}
> +
> +		while (xor_items_nr) {
> +			xor_item = xor_items[xor_items_nr - 1];
> +			offset_xor = xor_item.offset;
> +
> +			bitmap_git->map_pos = offset_xor;
> +			if (bitmap_git->map_size - bitmap_git->map_pos < 6) {

Should we extract `6` out to a named constant?

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux