Taylor Blau <me@xxxxxxxxxxxx> writes:

> An alternative approach to closing this race would have been to call
> `is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This
> has a couple of problems:
>
>   - it is unnecessarily expensive in the cases where we don't actually
>     need to open any packs (e.g., in `git rev-list --use-bitmap-index
>     --count`)
>
>   - more importantly, it means any time we would have hit this race,
>     we'll avoid using bitmaps altogether, leading to significant
>     slowdowns by forcing a full object traversal

This answers a question I had about why we're only opening the
preferred pack instead of all packs. (You mention in [1] that it's
also answered in that patch's message, but I didn't see it there.) In
any case, it might be clearer to move this part to the first commit.

[1] https://lore.kernel.org/git/Yn63nDhSBIEa3%2F+%2F@nand.local/

> Work around this by calling `is_pack_valid()` from within
> `want_found_object()`, matching the behavior in
> `want_object_in_pack_one()` (which has an analogous call). Most calls to
> `is_pack_valid()` should be basically no-ops, since only the first call
> requires us to open a file (subsequent calls realize the file is already
> open, and return immediately).
>
> This does require us to make a small change in
> `want_object_in_pack_one()`, since `want_found_object()` may return `-1`
> (indicating "keep searching for other packs containing this object")
> when `*found_pack` is non-NULL. Force `want_object_in_pack_one()` to
> call `is_pack_valid()` when `p != *found_pack`, not just when
> `*found_pack` is non-NULL.

It took me a while to realize that the relevant want_found_object()
invocation that may return -1 is not the one in
want_object_in_pack_one(), but the one in the latter's caller,
want_object_in_pack(). But even in that case, couldn't
want_found_object() return -1 (see the very end of the function) even
before this patch?

> @@ -1424,14 +1427,15 @@ static int want_object_in_pack_one(struct packed_git *p,
>                                     off_t *found_offset)
>  {
>          off_t offset;
> +        int use_found = p == *found_pack;
>
> -        if (p == *found_pack)
> +        if (use_found)
>                  offset = *found_offset;
>          else
>                  offset = find_pack_entry_one(oid->hash, p);
>
>          if (offset) {
> -                if (!*found_pack) {
> +                if (!use_found) {
>                          if (!is_pack_valid(p))
>                                  return -1;
>                          *found_offset = offset;

My understanding of the purpose of this code change is that if we
reach here with a non-NULL *found_pack, it is likely that *found_pack
points to an invalid pack, and this part overwrites *found_pack (and
*found_offset) if it finds a valid pack. That seems like a good
change, but I don't see how it is a result of something that "does
require us" to change (as far as I can tell, *found_pack could already
have been invalid before this patch, so the downstream code should
already be able to handle that). Maybe it's just that previously we
couldn't tell whether the pack is invalid, but now we can; if so, it
would be better to say something like "use this added information to
overwrite *found_pack when it makes sense".

(An alternative to the change in this patch may be to reset
*found_pack to NULL when the pack is found to be invalid, but I
haven't investigated all the callers to see whether they can tolerate
*found_pack changing from non-NULL to NULL, so the change in this
patch is probably more practical. A rough sketch of what I mean is at
the end of this message.)

Other than that (and the "close(fd)" thing in patch 1 that Junio
mentioned [2]), this series looks good to me. Thanks for noticing and
fixing the bug.

[2] https://lore.kernel.org/git/xmqqpmkh9tye.fsf@gitster.g/
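
P.S. To illustrate the alternative I mentioned above: a rough,
untested sketch of what want_object_in_pack_one() could do near its
top, written against the hunk quoted earlier. It is purely an
illustration, assumes the callers can cope with *found_pack being
cleared, and I haven't checked how it interacts with the use_found
logic this patch introduces:

        /*
         * Untested sketch: if the pack we matched earlier turns out
         * to be invalid, forget that match entirely instead of
         * quietly overwriting it later, so the rest of the function
         * (and the caller's loop over packs) starts from a clean
         * slate.
         */
        if (*found_pack && !is_pack_valid(*found_pack)) {
                *found_pack = NULL;
                *found_offset = 0;
        }

        if (p == *found_pack)
                offset = *found_offset;
        else
                offset = find_pack_entry_one(oid->hash, p);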