Re: [PATCH v2 5/8] packfile: pass down repository to `has_object[_kept]_pack`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 28, 2024 at 02:43:43PM +0100, Karthik Nayak wrote:
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 0800714267..2b2816c243 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1529,7 +1529,7 @@ static int want_found_object(const struct object_id *oid, int exclude,
>  			return 0;
>  		if (ignore_packed_keep_in_core && p->pack_keep_in_core)
>  			return 0;
> -		if (has_object_kept_pack(oid, flags))
> +		if (has_object_kept_pack(the_repository, oid, flags))

Do we want to use p->repo here instead of the_repository? I think the
answer is "yes" since in this function we are given a pack "p" and want
to determine if the given object contained in "p" is useful to pack. If
not, we want to search for it among other packs here, likely within the
same repository.

(Again, probably a moot point here since this is all going to be
the_repository anyway, but just thinking aloud...).

>  	}
>
> @@ -3627,7 +3627,7 @@ static void show_cruft_commit(struct commit *commit, void *data)
>
>  static int cruft_include_check_obj(struct object *obj, void *data UNUSED)
>  {
> -	return !has_object_kept_pack(&obj->oid, IN_CORE_KEEP_PACKS);
> +	return !has_object_kept_pack(the_repository, &obj->oid, IN_CORE_KEEP_PACKS);

Here we don't know what pack "obj" is contained in, which makes sense
since this is a traversal callback, not something that is iterating over
the contents of a particular pack or similar. So using the_repository is
right here.

Although... should we be using to_pack->repo here over the_repository
(in builtin/pack-objects.c)? The rest of the code definitely does *not*
do that, but I think probably should.

>  static int cruft_include_check(struct commit *commit, void *data)
> diff --git a/diff.c b/diff.c
> index dceac20d18..1d483bdf37 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4041,7 +4041,8 @@ static int reuse_worktree_file(struct index_state *istate,
>  	 * objects however would tend to be slower as they need
>  	 * to be individually opened and inflated.
>  	 */
> -	if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid))
> +	if (!FAST_WORKING_DIRECTORY && !want_file &&
> +	    has_object_pack(the_repository, oid))
>  		return 0;
>
>  	/*
> diff --git a/list-objects.c b/list-objects.c
> index 985d008799..31236a8dc9 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -41,7 +41,8 @@ static void show_object(struct traversal_context *ctx,
>  {
>  	if (!ctx->show_object)
>  		return;
> -	if (ctx->revs->unpacked && has_object_pack(&object->oid))
> +	if (ctx->revs->unpacked && has_object_pack(ctx->revs->repo,
> +						   &object->oid))
>  		return;
>
>  	ctx->show_object(object, name, ctx->show_data);
> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 4fa9dfc771..d34ba9909a 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -1889,7 +1889,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git,
>  		bitmap_unset(result, i);
>
>  	for (i = 0; i < eindex->count; ++i) {
> -		if (has_object_pack(&eindex->objects[i]->oid))
> +		if (has_object_pack(the_repository, &eindex->objects[i]->oid))

Interesting. I think the_repository in practice is fine here, but I
might have expected something like bitmap_git->p->repo, or the
equivalent for the MIDX case.

So I was going to suggest something like:

    static struct repository *bitmap_repo(const struct bitmap_index *bitmap_git)
    {
        if (bitmap_is_midx(bitmap_git))
            return bitmap_git->midx->repo;
        return bitmap_git->pack->repo;
    }

and then rewriting this as:

    if (has_object_pack(bitmap_repo(bitmap_git), &eindex->objects[i]->oid))

, but we can't do that, because the MIDX structure does not know what
repository it belongs to, only the object_dir it resides in!

And I think that causes wrinkles earlier in your series that I didn't
think of at the time when reviewing, because it seems odd in retrospect
that, e.g. we have something like:

    load_multi_pack_index(the_repository->objects->odb->path, ...);

where we pass in the object_dir path directly, but other functions like
prepare_midx_pack() that take in a 'struct repository *'.

I wonder if we should be initializing the MIDX with a repository
pointer, so that it knows what repository it belongs to. I suspect that
we will still have to pass in a separate string indicating the
object_dir, likely because of the --object-dir quirk I mentioned
earlier.

But my main thought here is that we should be able to infer from a
'struct bitmap_index *' what repository it belongs to instead of using
'the_repository' here directly.

The rest all looks quite reasonable to me.

Thanks,
Taylor




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux