Re: [PATCH v2 4/6] path-walk: allow consumer to specify object types

Taylor Blau <me@xxxxxxxxxxxx> · Thu, 21 Nov 2024 17:44:18 -0500

On Sat, Nov 09, 2024 at 07:41:10PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/path-walk.c b/path-walk.c
> index 24cf04c1e7d..2ca08402367 100644
> --- a/path-walk.c
> +++ b/path-walk.c
> @@ -98,6 +98,10 @@ static int add_children(struct path_walk_context *ctx,
>  		if (S_ISGITLINK(entry.mode))
>  			continue;
>
> +		/* If the caller doesn't want blobs, then don't bother. */
> +		if (!ctx->info->blobs && type == OBJ_BLOB)
> +			continue;
> +

I was going to ask why we're not reusing the existing property of the
rev_info struct to specify what object type(s) we do/don't want, but the
answer is obvious: we don't want that to change the behavior of the
commit-level walk which is used to seed the actual path walk based on
its results.

However, it would be kind of nice to have a single place to specify how
you want to traverse objects, and using the rev_info struct seems like a
good choice there since you can naively ask Git to parse command-line
arguments like --blobs, --trees, --objects, etc. and set the appropriate
bits.

I wonder if it might make sense to do something like:

    unsigned int tmp_blobs = revs->blob_objects;
    unsigned int tmp_trees = revs->tree_objects;
    unsigned int tmp_tags = revs->tag_objects;

    if (prepare_revision_walk(revs))
        die(_("failed to setup revision walk"));

    /* commit-level walk */

    revs->blob_objects = tmp_blobs;
    revs->tree_objects = tmp_trees;
    revs->tag_objects = tmp_tags;

    /* path-level walk */

I don't have strong feelings about it, but it feels like this approach
could be cleaner from a caller's perspective. But I can see an argument
to the contrary that it does introduce some awkwardness with the
dual-use of those fields.

Thanks,
Taylor