Re: [PATCH 07/16] mktree: use read_index_info to read stdin lines

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 11 Jun 2024 19:11:59 -0700

"Victoria Dye via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Victoria Dye <vdye@xxxxxxxxxx>
>
> Replace the custom input parsing of 'mktree' with 'read_index_info()', which
> handles not only the 'ls-tree' output format it already handles but also the
> other formats compatible with 'update-index'.

Yay.

> This lends some consistency
> across the commands (avoiding the need for two similar implementations for
> input parsing) and adds flexibility to mktree.
>
> Update 'Documentation/git-mktree.txt' to reflect the more permissive input
> format.

Nice.

>  DESCRIPTION
>  -----------
> -Reads standard input in non-recursive `ls-tree` output format, and creates
> -a tree object.  The order of the tree entries is normalized by mktree so
> -pre-sorting the input is not required.  The object name of the tree object
> -built is written to the standard output.
> +Reads entry information from stdin and creates a tree object from those entries.
> +The object name of the tree object built is written to the standard output.

pre-sorting is now required?  Ah, such details are left to the
section dedicated for the input format.  Makes sense.

The lines are getting overly long (the first line is now exactly
80 columns); wrapping them at around 72-76 columns, to leave a bit
of room to grow, would be appreciated.

> +INPUT FORMAT
> +------------
> +Tree entries may be specified in any of the formats compatible with the
> +`--index-info` option to linkgit:git-update-index[1]. The order of the tree
> +entries is normalized by `mktree` so pre-sorting the input by path is not
> +required.

OK.  We might want to split the description of the three formats
into a separate file and include it here and in the original (I'd
certainly insist on doing so if we had three places that want to
refer to it), but we have only two, so let's just remember to do so
when we want to add a third place in the future.

> diff --git a/builtin/mktree.c b/builtin/mktree.c
> index 15bd908702a..5530257252d 100644
> --- a/builtin/mktree.c
> +++ b/builtin/mktree.c
> @@ -6,6 +6,7 @@
>  #include "builtin.h"
>  #include "gettext.h"
>  #include "hex.h"
> +#include "index-info.h"
>  #include "quote.h"
>  #include "strbuf.h"
>  #include "tree.h"
> @@ -93,123 +94,80 @@ static const char *mktree_usage[] = {
>  	NULL
>  };
>  
> -static void mktree_line(char *buf, int nul_term_line, int allow_missing,
> -			struct tree_entry_array *arr)
> +struct mktree_line_data {
> +	struct tree_entry_array *arr;
> +	int allow_missing;
> +};
> +
> +static int mktree_line(unsigned int mode, struct object_id *oid,
> +		       enum object_type obj_type, int stage UNUSED,
> +		       const char *path, void *cbdata)
>  {
> +	struct mktree_line_data *data = cbdata;
> +	enum object_type mode_type = object_type(mode);
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	enum object_type parsed_obj_type;
>  
> +	if (obj_type && mode_type != obj_type)
> +		die("object type (%s) doesn't match mode type (%s)",
> +		    type_name(obj_type), type_name(mode_type));
>  
> +	oi.typep = &parsed_obj_type;
>  
> +	if (oid_object_info_extended(the_repository, oid, &oi,
>  				     OBJECT_INFO_LOOKUP_REPLACE |
>  				     OBJECT_INFO_QUICK |
>  				     OBJECT_INFO_SKIP_FETCH_OBJECT) < 0)
> +		parsed_obj_type = -1;
>  
> +	if (parsed_obj_type < 0) {
> +		if (data->allow_missing || S_ISGITLINK(mode)) {
> +			; /* no problem - missing objects & submodules are presumed to be of the right type */

Overlong line?

>  		} else {
> +			die("entry '%s' object %s is unavailable", path, oid_to_hex(oid));
>  		}

Each side of the if/else is a single-statement block that does not
want {braces} around it.  I wonder if flipping the polarity makes it
easier to follow the logic flow:

		if (!data->allow_missing && !S_ISGITLINK(mode))
			die("...");

I wonder if we even want to do the oid_object_info_extended() when
we are expecting to see a gitlink.  We do not expect to have the
commit in our history (as it is part of the history of a submodule,
which is from a separate project), so even if we found such an
object in our object database, we do not want to do anything with
the information we learn about the object.

So I am wondering if the whole cascade should read more like

	if (S_ISGITLINK(mode)) {
		... anything goes ...
	} else if (oid_object_info_extended(...) < 0 &&
		   !data->allow_missing) {
		... not found ...
	} else if (parsed_obj_type != mode_type) {
		... found something different from what we expected ...
	}
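
To make the outcomes of that cascade concrete, here is a minimal,
self-contained sketch in C.  It is not git code: every name in it
(sketch_check_entry(), the SKETCH_* constants, the ENTRY_* verdicts)
is made up for illustration, the object lookup is modelled by a
`found` argument instead of a call to oid_object_info_extended(), and
it returns a verdict where the patch would die().  It also assumes
that a missing object is accepted when allow_missing is set, as the
comment in the patch says.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not git's real types. */
enum sketch_type {
	SKETCH_MISSING = -1,	/* the lookup found nothing */
	SKETCH_COMMIT = 1,
	SKETCH_TREE,
	SKETCH_BLOB
};

enum sketch_verdict {
	ENTRY_OK,
	ENTRY_UNAVAILABLE,	/* the patch would die("... is unavailable") */
	ENTRY_TYPE_MISMATCH	/* the patch would die("... doesn't match ...") */
};

/* Same octal values as the usual file-type bits, under made-up names. */
#define SKETCH_IFMT		0170000u
#define SKETCH_IFDIR		0040000u
#define SKETCH_IFGITLINK	0160000u

/* The "missing" marker has to be negative, like parsed_obj_type = -1. */
static_assert(SKETCH_MISSING < 0, "missing marker must be negative");

/* Which object type does a tree entry's mode imply? */
static enum sketch_type sketch_mode_type(unsigned int mode)
{
	if ((mode & SKETCH_IFMT) == SKETCH_IFGITLINK)
		return SKETCH_COMMIT;
	if ((mode & SKETCH_IFMT) == SKETCH_IFDIR)
		return SKETCH_TREE;
	return SKETCH_BLOB;
}

/*
 * `found` models the result of the object lookup.  It is ignored
 * for gitlinks, which stands in for "do not even look the object up".
 */
static enum sketch_verdict sketch_check_entry(unsigned int mode,
					      enum sketch_type found,
					      bool allow_missing)
{
	if ((mode & SKETCH_IFMT) == SKETCH_IFGITLINK)
		return ENTRY_OK;		/* anything goes */
	if (found == SKETCH_MISSING)		/* not found */
		return allow_missing ? ENTRY_OK : ENTRY_UNAVAILABLE;
	if (found != sketch_mode_type(mode))	/* found something else */
		return ENTRY_TYPE_MISMATCH;
	return ENTRY_OK;
}
```

With this, sketch_check_entry(0160000, SKETCH_MISSING, false) gives
ENTRY_OK because gitlinks are never looked up, while a 0100644 entry
whose object is missing gives ENTRY_UNAVAILABLE unless allow_missing
is set.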

The main loop, thanks to the read_index_info() refactoring, got much
easier to read, i.e. compact and clear.