Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sun, 12 Jul 2020 02:46:47 +0200 (CEST)

Hi Shourya,

On Mon, 6 Jul 2020, Shourya Shukla wrote:

> On 06/07 02:46, Kaartic Sivaraam wrote:
> > On 05-07-2020 23:04, Shourya Shukla wrote:
> > >> On Fri, 3 Jul 2020, Shourya Shukla wrote:
> > >>
> > >>> +static int module_summary(int argc, const char **argv, const char *prefix)
> > >>> +{
> > >>> +	struct summary_cb info = SUMMARY_CB_INIT;
> > >>> +	int cached = 0;
> > >>> +	int for_status = 0;
> > >>> +	int quiet = 0;
> > >>> +	int files = 0;
> > >>> +	int summary_limit = -1;
> > >>> +	struct child_process cp_rev = CHILD_PROCESS_INIT;
> > >>> +	struct strbuf sb = STRBUF_INIT;
> > >>> +	enum diff_cmd diff_cmd = DIFF_INDEX;
> > >>> +	int ret;
> > >>> +
> > >>> +	struct option module_summary_options[] = {
> > >>> +		OPT__QUIET(&quiet, N_("Suppress output of summarising submodules")),
> > >>> +		OPT_BOOL(0, "cached", &cached,
> > >>> +			 N_("Use the commit stored in the index instead of the submodule HEAD")),
> > >>> +		OPT_BOOL(0, "files", &files,
> > >>> +			 N_("To compare the commit in the index with that in the submodule HEAD")),
> > >>> +		OPT_BOOL(0, "for-status", &for_status,
> > >>> +			 N_("Skip submodules with 'ignore_config' value set to 'all'")),
> > >>> +		OPT_INTEGER('n', "summary-limit", &summary_limit,
> > >>> +			     N_("Limit the summary size")),
> > >>> +		OPT_END()
> > >>> +	};
> > >>> +
> > >>> +	const char *const git_submodule_helper_usage[] = {
> > >>> +		N_("git submodule--helper summary [<options>] [commit] [--] [<path>]"),
> > >>> +		NULL
> > >>> +	};
> > >>> +
> > >>> +	argc = parse_options(argc, argv, prefix, module_summary_options,
> > >>> +			     git_submodule_helper_usage, 0);
> > >>> +
> > >>> +	if (!summary_limit)
> > >>> +		return 0;
> > >>> +
> > >>> +	cp_rev.git_cmd = 1;
> > >>> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> > >>> +			 argc ? argv[0] : "HEAD", NULL);
> > >>
> > >> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
> > >
> > > Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
> > > will save us a ton of processes?
> > >
> > > But I think we do need to capture the output of 'git rev-parse --verify
> > > ....' so I think it will backfire to use 'get_oid()' or am I just being
> > > too dumb and not catching on something?
> > >
> >
> > I'll leave this for others to answer.
>
> I will resolve this one after Dscho answers then.

The `rev-parse` command _would_ essentially call `get_oid()` and print the
result (after converting it to hex, which you don't need, because the
caller actually would parse the hex string anyway).

You can verify my claim by following the code:
https://github.com/git/git/blob/v2.27.0/builtin/rev-parse.c#L956-L982 (it
is slightly harder to follow because the `--verify` option makes sure that
only one rev is shown, which means that the `get_oid_with_context()` call
is separated from the corresponding `show_rev()` call).

So there really is no need to capture the output of that `rev-parse`
command.

It is a different story, of course, when capturing the output of Git
commands _that are run in a submodule_. Those currently still need to be
spawned.

> > >>> +
> > >>> +	if (!capture_command(&cp_rev, &sb, 0)) {
> > >>> +		strbuf_strip_suffix(&sb, "\n");
> > >>> +		if (argc) {
> > >>> +			argv++;
> > >>> +			argc--;
> > >>> +		}
> > >>> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> > >>> +		/* before the first commit: compare with an empty tree */
> > >>> +		struct stat st;
> > >>> +		struct object_id oid;
> > >>> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> > >>> +						  prefix, 3))
> > >>> +			die("Unable to add %s to database", oid.hash);
> > >>
> > >> Umm. The original reads:
> > >>
> > >>                 # before the first commit: compare with an empty tree
> > >>                 head=$(git hash-object -w -t tree --stdin </dev/null)
> > >>
> > >> It does not actually read from `stdin`. It reads from `/dev/null`,
> > >> redirected to the input. And what it _actually_ does is to generate the
> > >> OID of the empty tree.
> > >>
> > >> But we already _have_ the OID of the empty tree! It's
> > >> `the_hash_algo->empty_tree`.
> > >
> > > I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> > >
> > >> I hope that this is covered by the test suite. Please check that. The test
> > >> would succeed with your version, but only because tests are run with
> > >> `stdin` redirected from `/dev/null` by default.
> > >
> > > I guess yes. My work passed because the tests are written this way.
> > >
> > >>> +		strbuf_addstr(&sb, oid_to_hex(&oid));
> > >>> +		if (argc) {
> > >>> +			argv++;
> > >>> +			argc--;
> > >>> +		}
> > >>> +	} else {
> > >>> +		strbuf_addstr(&sb, "HEAD");
> > >>> +	}
> > >>
> > >> The conversion to C would make for a fine excuse to simplify the logic.
> > >
> > > This was kind of like the 'shift' in shell. What equivalent do you
> > > suggest?
> > >
> >
> > I think that's just a general comment after the other comments found
> > just above about simplifying things.
>
> Alright. But I do have to simplify the logic right?

You do not _have_ to. But it would make for a good opportunity to do that,
I think, as the code is really hard to follow.

The idea here is actually not at all hard to understand, though: use
the speficied rev (falling back to `HEAD`) to compare to, unless it is a
yet-unborn `HEAD` in which case you compare to the empty tree.

It is very, very similar in spirit to the code in `do_pick_commit()` in
`sequencer.c`:
https://github.com/git/git/blob/v2.27.0/sequencer.c#L1765-L1774

The only difference is that you will also have to fall back to using
`HEAD` if the argument in question turns out not to refer to a revision
but is actually the first pathspec.

Ciao,
Johannes