Re: [PATCH 4/4] submodule: port submodule subcommand 'summary' from shell to C

Shourya Shukla <shouryashukla.oo@xxxxxxxxx> · Wed, 15 Jul 2020 20:23:36 +0530

Hello Johannes!

On 12/07 02:46, Johannes Schindelin wrote:
> > > >>> +	cp_rev.git_cmd = 1;
> > > >>> +	argv_array_pushl(&cp_rev.args, "rev-parse", "-q", "--verify",
> > > >>> +			 argc ? argv[0] : "HEAD", NULL);
> > > >>
> > > >> Oy. Why not simply call `get_oid()`? No need to spawn a new process.
> > > >
> > > > Then everytime we need 'rev-parse', we simply call 'get_oid()'? That
> > > > will save us a ton of processes?
> > > >
> > > > But I think we do need to capture the output of 'git rev-parse --verify
> > > > ....' so I think it will backfire to use 'get_oid()' or am I just being
> > > > too dumb and not catching on something?
> > > >
> > >
> > > I'll leave this for others to answer.
> >
> > I will resolve this one after Dscho answers then.
> 
> The `rev-parse` command _would_ essentially call `get_oid()` and print the
> result (after converting it to hex, which you don't need, because the
> caller actually would parse the hex string anyway).
> 
> You can verify my claim by following the code:
> https://github.com/git/git/blob/v2.27.0/builtin/rev-parse.c#L956-L982 (it
> is slightly harder to follow because the `--verify` option makes sure that
> only one rev is shown, which means that the `get_oid_with_context()` call
> is separated from the corresponding `show_rev()` call).
> 
> So there really is no need to capture the output of that `rev-parse`
> command.

Yep.

> It is a different story, of course, when capturing the output of Git
> commands _that are run in a submodule_. Those currently still need to be
> spawned.

Oh alright, since we don't need to capture the output and rather use the
command for a particular functionality it offers to, in our just fetch
the OID of head, we use a helper function which the command actually
uses to make things happen instead of spawning a process. Is this
correct?

> > > >>> +
> > > >>> +	if (!capture_command(&cp_rev, &sb, 0)) {
> > > >>> +		strbuf_strip_suffix(&sb, "\n");
> > > >>> +		if (argc) {
> > > >>> +			argv++;
> > > >>> +			argc--;
> > > >>> +		}
> > > >>> +	} else if (!argc || !strcmp(argv[0], "HEAD")) {
> > > >>> +		/* before the first commit: compare with an empty tree */
> > > >>> +		struct stat st;
> > > >>> +		struct object_id oid;
> > > >>> +		if (fstat(0, &st) < 0 || index_fd(&the_index, &oid, 0, &st, 2,
> > > >>> +						  prefix, 3))
> > > >>> +			die("Unable to add %s to database", oid.hash);
> > > >>
> > > >> Umm. The original reads:
> > > >>
> > > >>                 # before the first commit: compare with an empty tree
> > > >>                 head=$(git hash-object -w -t tree --stdin </dev/null)
> > > >>
> > > >> It does not actually read from `stdin`. It reads from `/dev/null`,
> > > >> redirected to the input. And what it _actually_ does is to generate the
> > > >> OID of the empty tree.
> > > >>
> > > >> But we already _have_ the OID of the empty tree! It's
> > > >> `the_hash_algo->empty_tree`.
> > > >
> > > > I did not know this 'the_hash_algo'. I will use it. Thanks! :)
> > > >
> > > >> I hope that this is covered by the test suite. Please check that. The test
> > > >> would succeed with your version, but only because tests are run with
> > > >> `stdin` redirected from `/dev/null` by default.
> > > >
> > > > I guess yes. My work passed because the tests are written this way.
> > > >
> > > >>> +		strbuf_addstr(&sb, oid_to_hex(&oid));
> > > >>> +		if (argc) {
> > > >>> +			argv++;
> > > >>> +			argc--;
> > > >>> +		}
> > > >>> +	} else {
> > > >>> +		strbuf_addstr(&sb, "HEAD");
> > > >>> +	}
> > > >>
> > > >> The conversion to C would make for a fine excuse to simplify the logic.
> > > >
> > > > This was kind of like the 'shift' in shell. What equivalent do you
> > > > suggest?
> > > >
> > >
> > > I think that's just a general comment after the other comments found
> > > just above about simplifying things.
> >
> > Alright. But I do have to simplify the logic right?
> 
> You do not _have_ to. But it would make for a good opportunity to do that,
> I think, as the code is really hard to follow.
> 
> The idea here is actually not at all hard to understand, though: use
> the speficied rev (falling back to `HEAD`) to compare to, unless it is a
> yet-unborn `HEAD` in which case you compare to the empty tree.
> 
> It is very, very similar in spirit to the code in `do_pick_commit()` in
> `sequencer.c`:
> https://github.com/git/git/blob/v2.27.0/sequencer.c#L1765-L1774
> 
> The only difference is that you will also have to fall back to using
> `HEAD` if the argument in question turns out not to refer to a revision
> but is actually the first pathspec.

Alright. So do you suggest a code segment like this?

--------8<---------------------

module_summary() {
struct object_id *head_oid = NULL;
.....
.....
if (get_oid(argc ? argv[0] : "HEAD", head_oid)) {
			if (argc) {
			argv++;
			argc--;
		}
	} else if (!argc || !strcmp(argv[0], "HEAD")) {
		/* before the first commit: compare with an empty tree */
		oidcpy(head_oid, the_hash_algo->empty_tree);
		if (argc) {
			argv++;
			argc--;
		}
	} else {
		get_oid("HEAD", head_oid);
	}
.....
.....
}

*Passing head_oid as a pointer into the fucntion below*

compute_summary_module_list(..., struct object_id *head_oid,...) {
	.....
    .....
    if (head_oid) {
		argv_array_push(&diff_args, oid_to_hex(head_oid));
	}
    .....
    .....
}

----------->8-------------

When I try this, I get a:

    BUG: diff-lib.c:526: run_diff_index must be passed exactly one tree

And this occurs when we are inside the first if-statement. I do not
understand how we are passing multiple trees or making it seem as if
there are multiple trees? Also, even when we do not enter the first if,
we still get a segmentation fault. Why so?

Is this logic correct? I took some inspiration from the links you sent
above as well as some advice from Christian and Kaartic on this.