Re: [PATCH 13/13] Build in merge

Miklos Vajna <vmiklos@xxxxxxxxxxxxxx> · Tue, 1 Jul 2008 04:13:17 +0200

On Sun, Jun 29, 2008 at 10:44:43PM -0700, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> > +static void save_state(void)
> > +{
> > +	int fd;
> > +	struct child_process stash;
> > +	const char *argv[] = {"stash", "create", NULL};
> > +
> > +	fd = open(git_path("MERGE_STASH"), O_WRONLY | O_CREAT, 0666);
> > +	if (fd < 0)
> > +		die("Could not write to %s", git_path("MERGE_STASH"));
> > +	memset(&stash, 0, sizeof(stash));
> > +	stash.argv = argv;
> > +	stash.out = fd;
> > +	stash.git_cmd = 1;
> > +	run_command(&stash);
> > +}
> 
> I first thought "heh, that's clever" until I noticed that we use "stash
> create" with "stash apply" these days instead of cpio for this.  I suspect
> that we can do away without leaving the stash in this temporary file, but
> that comment applies to the scripted version as well.

We can. I just did it. ;-)

> By the way, it would be consistent to name counterpart to dropsave in the
> scripted version as "drop_save" if you use "save_state" and "restore_state".

OK, renamed.

> 
> > +static void reset_hard(unsigned const char *sha1, int verbose)
> > +{
> > +	struct tree *tree;
> > +	struct unpack_trees_options opts;
> > +	struct tree_desc t;
> > +
> > +	memset(&opts, 0, sizeof(opts));
> > +	opts.head_idx = -1;
> > +	opts.src_index = &the_index;
> > +	opts.dst_index = &the_index;
> > +	opts.update = 1;
> > +	opts.reset = 1;
> > +	if (verbose)
> > +		opts.verbose_update = 1;
> > +
> > +	tree = parse_tree_indirect(sha1);
> > +	if (!tree)
> > +		die("failed to unpack %s tree object", sha1_to_hex(sha1));
> > +	parse_tree(tree);
> > +	init_tree_desc(&t, tree->buffer, tree->size);
> > +	if (unpack_trees(1, &t, &opts))
> > +		exit(128); /* We've already reported the error, finish dying */
> > +}
> 
> Isn't this trashing all the cached stat info from the index?  If this is
> emulating "reset --hard", it also should set opts.merge and do
> oneway_merge, after reading the current index in, I think.  Resetting the
> index and the working tree is not particularly performance critical part,
> but trashing the cached stat info would hurt the performance of everything
> that reads the index after this function returns quite badly.  I suspect
> that you might be better off forking the real thing (reset --hard) if you
> cannot get it right here.

I just realized that builtin-reset forks read-tree as well, so I did
almost the same.

> > +/* Get the name for the merge commit's message. */
> > +static void merge_name(const char *remote, struct strbuf *msg)
> > ...
> > +	strbuf_init(&buf, 0);
> > +	strbuf_addstr(&buf, "refs/heads/");
> > +	strbuf_addstr(&buf, remote);
> > +	dwim_ref(buf.buf, buf.len, branch_head, &ref);
> > +	if (!hashcmp(remote_head->sha1, branch_head)) {
> > +		strbuf_addf(msg, "%s\t\tbranch '%s' of .\n",
> > +			sha1_to_hex(branch_head), remote);
> > +		return;
> > +	}
> 
> Hmm, why not resolve_ref() so that it does not dwim at all?  The point of
> the code is so that you can be confident that 'blah' *is* a local branch
> when you say "branch 'blah'".

Yes. I'm now using resolve_ref().

> > +	/* See if remote matches <name>~<number>, or <name>^ */
> > +	ptr = strrchr(remote, '^');
> > +	if (ptr && ptr[1] == '\0') {
> > +		len = strlen(remote);
> > +		while ((ptr = (char *)memrchr(remote, '^', len)))
> > +			if (ptr && ptr[1] == '\0')
> > +				len = ptr - remote - 1;
> > +			else
> > +				break;
> 
> That's a funny way to say:
> 
> 	for (len = 0, ptr = remote + strlen(remote);
>              remote < ptr && ptr[-1] == '^';
>              ptr--)
> 		len++;

Ah, and this way I don't need memrchr(), which was pointed out to be
problemtic on Cygwin.

> 
> > +	if (len) {
> > +		struct strbuf truname = STRBUF_INIT;
> > +		strbuf_addstr(&truname, remote);
> > +		strbuf_setlen(&truname, len);
> > +		if (dwim_ref(truname.buf, truname.len, buf_sha, &ref)) {
> > +			strbuf_addf(msg,
> > +				"%s\t\tbranch '%s' (early part) of .\n",
> > +				sha1_to_hex(remote_head->sha1), truname.buf);
> > +			return;
> 
> Isn't this wrong?  Giving "v1.5.6~20" to this code will strip ~20 and make
> remote = "v1.5.6", to which dwim_ref() will happily say Ok, and you end up
> saying "branch 'v1.5.6' (early part)", don't you?

Right. Now I do

        strbuf_addstr(&truname, "refs/heads/");

Before appending the remote name to truname, so that should exclude
tags.

> > +static int read_tree_trivial(unsigned char *common, unsigned char *head,
> > +	unsigned char *one)
> > +{
> > +	int i, nr_trees = 0;
> > +	struct tree *trees[MAX_UNPACK_TREES];
> > +	struct tree_desc t[MAX_UNPACK_TREES];
> > +	struct unpack_trees_options opts;
> > +
> > +	memset(&opts, 0, sizeof(opts));
> > +	opts.head_idx = -1;
> 
> Is this the correct head_idx value for this three-way merge?  I think it
> should be 2 but please double check.

Yes, you are right. I just checked builtin-read-tree and it's 2, not -1.

> > +static int commit_tree_trivial(const char *msg, unsigned const char *tree,
> > +		struct commit_list *parents, unsigned char *ret)
> > +{
> >  ...
> > +}
> 
> We may want to have another patch before this one to abstract most of
> cmd_commit_tree() out, perhaps?

Done. And now builtin-merge uses commit_tree() as well.

> > +int cmd_merge(int argc, const char **argv, const char *prefix)
> > ...
> > +	/*
> > +	 * This could be traditional "merge <msg> HEAD <commit>..."  and
> > +	 * the way we can tell it is to see if the second token is HEAD,
> > +	 * but some people might have misused the interface and used a
> > +	 * committish that is the same as HEAD there instead.
> > +	 * Traditional format never would have "-m" so it is an
> > +	 * additional safety measure to check for it.
> > +	 */
> > +	strbuf_init(&buf, 0);
> > +	if (argc > 1) {
> > +		unsigned char second_sha1[20];
> > +
> > +		if (get_sha1(argv[1], second_sha1))
> > +			die("Not a valid ref: %s", argv[1]);
> > +		second_token = lookup_commit_reference_gently(second_sha1, 0);
> > +		if (!second_token)
> > +			die("'%s' is not a commit", argv[1]);
> 
> Interesting.
> 
> This _superficially_ is quite wrong, because the purpose of this part of
> the code is to tell if we got old-style invocation, and we should not
> barfing merely because what we got is _not_ old-style.  If it is not
> old-style, then it would be new-style, and the logic to tell if it is
> old-style should ideally not have much knowledge about the new-style
> invocation to say "hey, that's an incocrrect new-style invocation".  By
> the way, this part should probably be in a separate function:
> 
> 	static int is_old_style_invocation(int ac, const char **gv);

OK, I broke out is_old_style_invocation() from cmd_merge().

> Old-style invocation of "git merge" (primarily by "git pull") was
> to call it as:
> 
> 	git merge "message here" HEAD $commit1 $commit2...
> 
> and it checks the second token ("HEAD" in the above, but people can misuse
> the interface to name the current branch name).  If the second token is
> not a ref that resolves to a commit, all you know is that this is _not_ an
> old-style invocation, and calling the program with new-style is not a
> crime.
> 
> The only reason this is wrong only superficially is because new style
> invocation would always be:
> 
> 	git merge [options] $commit1 $commit2...
> 
> after stripping the options, and these seemingly wrong die() will complain
> when you try to create an Octopus with the new-style syntax and the
> parameter given as the second remote parent is not a commit.  So the logic
> is wrong, the fact that the user gets the same error message for incorrect
> old-style invocation (perhaps "git merge <msg> HAED $commit") and
> incorrect new-style invocation "git merge $commit1 $nonsense" is just an
> accident, and the end result does not hurt, but asks for a "Huh? why does
> it check and complain only the second parent here but not the first one?".
> 
> It is interesting, but feels quite dirty.

Now if the second token is a valid SHA1 then I die() if it's not a
commit, but otherwise I just assume it's a new-style invocation.

> > +	if (!have_message && second_token &&
> > +		!hashcmp(second_token->object.sha1, head)) {
> 
> You need to know that resolve_ref() cleared head[] when head_invalid is
> true when reading this code to notice that, unlike the previous round of
> this patch, it is Ok not to check head_invalid is fine here.  I somehow
> feel it is an unnecessary optimization/obfuscation.
> 
> But once you have "is_old_style_invocation" suggested earlier, this part
> would look much cleaner and the above comment would become unnecessary.

Yes, now it's just:

       if (!have_message && is_old_style_invocation(argc, argv)) {

> 
> > +	for (i = 0; i < argc; i++) {
> > +		struct object *o;
> > +
> > +		o = peel_to_type(argv[i], 0, NULL, OBJ_COMMIT);
> > +		if (!o)
> > +			die("%s - not something we can merge", argv[i]);
> > +		remotes = &commit_list_insert(lookup_commit(o->sha1),
> > +			remotes)->next;
> > +
> > +		strbuf_addf(&buf, "GITHEAD_%s", sha1_to_hex(o->sha1));
> > +		setenv(buf.buf, argv[i], 1);
> > +		strbuf_reset(&buf);
> > +	}
> > +
> > +		o = peel_to_type(sha1_to_hex(remoteheads->item->object.sha1),
> > +			0, NULL, OBJ_COMMIT);
> > +		if (!o)
> > +			return 0;
> > +
> > +		if (checkout_fast_forward(head, remoteheads->item->object.sha1))
> > +			return 0;
> 
> When o does not peel well, or checkout_fast_forward() returns failure,
> that would be a failure case, wouldn't it?  Why return 0?
> 
> Maybe you misread "exit" in shell scripts?  It does not mean exit(0); it
> means "exit with the same exit status as the last command".  So
> 
> 	new_head=$(git rev-parse ...) &&
>         git read-tree -m -u ... &&
>         finish || exit
> 
> will exit non-zero if any of the commands chained by && fails.

Thanks, that was the case. I thought "false || exit" exits with status
code 0.

> > +	/*
> > +	 * At this point, we need a real merge.  No matter what strategy
> > +	 * we use, it would operate on the index, possibly affecting the
> > +	 * working tree, and when resolved cleanly, have the desired
> > +	 * tree in the index -- this means that the index must be in
> > +	 * sync with the head commit.  The strategies are responsible
> > +	 * to ensure this.
> > +	 */
> > +	if (use_strategies.nr != 1) {
> > +		/*
> > +		 * Stash away the local changes so that we can try more
> > +		 * than one.
> > +		 */
> > +		save_state();
> > +		single_strategy = 0;
> > +	} else {
> > +		unlink(git_path("MERGE_STASH"));
> > +		single_strategy = 1;
> 
> I think s/single_strategy/(use_strategies.nr == 1)/ in the remainder of the
> code would be taking advantage of working in C ;-)

I dropped single_strategy.

> 
> > +		if (ret) {
> > +			/*
> > +			 * The backend exits with 1 when conflicts are
> > +			 * left to be resolved, with 2 when it does not
> > +			 * handle the given merge at all.
> > +			 */
> > +			if (ret == 1) {
> 
> Probably from here til ...
> 
> > +				int cnt = 0;
> > ...
> > +				cnt += count_unmerged_entries();
> 
> ... here should be a separate "evaluate_result()" function.

Done.
Attachment:
pgpYMIKIGa7XK.pgp

Description: PGP signature