Re: [PATCH v2 07/34] sequencer (rebase -i): add support for the 'fixup' and 'squash' commands

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Mon, 19 Dec 2016 17:58:10 +0100 (CET)

Hi Junio,

On Thu, 15 Dec 2016, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@xxxxxx> writes:
> 
> > diff --git a/sequencer.c b/sequencer.c
> > index f6e20b142a..271c21581d 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -45,6 +45,35 @@ static GIT_PATH_FUNC(rebase_path_todo, "rebase-merge/git-rebase-todo")
> >   */
> >  static GIT_PATH_FUNC(rebase_path_done, "rebase-merge/done")
> >  /*
> > + * The commit message that is planned to be used for any changes that
> > + * need to be committed following a user interaction.
> > + */
> > +static GIT_PATH_FUNC(rebase_path_message, "rebase-merge/message")
> > +/*
> > + * The file into which is accumulated the suggested commit message for
> > + * squash/fixup commands. When the first of a series of squash/fixups
> 
> The same comment as 03/34 applies here, regarding blank line to
> separate logical unit.

Same rationale here: the path functions are one big continuous block, with
comments obviously applying to the immediately following line only.

> > +static int update_squash_messages(enum todo_command command,
> > +		struct commit *commit, struct replay_opts *opts)
> > +{
> > +	struct strbuf buf = STRBUF_INIT;
> > +	int count, res;
> > +	const char *message, *body;
> > +
> > +	if (file_exists(rebase_path_squash_msg())) {
> > +		char *p, *p2;
> > +
> > +		if (strbuf_read_file(&buf, rebase_path_squash_msg(), 2048) <= 0)
> > +			return error(_("could not read '%s'"),
> > +				rebase_path_squash_msg());
> > +
> > +		if (buf.buf[0] != comment_line_char ||
> > +		    !skip_prefix(buf.buf + 1, " This is a combination of ",
> > +				 (const char **)&p))
> > +			return error(_("unexpected 1st line of squash message:"
> > +				       "\n\n\t%.*s"),
> > +				     (int)(strchrnul(buf.buf, '\n') - buf.buf),
> > +				     buf.buf);
> > +		count = strtol(p, &p2, 10);
> > +
> > +		if (count < 1 || *p2 != ' ')
> > +			return error(_("invalid 1st line of squash message:\n"
> > +				       "\n\t%.*s"),
> > +				     (int)(strchrnul(buf.buf, '\n') - buf.buf),
> > +				     buf.buf);
> > +
> > +		sprintf((char *)p, "%d", ++count);
> 
> Do we know the area pointed at p (which is inside buf) long enough
> not to overflow?

Yes, we know that: p2 points to the byte after the parsed number, and said
byte is a space (ASCII 0x20), as verified by the if() above.

> If the original were 9 and you incremented to get 10, you would need one
> extra byte.

Exactly. That extra byte (if needed) is 0x20, as verified above, and can
be overwritten.

If that extra byte (to which p2 points) is *not* overwritten, i.e. if the
new count requires the same amount of space in decimal representation as
the previous count, it is now NUL, as tested here:

> > +		if (!*p2)
> > +			*p2 = ' ';
> > +		else {
> > +			*(++p2) = 'c';
> 
> p2 points into buf; do we know this increment does not step beyond
> its end?  What is the meaning of a letter 'c' here (I do not see a
> corresponding one in the scripted update_squash_messages)?

This clause is entered only when the space needed by the previous count
was not sufficient to hold the new count, at which point we know that not
only the space after the old count was overwritten, but also the 'c' of
the "commits" string.

Therefore, this clause reinstates the 'c' and inserts the space, so that
everything is groovy again:

> > +			strbuf_insert(&buf, p2 - buf.buf, " ", 1);
> > +		}

Having said that, I just realized that this code was only safe as long as
the squash messages were not localized.

I changed the code to imitate more closely what the shell script does. It
made the code a little more verbose, but it should work better as a
consequence, and I am pretty certain you will find it easier to verify
that it is correct.

> > +	}
> > +	else {
> 
> Style: "} else {" (I won't repeat this, as it will become too noisy).

It is not necessary to repeat this, either, as I took the first such
comment as a strong hint to look at the entire patch series and fix it
appropriately.

> 
> > +		unsigned char head[20];
> > +		struct commit *head_commit;
> > +		const char *head_message, *body;
> > +
> > +		if (get_sha1("HEAD", head))
> > +			return error(_("need a HEAD to fixup"));
> > +		if (!(head_commit = lookup_commit_reference(head)))
> > +			return error(_("could not read HEAD"));
> > +		if (!(head_message = get_commit_buffer(head_commit, NULL)))
> > +			return error(_("could not read HEAD's commit message"));
> > +
> > +		body = strstr(head_message, "\n\n");
> > +		if (!body)
> > +			body = "";
> > +		else
> > +			body = skip_blank_lines(body + 2);
> 
> I think I saw you used a helper function find_commit_subject() to do
> the above in an earlier patch for "edit" in this series.  Would it
> make this part (and another one for "commit" we have after this
> if/else) shorter?

Right, and by using the helper function, I also fixed a bug handling funny
commit objects that had more than one empty line separating header from
the message.

Of course, there is a third location in sequencer.c (predating my patches)
that uses the very same idiom. Yet another patch added to this growing
patch series...

> >  static int do_pick_commit(enum todo_command command, struct commit *commit,
> > -		struct replay_opts *opts)
> > +		struct replay_opts *opts, int final_fixup)
> >  {
> > +	int edit = opts->edit, cleanup_commit_message = 0;
> > +	const char *msg_file = edit ? NULL : git_path_merge_msg();
> >  	unsigned char head[20];
> >  	struct commit *base, *next, *parent;
> >  	const char *base_label, *next_label;
> >  	struct commit_message msg = { NULL, NULL, NULL, NULL };
> >  	struct strbuf msgbuf = STRBUF_INIT;
> > -	int res, unborn = 0, allow;
> > +	int res, unborn = 0, amend = 0, allow;
> >  
> >  	if (opts->no_commit) {
> >  		/*
> > @@ -749,7 +885,7 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
> >  	else
> >  		parent = commit->parents->item;
> >  
> > -	if (opts->allow_ff &&
> > +	if (opts->allow_ff && !is_fixup(command) &&
> >  	    ((parent && !hashcmp(parent->object.oid.hash, head)) ||
> >  	     (!parent && unborn)))
> >  		return fast_forward_to(commit->object.oid.hash, head, unborn, opts);
> > @@ -813,6 +949,28 @@ static int do_pick_commit(enum todo_command command, struct commit *commit,
> >  		}
> >  	}
> >  
> > +	if (is_fixup(command)) {
> > +		if (update_squash_messages(command, commit, opts))
> > +			return -1;
> > +		amend = 1;
> > +		if (!final_fixup)
> > +			msg_file = rebase_path_squash_msg();
> > +		else if (file_exists(rebase_path_fixup_msg())) {
> > +			cleanup_commit_message = 1;
> > +			msg_file = rebase_path_fixup_msg();
> > +		}
> > +		else {
> > +			const char *dest = git_path("SQUASH_MSG");
> > +			unlink(dest);
> > +			if (copy_file(dest, rebase_path_squash_msg(), 0666))
> > +				return error(_("could not rename '%s' to '%s'"),
> > +					     rebase_path_squash_msg(), dest);
> 
> Perhaps an error from unlink(dest) before copy_file() should also
> result in an error return?

No, because the most likely cause for that `unlink()` to fail is that the
destination does not exist, and it is fine if it does not exist yet.

No worries, copy_file() will fail if we could not remove any existing file
and we still error out in that case.

> > +			unlink(git_path("MERGE_MSG"));
> 
> Errors from this and other unlink() that emulates "rm -f" were
> unchecked in the scripted original, so not checking for errors is
> not a regression.  I would check for an error if I were writing
> this, however, because I know I would forget updating these after I
> am done with the series.

The problem is that these files do not necessarily exist. We only unlink()
them to make sure that they do not exist afterwards.

In any case, I am still more interested in a faithful translation than
already starting to improve the code at this stage.

> > @@ -1475,6 +1660,21 @@ static int do_exec(const char *command_line)
> >  	return status;
> >  }
> >  
> > +static int is_final_fixup(struct todo_list *todo_list)
> > +{
> > +	int i = todo_list->current;
> > +
> > +	if (!is_fixup(todo_list->items[i].command))
> > +		return 0;
> > +
> > +	while (++i < todo_list->nr)
> > +		if (is_fixup(todo_list->items[i].command))
> > +			return 0;
> > +		else if (todo_list->items[i].command < TODO_NOOP)
> > +			break;
> 
> What follows NOOP are comment and "drop" which is another comment in
> disguise, so this one is excluding all the no-op commands in various
> shapes, which makes sense but is clear only to a reader who bothered
> to go back to "enum todo_command" and checked that fact.  If a check
> for "is it one of the no-op commands?" appears only here, a single
> liner comment may be sufficient (but necessary) to help readers.
> Otherwise a single-liner helper function (similar to is_fixup() you
> have) with a descriptive name would be better than a single liner
> comment.

True. I introduced is_noop().

Ciao,
Dscho