Re: [PATCH 4/8] rebase-helper --make-script: introduce a flag to recreate merges

Junio C Hamano <gitster@xxxxxxxxx> · Tue, 23 Jan 2018 12:03:30 -0800

Johannes Schindelin <johannes.schindelin@xxxxxx> writes:

> The sequencer just learned a new commands intended to recreate branch

s/a //;

> structure (similar in spirit to --preserve-merges, but with a
> substantially less-broken design).
> ...
> @@ -2785,6 +2787,335 @@ void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag)
>  	strbuf_release(&sob);
>  }
>  
> +struct labels_entry {
> +	struct hashmap_entry entry;
> +	char label[FLEX_ARRAY];
> +};
> +
> +static int labels_cmp(const void *fndata, const struct labels_entry *a,
> +		      const struct labels_entry *b, const void *key)
> +{
> +	return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
> +}

label_oid() accesses state->labels hash using strihash() as the hash
function, but the final comparison between the entries in the same
hash buckets are done with case sensitivity.  It is unclear to me if
that is what was intended, and why.

> +struct string_entry {
> +	struct oidmap_entry entry;
> +	char string[FLEX_ARRAY];
> +};
> +
> +struct label_state {
> +	struct oidmap commit2label;
> +	struct hashmap labels;
> +	struct strbuf buf;
> +};
> +
> +static const char *label_oid(struct object_id *oid, const char *label,
> +			     struct label_state *state)
> +{
> +	struct labels_entry *labels_entry;
> +	struct string_entry *string_entry;
> +	struct object_id dummy;
> +	size_t len;
> +	int i;
> +
> +	string_entry = oidmap_get(&state->commit2label, oid);
> +	if (string_entry)
> +		return string_entry->string;
> +
> +	/*
> +	 * For "uninteresting" commits, i.e. commits that are not to be
> +	 * rebased, and which can therefore not be labeled, we use a unique
> +	 * abbreviation of the commit name. This is slightly more complicated
> +	 * than calling find_unique_abbrev() because we also need to make
> +	 * sure that the abbreviation does not conflict with any other
> +	 * label.
> +	 *
> +	 * We disallow "interesting" commits to be labeled by a string that
> +	 * is a valid full-length hash, to ensure that we always can find an
> +	 * abbreviation for any uninteresting commit's names that does not
> +	 * clash with any other label.
> +	 */
> +	if (!label) {
> +		char *p;
> +
> +		strbuf_reset(&state->buf);
> +		strbuf_grow(&state->buf, GIT_SHA1_HEXSZ);
> +		label = p = state->buf.buf;
> +
> +		find_unique_abbrev_r(p, oid->hash, default_abbrev);
> +
> +		/*
> +		 * We may need to extend the abbreviated hash so that there is
> +		 * no conflicting label.
> +		 */
> +		if (hashmap_get_from_hash(&state->labels, strihash(p), p)) {
> +			size_t i = strlen(p) + 1;
> +
> +			oid_to_hex_r(p, oid);
> +			for (; i < GIT_SHA1_HEXSZ; i++) {
> +				char save = p[i];
> +				p[i] = '\0';
> +				if (!hashmap_get_from_hash(&state->labels,
> +							   strihash(p), p))
> +					break;
> +				p[i] = save;
> +			}
> +		}

If oid->hash required full 40-hex to disambiguate, then
find-unique-abbrev would give 40-hex and we'd want the same "-<num>"
suffix technique employed below to make it consistently unique.  I
wonder if organizing the function this way ...

	if (!label)
		label = oid-to-hex(oid);

	if (label already exists or full oid) {
		make it unambiguous;
	}

... allows the resulting code easier to understand and manage.

A related tangent.  Does an auto-label given to "uninteresting"
commit need to be visible to end users?  I doubted it and that is
why I said oid-to-hex in the above, but if it is given to end users,
use of find-unique-abbrev-r is perfectly fine.

> +static int make_script_with_merges(struct pretty_print_context *pp,
> +				   struct rev_info *revs, FILE *out,
> +				   unsigned flags)
> +{
> + ...
> +	int abbr = flags & TODO_LIST_ABBREVIATE_CMDS;
> +	const char *p = abbr ? "p" : "pick", *l = abbr ? "l" : "label",
> +		 *t = abbr ? "t" : "reset", *b = abbr ? "b" : "bud",
> +		 *m = abbr ? "m" : "merge";

It would be easier to understand if these short named variables are
reserved only for temporary use, not as constants.  It is not too
much to spell 

	fprintf(out, "%s onto\n", cmd_label);

than

	fprintf(out, "%s onto\n", l);

and would save readers from head-scratching, wondering where the
last assignment to variable "l" is.

> +
> +	oidmap_init(&commit2todo, 0);
> +	oidmap_init(&state.commit2label, 0);
> +	hashmap_init(&state.labels, (hashmap_cmp_fn) labels_cmp, NULL, 0);
> +	strbuf_init(&state.buf, 32);
> +
> +	if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
> +		struct object_id *oid = &revs->cmdline.rev[0].item->oid;
> +		FLEX_ALLOC_STR(entry, string, "onto");
> +		oidcpy(&entry->entry.oid, oid);
> +		oidmap_put(&state.commit2label, entry);
> +	}
> +
> +	/*
> +	 * First phase:
> +	 * - get onelines for all commits
> +	 * - gather all branch tips (i.e. 2nd or later parents of merges)
> +	 * - label all branch tips
> +	 */

When an early part of a branch is merged and then the remaining part
of the same branch is merged again, "branch tip" and "2nd or later
parents of merges" would become different concepts.  The 2nd parent
of an early merge is not among the branch tips.

For the purpose of the "recreate the topology" algorithm, I am
imagining that you would need not just the tips but all the 2nd and
subsequent parents of merges, and my quick skimming tells me that
the following code grabs them correctly.