Re: [PATCH 4/8] rebase-helper --make-script: introduce a flag to recreate merges

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Mon, 29 Jan 2018 23:37:58 +0100 (STD)

Hi Junio,

On Tue, 23 Jan 2018, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@xxxxxx> writes:
> 
> > structure (similar in spirit to --preserve-merges, but with a
> > substantially less-broken design).
> > ...
> > @@ -2785,6 +2787,335 @@ void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag)
> >  	strbuf_release(&sob);
> >  }
> >  
> > +struct labels_entry {
> > +	struct hashmap_entry entry;
> > +	char label[FLEX_ARRAY];
> > +};
> > +
> > +static int labels_cmp(const void *fndata, const struct labels_entry *a,
> > +		      const struct labels_entry *b, const void *key)
> > +{
> > +	return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
> > +}
> 
> label_oid() accesses state->labels hash using strihash() as the hash
> function, but the final comparison between the entries in the same
> hash buckets are done with case sensitivity.  It is unclear to me if
> that is what was intended, and why.

Heh... you were almost there with your analysis. strihash() is needed for
case-insensitive comparisons, and labels are... ref names.

So the idea (which I implemented only partially) was to make the labels
case-insensitive based on `ignore_case`.

I think the best way forward will be to copy
merge-recursive.c:path_hash().

> > +struct string_entry {
> > +	struct oidmap_entry entry;
> > +	char string[FLEX_ARRAY];
> > +};
> > +
> > +struct label_state {
> > +	struct oidmap commit2label;
> > +	struct hashmap labels;
> > +	struct strbuf buf;
> > +};
> > +
> > +static const char *label_oid(struct object_id *oid, const char *label,
> > +			     struct label_state *state)
> > +{
> > +	struct labels_entry *labels_entry;
> > +	struct string_entry *string_entry;
> > +	struct object_id dummy;
> > +	size_t len;
> > +	int i;
> > +
> > +	string_entry = oidmap_get(&state->commit2label, oid);
> > +	if (string_entry)
> > +		return string_entry->string;
> > +
> > +	/*
> > +	 * For "uninteresting" commits, i.e. commits that are not to be
> > +	 * rebased, and which can therefore not be labeled, we use a unique
> > +	 * abbreviation of the commit name. This is slightly more complicated
> > +	 * than calling find_unique_abbrev() because we also need to make
> > +	 * sure that the abbreviation does not conflict with any other
> > +	 * label.
> > +	 *
> > +	 * We disallow "interesting" commits to be labeled by a string that
> > +	 * is a valid full-length hash, to ensure that we always can find an
> > +	 * abbreviation for any uninteresting commit's names that does not
> > +	 * clash with any other label.
> > +	 */
> > +	if (!label) {
> > +		char *p;
> > +
> > +		strbuf_reset(&state->buf);
> > +		strbuf_grow(&state->buf, GIT_SHA1_HEXSZ);
> > +		label = p = state->buf.buf;
> > +
> > +		find_unique_abbrev_r(p, oid->hash, default_abbrev);
> > +
> > +		/*
> > +		 * We may need to extend the abbreviated hash so that there is
> > +		 * no conflicting label.
> > +		 */
> > +		if (hashmap_get_from_hash(&state->labels, strihash(p), p)) {
> > +			size_t i = strlen(p) + 1;
> > +
> > +			oid_to_hex_r(p, oid);
> > +			for (; i < GIT_SHA1_HEXSZ; i++) {
> > +				char save = p[i];
> > +				p[i] = '\0';
> > +				if (!hashmap_get_from_hash(&state->labels,
> > +							   strihash(p), p))
> > +					break;
> > +				p[i] = save;
> > +			}
> > +		}
> 
> If oid->hash required full 40-hex to disambiguate, then
> find-unique-abbrev would give 40-hex and we'd want the same "-<num>"
> suffix technique employed below to make it consistently unique.  I
> wonder if organizing the function this way ...
> 
> 	if (!label)
> 		label = oid-to-hex(oid);
> 
> 	if (label already exists or full oid) {
> 		make it unambiguous;
> 	}

It was hard to miss, I agree. The first arm of the if() is not trying to
make a label unambiguous by adding `-<num>`, but by extending the
abbreviated 40-hex until it is unique. (And we guarantee that there is a
solution, as we do not allow valid 40-hex strings to be used as label.)

The reason: if no `label` is provided, we want a valid raw (possibly
abbreviated) commit name. If `label` is provided, we want a valid ref name
(possibly made unique by appending `-<num>`).

Different beasts. Cannot be put into the same if() arm.

> A related tangent.  Does an auto-label given to "uninteresting"
> commit need to be visible to end users?

Yes. The reason for this is the `merge`/`reset` commands when *not*
rebasing cousins. Because in that case, we have to refer to commits that
we could not possibly have `label`ed, because they were never the current
revision. Therefore we cannot assign <name>-<num> labels, but have to use
the abbreviated commit hash.

> > +static int make_script_with_merges(struct pretty_print_context *pp,
> > +				   struct rev_info *revs, FILE *out,
> > +				   unsigned flags)
> > +{
> > + ...
> > +	int abbr = flags & TODO_LIST_ABBREVIATE_CMDS;
> > +	const char *p = abbr ? "p" : "pick", *l = abbr ? "l" : "label",
> > +		 *t = abbr ? "t" : "reset", *b = abbr ? "b" : "bud",
> > +		 *m = abbr ? "m" : "merge";
> 
> It would be easier to understand if these short named variables are
> reserved only for temporary use, not as constants.  It is not too
> much to spell 
> 
> 	fprintf(out, "%s onto\n", cmd_label);
> 
> than
> 
> 	fprintf(out, "%s onto\n", l);
> 
> and would save readers from head-scratching, wondering where the
> last assignment to variable "l" is.

Sure.

> > +	oidmap_init(&commit2todo, 0);
> > +	oidmap_init(&state.commit2label, 0);
> > +	hashmap_init(&state.labels, (hashmap_cmp_fn) labels_cmp, NULL, 0);
> > +	strbuf_init(&state.buf, 32);
> > +
> > +	if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) {
> > +		struct object_id *oid = &revs->cmdline.rev[0].item->oid;
> > +		FLEX_ALLOC_STR(entry, string, "onto");
> > +		oidcpy(&entry->entry.oid, oid);
> > +		oidmap_put(&state.commit2label, entry);
> > +	}
> > +
> > +	/*
> > +	 * First phase:
> > +	 * - get onelines for all commits
> > +	 * - gather all branch tips (i.e. 2nd or later parents of merges)
> > +	 * - label all branch tips
> > +	 */
> 
> When an early part of a branch is merged and then the remaining part
> of the same branch is merged again, "branch tip" and "2nd or later
> parents of merges" would become different concepts.  The 2nd parent
> of an early merge is not among the branch tips.
> 
> For the purpose of the "recreate the topology" algorithm, I am
> imagining that you would need not just the tips but all the 2nd and
> subsequent parents of merges, and my quick skimming tells me that
> the following code grabs them correctly.

Yes. I have used an earlier iteration of this patch series to create the
merging-rebase of Git for Windows v2.16.0 and that has a couple of really
good corner cases to exercise this code.

Ciao,
Dscho