Hi Junio, On Tue, 23 Jan 2018, Junio C Hamano wrote: > Johannes Schindelin <johannes.schindelin@xxxxxx> writes: > > > structure (similar in spirit to --preserve-merges, but with a > > substantially less-broken design). > > ... > > @@ -2785,6 +2787,335 @@ void append_signoff(struct strbuf *msgbuf, int ignore_footer, unsigned flag) > > strbuf_release(&sob); > > } > > > > +struct labels_entry { > > + struct hashmap_entry entry; > > + char label[FLEX_ARRAY]; > > +}; > > + > > +static int labels_cmp(const void *fndata, const struct labels_entry *a, > > + const struct labels_entry *b, const void *key) > > +{ > > + return key ? strcmp(a->label, key) : strcmp(a->label, b->label); > > +} > > label_oid() accesses state->labels hash using strihash() as the hash > function, but the final comparison between the entries in the same > hash buckets are done with case sensitivity. It is unclear to me if > that is what was intended, and why. Heh... you were almost there with your analysis. strihash() is needed for case-insensitive comparisons, and labels are... ref names. So the idea (which I implemented only partially) was to make the labels case-insensitive based on `ignore_case`. I think the best way forward will be to copy merge-recursive.c:path_hash(). > > +struct string_entry { > > + struct oidmap_entry entry; > > + char string[FLEX_ARRAY]; > > +}; > > + > > +struct label_state { > > + struct oidmap commit2label; > > + struct hashmap labels; > > + struct strbuf buf; > > +}; > > + > > +static const char *label_oid(struct object_id *oid, const char *label, > > + struct label_state *state) > > +{ > > + struct labels_entry *labels_entry; > > + struct string_entry *string_entry; > > + struct object_id dummy; > > + size_t len; > > + int i; > > + > > + string_entry = oidmap_get(&state->commit2label, oid); > > + if (string_entry) > > + return string_entry->string; > > + > > + /* > > + * For "uninteresting" commits, i.e. commits that are not to be > > + * rebased, and which can therefore not be labeled, we use a unique > > + * abbreviation of the commit name. This is slightly more complicated > > + * than calling find_unique_abbrev() because we also need to make > > + * sure that the abbreviation does not conflict with any other > > + * label. > > + * > > + * We disallow "interesting" commits to be labeled by a string that > > + * is a valid full-length hash, to ensure that we always can find an > > + * abbreviation for any uninteresting commit's names that does not > > + * clash with any other label. > > + */ > > + if (!label) { > > + char *p; > > + > > + strbuf_reset(&state->buf); > > + strbuf_grow(&state->buf, GIT_SHA1_HEXSZ); > > + label = p = state->buf.buf; > > + > > + find_unique_abbrev_r(p, oid->hash, default_abbrev); > > + > > + /* > > + * We may need to extend the abbreviated hash so that there is > > + * no conflicting label. > > + */ > > + if (hashmap_get_from_hash(&state->labels, strihash(p), p)) { > > + size_t i = strlen(p) + 1; > > + > > + oid_to_hex_r(p, oid); > > + for (; i < GIT_SHA1_HEXSZ; i++) { > > + char save = p[i]; > > + p[i] = '\0'; > > + if (!hashmap_get_from_hash(&state->labels, > > + strihash(p), p)) > > + break; > > + p[i] = save; > > + } > > + } > > If oid->hash required full 40-hex to disambiguate, then > find-unique-abbrev would give 40-hex and we'd want the same "-<num>" > suffix technique employed below to make it consistently unique. I > wonder if organizing the function this way ... > > if (!label) > label = oid-to-hex(oid); > > if (label already exists or full oid) { > make it unambiguous; > } It was hard to miss, I agree. The first arm of the if() is not trying to make a label unambiguous by adding `-<num>`, but by extending the abbreviated 40-hex until it is unique. (And we guarantee that there is a solution, as we do not allow valid 40-hex strings to be used as label.) The reason: if no `label` is provided, we want a valid raw (possibly abbreviated) commit name. If `label` is provided, we want a valid ref name (possibly made unique by appending `-<num>`). Different beasts. Cannot be put into the same if() arm. > A related tangent. Does an auto-label given to "uninteresting" > commit need to be visible to end users? Yes. The reason for this is the `merge`/`reset` commands when *not* rebasing cousins. Because in that case, we have to refer to commits that we could not possibly have `label`ed, because they were never the current revision. Therefore we cannot assign <name>-<num> labels, but have to use the abbreviated commit hash. > > +static int make_script_with_merges(struct pretty_print_context *pp, > > + struct rev_info *revs, FILE *out, > > + unsigned flags) > > +{ > > + ... > > + int abbr = flags & TODO_LIST_ABBREVIATE_CMDS; > > + const char *p = abbr ? "p" : "pick", *l = abbr ? "l" : "label", > > + *t = abbr ? "t" : "reset", *b = abbr ? "b" : "bud", > > + *m = abbr ? "m" : "merge"; > > It would be easier to understand if these short named variables are > reserved only for temporary use, not as constants. It is not too > much to spell > > fprintf(out, "%s onto\n", cmd_label); > > than > > fprintf(out, "%s onto\n", l); > > and would save readers from head-scratching, wondering where the > last assignment to variable "l" is. Sure. > > + oidmap_init(&commit2todo, 0); > > + oidmap_init(&state.commit2label, 0); > > + hashmap_init(&state.labels, (hashmap_cmp_fn) labels_cmp, NULL, 0); > > + strbuf_init(&state.buf, 32); > > + > > + if (revs->cmdline.nr && (revs->cmdline.rev[0].flags & BOTTOM)) { > > + struct object_id *oid = &revs->cmdline.rev[0].item->oid; > > + FLEX_ALLOC_STR(entry, string, "onto"); > > + oidcpy(&entry->entry.oid, oid); > > + oidmap_put(&state.commit2label, entry); > > + } > > + > > + /* > > + * First phase: > > + * - get onelines for all commits > > + * - gather all branch tips (i.e. 2nd or later parents of merges) > > + * - label all branch tips > > + */ > > When an early part of a branch is merged and then the remaining part > of the same branch is merged again, "branch tip" and "2nd or later > parents of merges" would become different concepts. The 2nd parent > of an early merge is not among the branch tips. > > For the purpose of the "recreate the topology" algorithm, I am > imagining that you would need not just the tips but all the 2nd and > subsequent parents of merges, and my quick skimming tells me that > the following code grabs them correctly. Yes. I have used an earlier iteration of this patch series to create the merging-rebase of Git for Windows v2.16.0 and that has a couple of really good corner cases to exercise this code. Ciao, Dscho