Hi Eric, On Mon, 16 Jul 2018, Eric Sunshine wrote: > On Tue, Jul 3, 2018 at 7:27 AM Johannes Schindelin via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: > > At this stage, `git range-diff` can determine corresponding commits > > of two related commit ranges. This makes use of the recently introduced > > implementation of the Hungarian algorithm. > > Did you want s/Hungarian/Jonker-Volgenant/ here? (Not worth a re-roll.) It is worth a new iteration, and I'd rather say "linear assignment" than either Hungarian or Jonker-Volgenant. Thanks for pointing this out. > > The core of this patch is a straight port of the ideas of tbdiff, the > > apparently dormant project at https://github.com/trast/tbdiff. > > [...] > > Signed-off-by: Johannes Schindelin <johannes.schindelin@xxxxxx> > > --- > > diff --git a/builtin/range-diff.c b/builtin/range-diff.c > > @@ -17,9 +18,49 @@ int cmd_range_diff(int argc, const char **argv, const char *prefix) > > + int res = 0; > > + struct strbuf range1 = STRBUF_INIT, range2 = STRBUF_INIT; > > > > - argc = parse_options(argc, argv, NULL, options, > > - builtin_range_diff_usage, 0); > > + argc = parse_options(argc, argv, NULL, options, builtin_range_diff_usage, > > + 0); > > This parse_options() change appears to be merely a re-wrapping of the > line between patches 2 and 3. True, and it is a bad change because it makes the line longer than 80 columns. Fixed. > > - return 0; > > + if (argc == 2) { > > + if (!strstr(argv[0], "..")) > > + warning(_("no .. in range: '%s'"), argv[0]); > > + strbuf_addstr(&range1, argv[0]); > > + > > + if (!strstr(argv[1], "..")) > > + warning(_("no .. in range: '%s'"), argv[1]); > > + strbuf_addstr(&range2, argv[1]); > > Should these die() (like the "..." case below) rather than warning()? > Warning and continuing doesn't seem like intended behavior. When I > test this with on git.git and omit the "..", git sits for a long, long > time consuming the CPU. I guess it's git-log'ing pretty much the > entire history. I had to go back to `git-tbdiff.py` to see how it handles this, and you are right: it should die(). Fixed. (Technically, it is conceivable that some user wants to compare two independent commit histories, e.g. when a repository was imported from a different SCM two times, independently. I guess when that happens, we can always implement a `range-diff --root <tip1> <tip2>` or some such.) > % GIT_TRACE=1 git range-diff v1 v2 > warning: no .. in range: 'v1' > warning: no .. in range: 'v2' > trace: git log --no-color -p --no-merges --reverse \ > --date-order --decorate=no --no-abbrev-commit v1 > ^C > % > > > + } else if (argc == 3) { > > + strbuf_addf(&range1, "%s..%s", argv[0], argv[1]); > > + strbuf_addf(&range2, "%s..%s", argv[0], argv[2]); > > + } else if (argc == 1) { > > + const char *b = strstr(argv[0], "..."), *a = argv[0]; > > + int a_len; > > + > > + if (!b) > > + die(_("single arg format requires a symmetric range")); > > diff --git a/range-diff.c b/range-diff.c > > @@ -0,0 +1,307 @@ > > +static int read_patches(const char *range, struct string_list *list) > > +{ > > + while (strbuf_getline(&line, in) != EOF) { > > + if (skip_prefix(line.buf, "commit ", &p)) { > > + [...] > > + in_header = 1; > > + continue; > > + } > > + if (starts_with(line.buf, "diff --git")) { > > + in_header = 0; > > + [...] > > + } else if (in_header) { > > + if (starts_with(line.buf, "Author: ")) { > > + [...] > > + } else if (starts_with(line.buf, " ")) { > > + [...] > > + } > > + continue; > > + } else if (starts_with(line.buf, "@@ ")) > > + strbuf_addstr(&buf, "@@"); > > + else if (line.buf[0] && !starts_with(line.buf, "index ")) > > + /* > > + * A completely blank (not ' \n', which is context) > > + * line is not valid in a diff. We skip it > > + * silently, because this neatly handles the blank > > + * separator line between commits in git-log > > + * output. > > + */ > > + strbuf_addbuf(&buf, &line); > > This comment had me confused for a bit since it doesn't seem to agree > with the 'then' part of the 'if', but rather applies more to the > 'else'. Had it been split into two parts (one for 'then' and one for > 'else'), it might have been easier to digest. That is, something like: > > else if (line.buf[0] && !starts_with(..., "index ")) > /* A line we wish to keep. */ > strbuf_addbuf(...); > else > /* > * A completely blank line between commits or > * or one in which we are otherwise not interested. > */ > continue; > > or something. Structuring it a bit differently might have helped, as well: > > else if (!line.buf[0]) > /* A completely blank line between commits. */ > continue; > else if (starts_with(..., "index ")) > /* A line in which we are not interested. */ > continue; > else > strbuf_addbuf(&buf, &line); I like this much better, too. > Not at all worth a re-roll. I'll have to send a new iteration anyway, after digging into ws.c, I think. Also: I had this idea that dimming the "old" diff would make a ton of sense, so I want to try that. > > + else > > + continue; > > + if (util) > > + string_list_append(list, buf.buf)->util = util; > > So, the parser is grabbing each commit and shoving all the > "interesting" information about the commit in a 'patch_util'. It grabs > the OID, author, the commit message (indented), the "diff --git", > "+++", "---" lines (but ignores "index" line), "@@" lines (but > ignoring the gunk after "@@"), and all context and patch lines. > > Looks good. Correct. > > + strbuf_release(&buf); > > + > > + if (finish_command(&cp)) > > + return -1; > > + > > + return 0; > > +} Thank you for your suggestions! Dscho