Re: [PATCH 02/18] Add a new builtin: branch-diff

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Thu, 3 May 2018 22:42:25 +0200 (DST)

Hi Stefan,

On Thu, 3 May 2018, Stefan Beller wrote:

> On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin
> <johannes.schindelin@xxxxxx> wrote:
> > This builtin does not do a whole lot so far, apart from showing a
> > usage that is oddly similar to that of `git tbdiff`. And for a good
> > reason: the next commits will turn `branch-diff` into a full-blown
> > replacement for `tbdiff`.
> 
> While I appreciate the 1:1 re-implementation, I'll comment as if this
> was a newly invented tool, questioning design choices. They are probably
> chosen pretty well, and fudge facotrs as below are at tweaked to a
> reasonable factor, but I'll try to look with fresh eyes.

Absolutely. While tbdiff got some testing over time, it has definitely not
gotten as much exposure as branch-diff hopefully will.

BTW I chose a different command name on purpose, so that we are free to
change the design and not harm existing tbdiff users.

> > At this point, we ignore tbdiff's color options, as they will all be
> > implemented later and require some patches to the diff machinery.
> 
> Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
> cyan for "uninteresting" parts to deliver a consistent color scheme for
> Git. Eventually he dreams of having 2 layers of indirection IIUC, with
>     "uninteresting" -> cyan
>     "repeated lines in blame" -> uninteresting
> 
> Maybe we can fit the coloring of this tool in this scheme, too?

Sure. So you mean I should use cyan for... what part of the colored
output? ;-)

> > +       double creation_weight = 0.6;
> 
> I wondered if we use doubles in Gits code base at all,
> and I found
> 
> khash.h:59:static const double __ac_HASH_UPPER = 0.77;
> pack-bitmap-write.c:248:        static const double
> REUSE_BITMAP_THRESHOLD = 0.2;
> pack-bitmap.c:751:      static const double REUSE_PERCENT = 0.9;
> 
> all other occurrences of `git grep double` are mentioning it in other
> contexts (e.g. "double linked list" or comments).
> 
> When implementing diff heuristics in 433860f3d0b (diff: improve
> positioning of add/delete blocks in diffs, 2016-09-05), Michael broke
> it down to fixed integers instead of floating point.
> 
> Do we need to dynamic of a floating point, or would a rather small range
> suffice here? (Also see rename detection settings, that take percents as
> integers)

I guess you are right, and we do not need floats. It was just very, very
convenient to do that instead of using integers because

- I already had the Jonker-Volgenant implementation "lying around" from my
  previous life as an image processing expert, using doubles (but it was
  in Java, not in C, so I quickly converted it for branch-diff).

- I was actually not paying attention whether divisions are a thing in the
  algorithm. From a cursory glance, it would appear that we are never
  dividing in hungarian.c, so theoretically integers should be fine.

- using doubles neatly side-steps the overflow problem. If I use integers
  instead, I always will have to worry what to do if, say, adding
  `INT_MAX` to `INT_MAX`.

I am particularly worried about that last thing: it could easily lead to
incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX ==
INT_MAX` for the purpose of avoiding overflows.

If, however, I misunderstood and you are only concerned about using
*double-precision* floating point numbers, and would suggest using `float`
typed variables instead, that would be totally cool with me.

Ciao,
Dscho