Hi Stefan, On Thu, 3 May 2018, Stefan Beller wrote: > On Thu, May 3, 2018 at 8:30 AM, Johannes Schindelin > <johannes.schindelin@xxxxxx> wrote: > > This builtin does not do a whole lot so far, apart from showing a > > usage that is oddly similar to that of `git tbdiff`. And for a good > > reason: the next commits will turn `branch-diff` into a full-blown > > replacement for `tbdiff`. > > While I appreciate the 1:1 re-implementation, I'll comment as if this > was a newly invented tool, questioning design choices. They are probably > chosen pretty well, and fudge facotrs as below are at tweaked to a > reasonable factor, but I'll try to look with fresh eyes. Absolutely. While tbdiff got some testing over time, it has definitely not gotten as much exposure as branch-diff hopefully will. BTW I chose a different command name on purpose, so that we are free to change the design and not harm existing tbdiff users. > > At this point, we ignore tbdiff's color options, as they will all be > > implemented later and require some patches to the diff machinery. > > Speaking of colors, for origin/sb/blame-color Junio hinted at re-using > cyan for "uninteresting" parts to deliver a consistent color scheme for > Git. Eventually he dreams of having 2 layers of indirection IIUC, with > "uninteresting" -> cyan > "repeated lines in blame" -> uninteresting > > Maybe we can fit the coloring of this tool in this scheme, too? Sure. So you mean I should use cyan for... what part of the colored output? ;-) > > + double creation_weight = 0.6; > > I wondered if we use doubles in Gits code base at all, > and I found > > khash.h:59:static const double __ac_HASH_UPPER = 0.77; > pack-bitmap-write.c:248: static const double > REUSE_BITMAP_THRESHOLD = 0.2; > pack-bitmap.c:751: static const double REUSE_PERCENT = 0.9; > > all other occurrences of `git grep double` are mentioning it in other > contexts (e.g. "double linked list" or comments). > > When implementing diff heuristics in 433860f3d0b (diff: improve > positioning of add/delete blocks in diffs, 2016-09-05), Michael broke > it down to fixed integers instead of floating point. > > Do we need to dynamic of a floating point, or would a rather small range > suffice here? (Also see rename detection settings, that take percents as > integers) I guess you are right, and we do not need floats. It was just very, very convenient to do that instead of using integers because - I already had the Jonker-Volgenant implementation "lying around" from my previous life as an image processing expert, using doubles (but it was in Java, not in C, so I quickly converted it for branch-diff). - I was actually not paying attention whether divisions are a thing in the algorithm. From a cursory glance, it would appear that we are never dividing in hungarian.c, so theoretically integers should be fine. - using doubles neatly side-steps the overflow problem. If I use integers instead, I always will have to worry what to do if, say, adding `INT_MAX` to `INT_MAX`. I am particularly worried about that last thing: it could easily lead to incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX == INT_MAX` for the purpose of avoiding overflows. If, however, I misunderstood and you are only concerned about using *double-precision* floating point numbers, and would suggest using `float` typed variables instead, that would be totally cool with me. Ciao, Dscho