On Thu, May 3, 2018 at 1:42 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:

>> Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
>> cyan for "uninteresting" parts to deliver a consistent color scheme for
>> Git. Eventually he dreams of having 2 layers of indirection IIUC, with
>>     "uninteresting" -> cyan
>>     "repeated lines in blame" -> uninteresting
>>
>> Maybe we can fit the coloring of this tool in this scheme, too?
>
> Sure. So you mean I should use cyan for... what part of the colored
> output? ;-)
>

It is just an FYI heads-up, not an actionable bikeshed painting plan. ;)

>> Do we need the dynamic range of a floating point, or would a rather small
>> range suffice here? (Also see rename detection settings, that take
>> percents as integers)
>
> I guess you are right, and we do not need floats. It was just very, very
> convenient to do that instead of using integers because
>
> - I already had the Jonker-Volgenant implementation "lying around" from my
>   previous life as an image processing expert, using doubles (but it was
>   in Java, not in C, so I quickly converted it for branch-diff).
>
> - I was actually not paying attention whether divisions are a thing in the
>   algorithm. From a cursory glance, it would appear that we are never
>   dividing in hungarian.c, so theoretically integers should be fine.
>
> - using doubles neatly side-steps the overflow problem. If I use integers
>   instead, I always will have to worry what to do if, say, adding
>   `INT_MAX` to `INT_MAX`.
>
> I am particularly worried about that last thing: it could easily lead to
> incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX ==
> INT_MAX` for the purpose of avoiding overflows.
>
> If, however, I misunderstood and you are only concerned about using
> *double-precision* floating point numbers, and would suggest using `float`
> typed variables instead, that would be totally cool with me.

So by being worried about INT_MAX occurring, you are implying that we
have to worry about a large range of values, so maybe floating point is
the best choice here.

Looking through that algorithm, the costs seem to be integers that only
measure a number of lines, so I would not be too worried about running
into INT_MAX problems, except for the costs that are assigned INT_MAX
explicitly.

I was mostly asking whether floating point is the right tool for the job.

Stefan
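
To make the overflow concern concrete, here is a minimal sketch of what
integer costs with an explicit "infinite" sentinel might look like. The
names `COST_INFINITE` and `cost_add` are hypothetical and do not come
from hungarian.c; the sketch also assumes that all costs are
non-negative, which seems plausible if they only count lines.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical sentinel for "no sensible match"; not a name from hungarian.c. */
#define COST_INFINITE INT_MAX

/*
 * Add two non-negative costs, clamping the result at COST_INFINITE.
 * A plain `a + b` would be undefined behavior in C once the sum
 * exceeds INT_MAX, so the range check happens before the addition.
 */
static int cost_add(int a, int b)
{
	assert(a >= 0 && b >= 0);
	if (a > COST_INFINITE - b)
		return COST_INFINITE;
	return a + b;
}
```

With this, `cost_add(INT_MAX, INT_MAX)` yields `INT_MAX` rather than
overflowing. Note that clamping loses information: subtracting a cost
from a clamped sum no longer gives back the other operand, so whether
this is safe depends on how the algorithm uses such sums. That is the
very worry raised above, and the sketch does not settle it.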