Re: [PATCH 02/18] Add a new builtin: branch-diff

Stefan Beller <sbeller@xxxxxxxxxx> · Thu, 3 May 2018 14:12:55 -0700

On Thu, May 3, 2018 at 1:42 PM, Johannes Schindelin
<Johannes.Schindelin@xxxxxx> wrote:

>> Speaking of colors, for origin/sb/blame-color Junio hinted at re-using
>> cyan for "uninteresting" parts to deliver a consistent color scheme for
>> Git. Eventually he dreams of having 2 layers of indirection IIUC, with
>>     "uninteresting" -> cyan
>>     "repeated lines in blame" -> uninteresting
>>
>> Maybe we can fit the coloring of this tool in this scheme, too?
>
> Sure. So you mean I should use cyan for... what part of the colored
> output? ;-)
>

It is just a FYI heads up, not an actionable bikeshed painting plan. ;)

>> Do we need to dynamic of a floating point, or would a rather small range
>> suffice here? (Also see rename detection settings, that take percents as
>> integers)
>
> I guess you are right, and we do not need floats. It was just very, very
> convenient to do that instead of using integers because
>
> - I already had the Jonker-Volgenant implementation "lying around" from my
>   previous life as an image processing expert, using doubles (but it was
>   in Java, not in C, so I quickly converted it for branch-diff).
>
> - I was actually not paying attention whether divisions are a thing in the
>   algorithm. From a cursory glance, it would appear that we are never
>   dividing in hungarian.c, so theoretically integers should be fine.
>
> - using doubles neatly side-steps the overflow problem. If I use integers
>   instead, I always will have to worry what to do if, say, adding
>   `INT_MAX` to `INT_MAX`.
>
> I am particularly worried about that last thing: it could easily lead to
> incorrect results if we blindly, say, pretend that `INT_MAX + INT_MAX ==
> INT_MAX` for the purpose of avoiding overflows.
>
> If, however, I misunderstood and you are only concerned about using
> *double-precision* floating point numbers, and would suggest using `float`
> typed variables instead, that would be totally cool with me.

So by being worried about INT_MAX occurring, you are implying that
we have to worry about a large range of values, so maybe floating points
are the best choice here.

Looking through that algorithm the costs seem to be integers only
measuring number of lines, so I would not be too worried about running
into INT_MAX problems except for the costs that are assigned INT_MAX
explicitly.

I was more asking, if floating point is the right tool for the job.

Stefan