Re: [RFC PATCH 00/10] range-diff: fix segfault due to integer overflow

On Wed, Dec 22, 2021 at 09:50:50PM +0100, Johannes Schindelin wrote:

> To raise the limits you would have to understand the purpose of the
> calculations so that you can understand the range their data type needs to
> handle. The weights of the Hungarian Algorithm are distinctly _not_
> pointers, therefore using `size_t` is most likely the wrong thing to do.
> 
> Of course you can glance over the details and try to avoid digging into
> the algorithm to understand what it does before changing the data types
> and introducing `st_mult()`-like functions and macros, but that only makes
> it "relatively easy", at the price of "maybe incorrect". That would be in
> line with what I unfortunately have had to come to expect of your patches.

:( Whether you're frustrated with this topic or not, can we please stick
to technical critiques? Either the proposed changes are the right thing
or they aren't, and we can talk about that.

I know it can get hard if the review is "gee, it looks like this is
going in the wrong direction, but I don't have time to dig in and tell
you _exactly_ how it is wrong right now". And I think that is an OK
thing to express, but your response here has a bit more bite to it than
I think is strictly necessary.

> The _actual_ "relatively easy" way is to imitate the limits we use in
> xdiff (for similar reasons). As I said before.

I had a similar thought at first, too. But because one of the array
sizes we compute is (nr_a + nr_b)^2, I fear the limits end up pretty low
(e.g., my 32767 by 32767 example). That's a pretty big range diff, but
it doesn't seem like an outrageous input to throw at the system
(especially if most of the entries end up finding a match and showing
one line of equality).

It may be that there are multiple limits to consider, though. E.g., the
square of the sum of the sides, as above, is one. But there may be some
benefit to making that case work (by using size_t and st_add) while
putting a hard limit like 2^30 on the number of commits on one side
(e.g., for ranking scores). And that's where understanding exactly what
the algorithm is doing becomes necessary.

-Peff
