Re: [RFC PATCH 02/10] range-diff.c: don't use st_mult() for signed "int"

Jeff King <peff@xxxxxxxx> · Tue, 14 Dec 2021 09:42:15 -0500

On Fri, Dec 10, 2021 at 03:27:24PM +0100, Johannes Schindelin wrote:

> > I'm not sure if this is helpful or not, but this is the minimal fix I
> > came up with that runs the testcase I showed earlier. It's basically
> > just swapping out "int" for "ssize_t" for any variables we use to index
> > the arrays (though note a few are themselves held in arrays, and we have
> > to cross some function boundaries).
> >
> > I won't be surprised if it doesn't hit all cases, or if it even hits a
> > few it doesn't need to (e.g., should "phase" be dragged along with "i"
> > and "j" in the first hunk?). I mostly did guess-and-check on the
> > test-case, fixing whatever segfaulted and then running again until it
> > worked. I didn't even really read the code very carefully.
> >
> > I think you _did_ do more of that careful reading, and broke down the
> > refactorings into separate patches in your series. Which is good. So I
> > think what we'd want is to pick out those parts of your series that end
> > up switching the variable type. My goal in sharing this here is just to
> > show that the end result of the fix can (and IMHO should) be around this
> > same order of magnitude.
> 
> I am in favor of this patch. Will you have time to submit this with a
> commit message?

I'm not at all sure that it's sufficient. It avoids overflowing the
cost array accesses, and was tested on a square input of 2^15. But:

  - some of the other ints may need changing, too (e.g., column_count).
    Probably 2^31 commits is out of reach in practice (and probably
    other parts of Git run into problems there anyway). But some of
    those arguments may just be (a->nr * b->nr), whereas I was testing
    overflow at (a->nr + b->nr)^2.

  - I threw around ssize_t willy-nilly. Some of those could be size_t,
    and I think Ævar's patches go in the direction of splitting the two,
    which is good.

  - some light refactoring may be helpful to split those cases ahead of
    time.

So I was hoping Ævar would take this approach and run with it. I just
reviewed his follow-up series, though, and it looks like it is still
putting bounds-checks into the COST macro, which I think is not
sufficient.

-Peff