Re: [PATCH v2 01/18] Add a function to solve least-cost assignment problems

Jeff King <peff@xxxxxxxx> · Sat, 5 May 2018 14:24:27 -0400

On Fri, May 04, 2018 at 05:34:29PM +0200, Johannes Schindelin wrote:

> The Jonker-Volgenant algorithm was implemented to answer questions such
> as: given two different versions of a topic branch (or iterations of a
> patch series), what is the best pairing of commits/patches between the
> different versions?

I love git-tbdiff, so I'm excited to see it getting more exposure (and a
speedup). Thanks for working on this!

Two minor nits on this patch:

> +/*
> + * The parameter `cost` is the cost matrix: the cost to assign column j to row
> + * i is `cost[j + column_count * i].
> + */
> +int compute_assignment(int column_count, int row_count, double *cost,
> +		       int *column2row, int *row2column)
> +{
> +	double *v = xmalloc(sizeof(double) * column_count), *d;

Please use st_mult, xcalloc, or ALLOC_ARRAY here to avoid unchecked
multiplication in an allocation. This is probably hard to exploit in
practice (since you'd need sizeof(size_t)/8 columns, which presumably
requires allocating some heavier-weight struct per item). But it makes
auditing easier if we avoid the pattern altogether.

> +/*
> + * Compute an assignment of columns -> rows (and vice versa) such that every
> + * column is assigned to at most one row (and vice versa) minimizing the
> + * overall cost.
> + *
> + * The parameter `cost` is the cost matrix: the cost to assign column j to row
> + * i is `cost[j + column_count * i].
> + *
> + * The arrays column2row and row2column will be populated with the respective
> + * assignments (-1 for unassigned, which can happen only if column_count !=
> + * row_count).
> + */
> +int compute_assignment(int column_count, int row_count, double *cost,
> +		       int *column2row, int *row2column);

It looks like this always returns zero. Is there a ever a case where we
would return an error if this? If not, should it just be void?

-Peff