I was recently reading the source code of git range-diff. Its steps are:

1. Use git log <range> to generate two collections of patches for ranges a and b.
2. Use a hash table over both sets to find commits whose patches are identical; these are recorded as "exact matches".
3. Build a cost matrix whose width and height are both a.nr + b.nr. The cost of an exactly matched pair is 0, and the cost between two patches that are not exact matches is the diffsize(a, b) of that pair.
4. Use the LAPJV algorithm to find the best matching between the remaining patches of the two sets.
5. Output the results.

My questions are:

1. In step 3, why is the matrix size (a.nr + b.nr) * (a.nr + b.nr) instead of a.nr * b.nr?
2. Why is cost(x, y) in the padding area, i.e. where x ∈ [a.nr, a.nr + b.nr) and y ∈ [0, b.nr), or x ∈ [0, a.nr) and y ∈ [b.nr, b.nr + a.nr), set to "diffsize(a or b) * creation_factor / 100 : COST_MAX"? What is the role of creation_factor? [1]
3. How does LAPJV work here? [2]

[1]: https://github.com/git/git/blob/725f57037d81e24eacfda6e59a19c60c0b4c8062/range-diff.c#L310
[2]: https://github.com/git/git/blob/725f57037d81e24eacfda6e59a19c60c0b4c8062/linear-assignment.c#L15

Thanks.
--
ZheNing Hu
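
P.S. To make questions 1 and 2 concrete, below is a small standalone C
sketch of how I currently read the cost matrix construction. It is only
a simplification of get_correspondences() in range-diff.c: the diffsize
values and patch sizes are made up, the exact-match / COST_MAX handling
is reduced to a comment, and no assignment is actually computed -- it
just shows the (a.nr + b.nr) x (a.nr + b.nr) layout and the
creation_factor scaling as I understand them. Please correct me if this
reading is wrong.

#include <stdio.h>

int main(void)
{
	/* Toy example: 2 commits in range a, 3 commits in range b. */
	int a_nr = 2, b_nr = 3, n = a_nr + b_nr;

	/* Made-up diffsize(a[i], b[j]) values between real patches. */
	int diffsize[2][3] = {
		{  10, 400, 350 },
		{ 500,  20, 300 },
	};
	/* Made-up per-patch diff sizes, used for the padding area. */
	int a_size[2] = { 100, 120 };
	int b_size[3] = { 110, 130, 90 };

	int creation_factor = 60; /* range-diff's default */
	int cost[5][5];           /* (a_nr + b_nr) x (a_nr + b_nr) */
	int i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++) {
			if (i < a_nr && j < b_nr)
				/*
				 * Real patch vs. real patch; in git this is
				 * 0 for an exact match and COST_MAX when one
				 * side is already exactly matched elsewhere.
				 */
				cost[i][j] = diffsize[i][j];
			else if (i < a_nr && j >= b_nr)
				/*
				 * a[i] paired with a dummy column: the cost
				 * of leaving a[i] unmatched ("deleted").
				 */
				cost[i][j] = a_size[i] * creation_factor / 100;
			else if (i >= a_nr && j < b_nr)
				/*
				 * b[j] paired with a dummy row: the cost of
				 * leaving b[j] unmatched ("created").
				 */
				cost[i][j] = b_size[j] * creation_factor / 100;
			else
				/* dummy vs. dummy: free */
				cost[i][j] = 0;
		}

	/*
	 * In git, compute_assignment() (the LAPJV solver from
	 * linear-assignment.c) would now pick the minimum-cost assignment
	 * over this square matrix. Here we just print it.
	 */
	for (i = 0; i < n; i++) {
		for (j = 0; j < n; j++)
			printf("%6d", cost[i][j]);
		printf("\n");
	}
	return 0;
}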