Question: How range-diff lapjv algorithm work

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I was recently reading the source code of git-range-diff.

Its Steps:
1. Use git log <range> to generate two collections of patches for
ranges a and b.
2. Use the hash table on both sets to find the commit with the exact
same patch as an "exact match".
3. A cost matrix whose length and width are a.nr + b.br is generated.
The cost of a completely matched item is 0, and the cost between two
sets of patches is represented by the diffsize(a, b) of the patch.
4. Use the LAPJV algorithm to find the best match between multiple
patches of the two sets.
5. Output results.

My question is:
1. In step 3, why is the matrix size (a.nr + b.br) * (a.nr + b.br)
instead of a.nr * b.nr?

2. Why the cost(x,y) which satisfies "x ∈ [a.nr, a.nr + b.nr) y ∈ [0,
b.nr) || x ∈ [0, a.nr) y ∈ [b.nr, b. The cost of nr + a.nr)" is set to
"diffsize(a or b) * creation_factor / 100 : COST_MAX"? What is the
role of creation_factor? [1]

3. How does LAPJV work here? [2]

[1]: https://github.com/git/git/blob/725f57037d81e24eacfda6e59a19c60c0b4c8062/range-diff.c#L310
[2]: https://github.com/git/git/blob/725f57037d81e24eacfda6e59a19c60c0b4c8062/linear-assignment.c#L15

Thanks.
--
ZheNing Hu




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux