On Tue, Jun 6, 2017 at 2:50 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote: > On Mon, Jun 5, 2017 at 8:23 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote: >> >> > [...] >> > "git diff" has been taught to optionally paint new lines that are >> > the same as deleted lines elsewhere differently from genuinely new >> > lines. >> > >> > Are we happy with these changes? > > > I've been studiously ignoring this patch series due to lack of bandwidth. > >> [...] >> Things to come, but not in this series as they are more advanced: >> >> Discuss if a block/line needs a minimum requirement. >> >> When doing reviews with this series, a couple of lines such >> as "\t\t}" were marked as a moved, which is not wrong as they >> really occurred in the text with opposing sign. >> But it was annoying as it drew my attention to just closing >> braces, which IMO is not the point of code review. >> >> To solve this issue I had the idea of a "minimum requirement", e.g. >> * at least 3 consecutive lines or >> * at least one line with at least 3 non-ws characters or >> * compute the entropy of a given moved block and if it is too low, do >> not mark it up. > > Shooting from the hip here... > > It seems obvious that for a line to be marked as moved, a minimum > requirement is that > > 1. The line appears as both "+" and "-". > > That doesn't seem strong enough evidence though, and if that is the > only criterion, I would expect a lot of boilerplate lines like "\t\t}" > to be marked as moved. It seems like a lot of noise could be > eliminated by *also* requiring that > > 2a. The line doesn't appear elsewhere in the file(s) concerned. > > Rule (2a) would probably get rid of most boilerplate lines without > having to try to measure entropy. > > Maybe you are already using both criteria? I didn't see it in a quick > perusal of the code. > > OTOH, it would be silly to refuse to mark lines like "\t\t}" as moved > *only* because they appear elsewhere in the file(s). If you did so, > you would have gaps of supposedly non-moved lines in the middle of > moved blocks. This suggests marking as moved lines matching (1) and > (2a) but also lines matching (1) and the following: > > 2b. The line is adjacent to to another line that is thought to have > moved from the same old location to the same new location. > > Rule (2b) would be applied recursively, with the net effect being that > any line satisfying (1) and (2a) is allowed to carry along any > neighboring lines within the same "+"/"-" block even if they are not > unique. > > Michael This sounds reasonable to me, though I'm not sure how easy it is to implement. Thanks, Jake