Re: What's cooking in git.git (Jun 2017, #03; Mon, 5)

On Tue, Jun 6, 2017 at 3:05 PM, Jacob Keller <jacob.keller@xxxxxxxxx> wrote:
> On Tue, Jun 6, 2017 at 2:50 AM, Michael Haggerty <mhagger@xxxxxxxxxxxx> wrote:
>> On Mon, Jun 5, 2017 at 8:23 PM, Stefan Beller <sbeller@xxxxxxxxxx> wrote:
>>>
>>> > [...]
>>> >  "git diff" has been taught to optionally paint new lines that are
>>> >  the same as deleted lines elsewhere differently from genuinely new
>>> >  lines.
>>> >
>>> >  Are we happy with these changes?
>>
>>
>> I've been studiously ignoring this patch series due to lack of bandwidth.
>>
>>> [...]
>>> Things to come, but not in this series as they are more advanced:
>>>
>>>     Discuss if a block/line needs a minimum requirement.
>>>
>>> When doing reviews with this series, a couple of lines such
>>> as "\t\t}" were marked as moved, which is not wrong as they
>>> really occurred in the text with opposing signs.
>>> But it was annoying as it drew my attention to just closing
>>> braces, which IMO is not the point of code review.
>>>
>>> To solve this issue I had the idea of a "minimum requirement", e.g.
>>> * at least 3 consecutive lines or
>>> * at least one line with at least 3 non-ws characters or
>>> * compute the entropy of a given moved block and if it is too low, do
>>>   not mark it up.
>>
>> Shooting from the hip here...
>>
>> It seems obvious that for a line to be marked as moved, a minimum
>> requirement is that
>>
>> 1. The line appears as both "+" and "-".
>>
>> That doesn't seem strong enough evidence though, and if that is the
>> only criterion, I would expect a lot of boilerplate lines like "\t\t}"
>> to be marked as moved. It seems like a lot of noise could be
>> eliminated by *also* requiring that
>>
>> 2a. The line doesn't appear elsewhere in the file(s) concerned.

'elsewhere' as in lines with the opposing sign (+,-), or anywhere in the
diff (including ' ' context)?

This rule opens up the discussion on multi-copies, which I imagine
happens a lot in configuration files. So say you have a prod and staging
environment, then you might be tempted to make patches titled as:
  "1. preparation: duplicate common code into prod and staging"
  "2. Make an actual change to staging"

For 1. you still want to see that there is a faithful copy, but we'd have
2 postimages having these lines.

Also what about de-duplication?
I just stumbled upon edb0c72428 ([PATCH] diff: consolidate test
helper script pieces., 2005-05-31) for unrelated reasons,
but the move coloring of the same content multiple times
helped me there to focus on the relevant part.

>>
>> Rule (2a) would probably get rid of most boilerplate lines without
>> having to try to measure entropy.

But it would also get rid of good use cases when not being very careful.
I intentionally left out the (2a) as I am not yet sure how the move
detection for multiple occurrences in post and preimage should
work in the desired case. The suppression of low-entropy closing braces
might be a side effect of just this. Or it can be treated separately.
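
To make the "treated separately" option concrete, here is a rough Python
sketch of the three candidate minimum requirements quoted above (at least
3 lines, at least one line with 3 non-ws characters, or an entropy
measure). All function names and defaults are made up for illustration;
none of this is in the series, and the entropy measure (Shannon entropy
over non-whitespace characters) is just one possible choice.

```python
import math
from collections import Counter


def non_ws(text):
    """Return the non-whitespace characters of text as a list."""
    return [c for c in text if not c.isspace()]


def has_min_lines(block, n=3):
    """Candidate 1: the block has at least n consecutive lines."""
    return len(block) >= n


def has_substantial_line(block, n=3):
    """Candidate 2: some line has at least n non-whitespace characters."""
    return any(len(non_ws(line)) >= n for line in block)


def entropy_bits(block):
    """Candidate 3: Shannon entropy (bits per character) of the block's
    non-whitespace characters; a caller would compare it to a threshold."""
    chars = non_ws("".join(block))
    if not chars:
        return 0.0
    total = len(chars)
    return sum((k / total) * math.log2(total / k)
               for k in Counter(chars).values())


closing_brace = ["\t\t}"]
real_code = ["\tresult = compute(x);"]
```

A lone "\t\t}" fails all three candidates (one line, one non-ws
character, zero entropy), while a single line of real code passes the
second one and has clearly non-zero entropy.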

>>
>> Maybe you are already using both criteria? I didn't see it in a quick
>> perusal of the code.
>>
>> OTOH, it would be silly to refuse to mark lines like "\t\t}" as moved
>> *only* because they appear elsewhere in the file(s). If you did so,
>> you would have gaps of supposedly non-moved lines in the middle of
>> moved blocks. This suggests marking as moved lines matching (1) and
>> (2a) but also lines matching (1) and the following:
>>
>> 2b. The line is adjacent to another line that is thought to have
>> moved from the same old location to the same new location.

This is what we do, a "block detection" by comparing "line runs" against
the current lines. Based on these line runs we detect one block and
color up adjacent blocks.

>>
>> Rule (2b) would be applied recursively, with the net effect being that
>> any line satisfying (1) and (2a) is allowed to carry along any
>> neighboring lines within the same "+"/"-" block even if they are not
>> unique.

So you are saying each block has to have at least one unique line?
That doesn't go well with (de-)duplication IMHO.
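
To show the concern, here is a small Python sketch of rules (1), (2a) and
(2b) under one possible reading: (2a) means "occurs exactly once as '+'
and exactly once as '-'", and (2b) is simplified to "sits in the same
contiguous run of '+' lines as a line satisfying (1) and (2a)", ignoring
where the lines came from. The function name and the (sign, text) input
format are invented for this example.

```python
from collections import Counter


def mark_moved(diff):
    """diff is a list of (sign, text) pairs, sign being '+', '-' or ' '.

    Returns the set of indices of '+' lines that would be marked as moved.
    """
    plus = Counter(text for sign, text in diff if sign == '+')
    minus = Counter(text for sign, text in diff if sign == '-')

    def rule1(text):
        # The line appears as both "+" and "-".
        return plus[text] > 0 and minus[text] > 0

    def rule2a(text):
        # Assumed reading: unique on both sides of the diff.
        return plus[text] == 1 and minus[text] == 1

    marked = set()
    i = 0
    while i < len(diff):
        if diff[i][0] != '+' or not rule1(diff[i][1]):
            i += 1
            continue
        # Collect the contiguous run of '+' lines that satisfy rule (1).
        j = i
        while j < len(diff) and diff[j][0] == '+' and rule1(diff[j][1]):
            j += 1
        # Simplified rule (2b): one unique line carries the whole run.
        if any(rule2a(diff[k][1]) for k in range(i, j)):
            marked.update(range(i, j))
        i = j
    return marked


move = [('-', 'foo();'), ('-', '}'), (' ', 'ctx'),
        ('+', 'foo();'), ('+', '}'), (' ', 'other'),
        ('-', '}'), ('+', '}')]
duplication = [('-', 'common = 1'), ('+', 'common = 1'),
               (' ', '[staging]'), ('+', 'common = 1')]
```

With the "move" input, the "}" next to "foo();" is carried along and the
lone "}" is left unmarked, which is the desired noise reduction. With the
"duplication" input nothing is marked, because the copied line is no
longer unique on the '+' side.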

Thanks for your shot from the hip. I'll think about these rules more to see
if I can make sense of them for duplication still.

Thanks,
Stefan


