Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 06/23/2016 07:37 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
>> Scoring heuristic:
>>
>>>     # A place to accumulate bonus factors (positive makes this
>>>     # index more favored):
>>>     bonus = 0
>>>
>>>     # Bonuses based on the location of blank lines:
>>>     if pre_blank and not blank:
>>>         bonus += 3
>>>     elif blank and not pre_blank:
>>>         bonus += 2
>>>     elif blank and pre_blank:
>>>         bonus += 1
>>>
>>>     # Now fill in missing indent values:
>>>     if pre_indent is None:
>>>         pre_indent = 0
>>>
>>>     if post_indent is None:
>>>         post_indent = 0
> 
> These "missing" are to handle the lines at the boundary of shiftable
> block, or do you have pre_indent if a line is the earliest possible
> one that could be included in the hunk by shifting, as long as it is
> not at the beginning of the file?

The missing values are meant to handle the case where there are only
blank lines between the add/delete block and the beginning/end of the
file. Though currently (due to a limitation of the prototype) I only
pass context up to the next add/delete line, so that fallback also kicks
in if there are only blank lines between this add/delete block and the
next added/deleted line. When that is fixed, I think it might help to
give beginning/end of file a little bit more bonus.

This isn't all implemented yet, but I think the best score for an
add/delete block positioning would be the sum of three values. Let's
take the following "complete file" diff as an example, and consider an
add block with the following positioning:

> diff --git a/file.txt b/file.txt
> index 9405325..6a6b95c 100644
> --- a/file.txt
> +++ b/file.txt
> @@ -1,5 +1,7 @@
>  a
>  b
>  c
> +x
> +c
>  d
>  e

The score for splitting the old version of the file between
"bc" and "de", plus

the score for splitting the new version of the file between
"bc" and "xc", plus

the score for splitting the new version of the file between
"xc" and "de".

> I think "this is indented deeper than the previous, and it wants to
> stay with the previous if possible" is an excellent heuristic, which
> you have below.  I wonder if "however, the previous cannot be
> included in the hunk by shifting, hence we try to keep this together
> with it by favouring its exclusion from the hunk" could be part of
> the same heuristics, if you do have pre_indent value for a line at
> the boundary of shiftable block.

That's automatic, isn't it? If splitting between two lines is assigned a
high cost, then the algorithm will avoid splitting there *either* by
including both lines in the hunk *or* by omitting both lines.

> Also I am wondering if 0 is a sensible thing to use as the fallback
> value.  When pre_indent is not definable, turning it to 0 and
> declaring that our indented line is indented deeper than the
> previous line and penalizing with "bonus -= 4" (which you do) feels
> just as fishy as turning the undefined pre_indent to maxint and
> declaring that our indented line is indented shallower than the
> previous line (which you don't, but is just as justifiable, I would
> think).

Once the algorithm is fixed to consider the whole file, then the
fallback value will only be used at the beginning and end of the file,
where an indent of 0 seems like the natural thing to do. As I mentioned
above, it might even help to give beginning/end of file a little more
bonus than that.

But really, testing against real-world diffs will be the best basis for
adjusting the heuristic.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]