Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>
> The heuristic I proposed was
>
> * Prefer to start and end hunks *following* lines with the least
>   indentation.
>
> * Define the "indentation" of a blank line to be the indentation of
>   the previous non-blank line minus epsilon.
>
> * In the case of a tie, prefer to slide the hunk down as far as
>   possible.
>
> If that's what you are trying to implement, then notice that the first
> rule says that the hunk should start *following* the line with the least
> indentation. So the "score" for

I did not try to implement that heuristic.

> * Prefer to start and end hunks *following* lines with the least
>   indentation.

We had this alone as a heuristic and it turned out badly. (More
false positives than the current "empty line only" thing)

It may however be a good starting block when combined with the other
conditions, as blank lines are assigned a longer virtual length than
what they have.

> * In the case of a tie, prefer to slide the hunk down as far as
>   possible.
>

This is what the current implementation does as well. (i.e. if there is
more than one blank line, we have a tie, so choose the last one as the
end of the hunk)


So what I am proposing in this patch:

 * Prefer to start and end hunks *following* lines with an empty line.
   (as implemented currently)

 * In the case of a tie, make use of a second tier heuristic.

 * Define the "length" of a blank line to be the
    minimum of the amount of white space in the line before and after

 * Choose that empty line (first tier) that has the shortest length
(second tier)

 * In the case of a tie, prefer to slide the hunk down as far as
   possible.

I would be really interested how these two heuristics compare in practice.

>
>> +        end
>> +
>> +        def bal
>
> would be the indentation of the line preceding "end", which is larger
> than the indentation of "end". And the score for
>
>> +                do_bal_stuff()
>> +
>> +                common_ending()
>
> would come from the line preceding "do_bal_stuff()", namely "def bal()",
> which is indented the same as "end". So the latter would be preferred.
>
> But actually, I don't understand how your example is meaningful. I think
> this heuristic is only used to slide hunks around; i.e., when the line
> following the hunk is the same as the first lines of the hunk, or when
> the line preceding the hunk is the same as the last line of the hunk.

Right.

> Right? So your snippets would never compete against each other.

right, I was using that to show why some parts are more favorable to
put a break in (i.e. the transisiton from "+ lines" to surrounding lines.
I was not clear enough there. :(

> Let's
> take the full example. The five possible placements for the `+`
> characters are marked with the digits 1 through 5 here:
>
>>              def bar
>>                      do_bar_stuff()
>> 1
>> 12                   common_ending()
>> 123          end
>> 1234
>> 12345        def bal
>> 12345                do_bal_stuff()
>>  2345
>>   345                common_ending()
>>    45        end
>>     5
>>              def baz
>>                      do_baz_stuff()
>>
>>                      common_ending()
>>              end
>
> Of the lines preceding the candidate hunks, the blank line preceding the
> hunk marked "5" has the smallest indent, namely "epsilon" less than the
> indentation of the "end" line preceding it. So placement "5" would be
> chosen.

I agree 5 is the best here, but for other reasons.

Imagine /s/end/long final boilerplate code/, and it may be more clear.
What I am trying to say is, that you "indentation" of the blank line is not
reaching until the end of the "end" line, but rather to the front of "end".

(If you don't care of white spaces in the git history and have an editor, that
makes fancy white space stuff, i.e. your text editor cursor is in the "end" line
and you press "enter" to get to the next line, it is most likely indented to
strlen ("<end line>") - strlen("end"), as that is how much indentation we had
in the preceding line?

>
> I don't know enough about this area of the code to review your patch in
> detail, but I did note below a typo that jumped out at me.

I did not know the code either before writing the patch. :)


>> +                                     if (min_bl_neigh_indent > bl_neigh_indent)
>> +                                             min_bl_neigh_indent = min_bl_neigh_indent;
>
> The line above has no effect (same variable on both sides of the =).

Thanks!

I had the feeling something was off judging from behavior,
but could not tell what exactly. This is it.

It should be

    min_bl_neigh_indent = bl_neigh_indent;

i.e. the minimum of the existing and newly computed surrounding line
starting blanks.

Junio writes:
> Now, on what column does 'd' sit on the example below?
>
>        def foo
>
> I typed SP SP HT 'd' 'e' 'f'; it still is indented by 8 places.

I see, so for the TAB case we would need to do a

    ret += (8 - ret%8);

Thanks!
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]