Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 16, 2016 at 1:27 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> ...
>> because there is less space between line start and {end, def bal}
>> than for {do_bal_stuff, common_ending}.
>
> I haven't thought this carefully yet, but would this equally work
> well for Python, where it does not have the "end" or does the lack
> of "end" pose a problem?  You'll still find "def bal" is a good
> boundary (but you cannot tell if it is the beginning or the end of a
> block, unless you understand the language), though.

Good point.  I found a flaw in my implementation
(as it doesn't match my mental model, not necessarily a bad thing)

We take the minimum of the two neighbors, i.e.

+                do_bal_stuff()
+
+        common_ending()

is preferrable to

+                do_bal_stuff()
+
+                common_ending()

and in python the example would look like:

    def foo():
        do_foo()

        common_thing()

+    def baz():
+        do_baz()
+
+        common_thing()
+
    def bar():
        do_bar()

        common_thing()

and breaking between

        common_thing()

    def bar():

is more favorable than between

        do_baz()

        common_thing()

because in the first former the count of
white space in front of "def bar():" is smaller
than for any of "do_baz()" and "common_thing()"


>
>> +static unsigned int leading_blank(const char *line)
>> +{
>> +     unsigned int ret = 0;
>> +     while (*line) {
>> +             if (*line == '\t')
>> +                     ret += 8;
>
> This will be broken with a line with space-before-tab whitespace
> breakage, I suspect...

How so? We inspect each character on its own and then move on later
by line++. (I am not seeing how this could cause trouble, so please
help me?)

Going back to python, this may become a problem when you have a code like:

 def baz():

        do_baz()

        common_thing()

 def bar():

+       do_bal()
+
+       common_thing()
+
+def bar():
+
        do_bar()

        common_thing()


but this was fabricated with a typo (the first definition of bar
should have been bal),
(Also it doesn't worsen the diff, as it is same without the heuristic)

once that typo is fixed we get:
(both with and without the heuristic)

        do_foo()

        common_thing()

 def baz():
        do_baz()

        common_thing()

+def bal():
+
+       do_bal()
+
+       common_thing()
+
 def bar():

        do_bar()

        common_thing()

Clearly it can also be intentional to have 2 methods with the same
code for historical reasons, (even without the blank line after the
function definition this produces the same result)

When playing around with various diffs I could not find a thing that
this patch makes worse, it only fixes the actual issue.
(I realized Peff actually attached a script to produce a bad diff, which
is gone with this patch)

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]