Re: [PATCH] diff compaction heuristic: favor shortest neighboring blank lines

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:

> I put quite of work into tooling to evaluate diff heuristics, and just
> published it on GitHub:
>
>     https://github.com/mhagger/diff-slider-tools
>
> The README there is hopefully enough to let people get started using it
> to test their own favorite heuristics.

Intereting.  A fair TL;DR of this would be that what we write and
want to diff have the structure where things often are ordered in an
inherent hierarchy, and things at the higher level are highlighted
by being less indented, and the indent-based heuristics exploit that
well to choose a slide that breaks at the higher-level boundary, e.g.

> The algorithm does pretty well with XML and HTML:
>
>> fef3ea39f8bd474add292bb6437df6cbd22e1ba7:contributors.xml a394a0bdf8e6240dc09023a8260059c57c158a00:contributors.xml + 1624
>> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>>                >    <last>Toni</last>
>>                >  </name>
>>                >  <name>
>>                >    <first>Vincent</first>
>>                >    <last>Legoll</last>
>>       -2 |     >  </name>
>>       -1 |   i >  <name>
>>        0 || ci >    <first>Vincent</first>
>>          || ci >    <last>Privat</last>
>>           | ci >  </name>
>>           | c  >  <name>

And that trait is shared among common programming languages (Java,
Perl, Go, C, etc.).

The only case I can think of offhand that does not follow "higher
level heading is less indented" pattern is an already typeset book,
where chapter headers are often centered on the page while section
headers may be at the left edge.  But we are not likely to be
interested to get that case right anyway, so it is OK.

> gettext source:
>
>> aef18cc6063106075afeb3a78ec72656b19c8bde:po/de.po b0e098ce46ce7f8ecd975720ccec8d0eb7787e50:po/de.po - 12954
>> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>>                >#~ msgid "unmerged:   %s"
>>                >#~ msgstr "nicht zusammengeführt:   %s"
>>                >
>>                >#~ msgid "input paths are terminated by a null character"
>>                >#~ msgstr "Eingabepfade sind durch ein NUL Zeichen abgeschlossen"
>>       -2 |     >
>>       -1 |  ci >#~ msgid ""
>>        0 || ci >#~ "The following untracked files would NOT be saved but need to be removed "
>>          || ci >#~ "by stash save:"
>>          || ci >#~ msgstr ""
>>          || ci >#~ "Die folgenden unbeobachteten Dateien würden NICHT gespeichert werden,\n"
>>          || ci >#~ "müssen aber durch \"stash save\" entfernt werden:"
>>           | ci >

This is an interesting example of "have only a single level
hierarchy.  A line is either in one block or a blank between
blocks".

> It often fails to get C preprocessor directives right:
>
>> a08595f76159b09d57553e37a5123f1091bb13e7:http.c aeff8a61216bf6e0d663c08c583bc8552fa3c344:http.c + 429
>> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>>                >		curl_easy_setopt(result, CURLOPT_SSLKEY, ssl_key);
>>                >#endif
>>                >#if LIBCURL_VERSION_NUM >= 0x070908
>>                >	if (ssl_capath != NULL)
>>                >		curl_easy_setopt(result, CURLOPT_CAPATH, ssl_capath);
>>       -1 |   i >#endif
>>        0 || ci >#if LIBCURL_VERSION_NUM >= 0x072c00
>>          || ci >	if (ssl_pinnedkey != NULL)
>>          || ci >		curl_easy_setopt(result, CURLOPT_PINNEDPUBLICKEY, ssl_pinnedkey);
>>           | c  >#endif

Yes, this is "non-human do not know 'end' is likely to be at the end
of a logical block".

> And it gets confused by unusual blank line placement:
>
>> ed55169834a3ce16a271def9630c858626ded34d:tools/eslint/node_modules/doctrine/lib/doctrine.js 2d441493a4a46a511ba1bdf93e442c3288fbe92d:tools/eslint/node_modules/doctrine/lib/doctrine.js + 330
>> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
>>                >                        name === 'external' ||
>>                >                        name === 'event')) {
>>                >                    name += advance();
>>                >                    name += scanIdentifier(last);
>>                >
>>       -1 |   i >                }
>>        0 || ci >                if(source.charCodeAt(index) === 0x5B  /* '[' */ && source.charCodeAt(index + 1) === 0x5D  /* ']' */){
>>          || ci >                    name += advance();
>>          || ci >                    name += advance();
>>           | c  >                }

Likewise, this is showing that a "non-human not knowing } is a closing
and { is an opening token".

By the way, I no longer remember what indent level your
"indent-only" heuristics gives to a blank line.  Does the closing
brace we see above (marked with -1) count as increasing the
indentation level from the previous empty-line at indent 0, or does it
count as dedenting from the previous empty-line that has virtually
at the same indent level as "name += scanIdentifier(last);" line?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]