Michael Haggerty <mhagger@xxxxxxxxxxxx> writes: > I put quite of work into tooling to evaluate diff heuristics, and just > published it on GitHub: > > https://github.com/mhagger/diff-slider-tools > > The README there is hopefully enough to let people get started using it > to test their own favorite heuristics. Intereting. A fair TL;DR of this would be that what we write and want to diff have the structure where things often are ordered in an inherent hierarchy, and things at the higher level are highlighted by being less indented, and the indent-based heuristics exploit that well to choose a slide that breaks at the higher-level boundary, e.g. > The algorithm does pretty well with XML and HTML: > >> fef3ea39f8bd474add292bb6437df6cbd22e1ba7:contributors.xml a394a0bdf8e6240dc09023a8260059c57c158a00:contributors.xml + 1624 >> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv >> > <last>Toni</last> >> > </name> >> > <name> >> > <first>Vincent</first> >> > <last>Legoll</last> >> -2 | > </name> >> -1 | i > <name> >> 0 || ci > <first>Vincent</first> >> || ci > <last>Privat</last> >> | ci > </name> >> | c > <name> And that trait is shared among common programming languages (Java, Perl, Go, C, etc.). The only case I can think of offhand that does not follow "higher level heading is less indented" pattern is an already typeset book, where chapter headers are often centered on the page while section headers may be at the left edge. But we are not likely to be interested to get that case right anyway, so it is OK. > gettext source: > >> aef18cc6063106075afeb3a78ec72656b19c8bde:po/de.po b0e098ce46ce7f8ecd975720ccec8d0eb7787e50:po/de.po - 12954 >> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv >> >#~ msgid "unmerged: %s" >> >#~ msgstr "nicht zusammengeführt: %s" >> > >> >#~ msgid "input paths are terminated by a null character" >> >#~ msgstr "Eingabepfade sind durch ein NUL Zeichen abgeschlossen" >> -2 | > >> -1 | ci >#~ msgid "" >> 0 || ci >#~ "The following untracked files would NOT be saved but need to be removed " >> || ci >#~ "by stash save:" >> || ci >#~ msgstr "" >> || ci >#~ "Die folgenden unbeobachteten Dateien würden NICHT gespeichert werden,\n" >> || ci >#~ "müssen aber durch \"stash save\" entfernt werden:" >> | ci > This is an interesting example of "have only a single level hierarchy. A line is either in one block or a blank between blocks". > It often fails to get C preprocessor directives right: > >> a08595f76159b09d57553e37a5123f1091bb13e7:http.c aeff8a61216bf6e0d663c08c583bc8552fa3c344:http.c + 429 >> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv >> > curl_easy_setopt(result, CURLOPT_SSLKEY, ssl_key); >> >#endif >> >#if LIBCURL_VERSION_NUM >= 0x070908 >> > if (ssl_capath != NULL) >> > curl_easy_setopt(result, CURLOPT_CAPATH, ssl_capath); >> -1 | i >#endif >> 0 || ci >#if LIBCURL_VERSION_NUM >= 0x072c00 >> || ci > if (ssl_pinnedkey != NULL) >> || ci > curl_easy_setopt(result, CURLOPT_PINNEDPUBLICKEY, ssl_pinnedkey); >> | c >#endif Yes, this is "non-human do not know 'end' is likely to be at the end of a logical block". > And it gets confused by unusual blank line placement: > >> ed55169834a3ce16a271def9630c858626ded34d:tools/eslint/node_modules/doctrine/lib/doctrine.js 2d441493a4a46a511ba1bdf93e442c3288fbe92d:tools/eslint/node_modules/doctrine/lib/doctrine.js + 330 >> vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv >> > name === 'external' || >> > name === 'event')) { >> > name += advance(); >> > name += scanIdentifier(last); >> > >> -1 | i > } >> 0 || ci > if(source.charCodeAt(index) === 0x5B /* '[' */ && source.charCodeAt(index + 1) === 0x5D /* ']' */){ >> || ci > name += advance(); >> || ci > name += advance(); >> | c > } Likewise, this is showing that a "non-human not knowing } is a closing and { is an opening token". By the way, I no longer remember what indent level your "indent-only" heuristics gives to a blank line. Does the closing brace we see above (marked with -1) count as increasing the indentation level from the previous empty-line at indent 0, or does it count as dedenting from the previous empty-line that has virtually at the same indent level as "name += scanIdentifier(last);" line? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html