On Tue, Apr 19, 2016 at 9:18 PM, Jeff King <peff@xxxxxxxx> wrote: > [your original probably didn't make it to the list because of its 5MB > attachment; the list has a 100K limit; I'll try to quote liberally] > > On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote: > >> I ran this version of the patch against the entire Linux kernel >> history, as I figured this has a large batch of C code to try and spot >> any issues. >> >> I ran something like the following command in bash >> >> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git >> show --format="commit %H" --no-compaction-heuristic $rev) <(git show >> --format="commit %H" --compaction-heuristic $rev); done > >> heuristic.patch > > My earlier tests with the perl script were all done with "git log -p", > which will not show anything at all for merges (and my script wouldn't > know how to deal with combined diffs anyway). But I think this new patch > _will_ kick in for combined diffs (because it is built on individual > diffs). It will be interesting to see if this has any effect there, and > what it looks like. > > We should be able to see it (on a small enough repository) with: > > git log --format='commit %H' --cc --merges > > and comparing the before/after. > >> I've attached the file that I generated for the Linux history, it's >> rather large so hopefully I can get some help to spot any differences. >> The above approach will work for pretty much any repository, and works >> better than trying to generate the entire thing first and then diff >> (since that runs out of memory pretty fast). > > I don't think there is much point in generating a complete diff between > the patches for every commit, when nobody can look at the whole thing. > Unless we have automated tooling to find "interesting" bits (and > certainly a tool to remove the boring "a comment got shifted by one" > lines would help; those are all known improvements, but it's the _other_ > stuff we want to look). > > But if we are not using automated tooling to find the needle in the > haystack, we might as well using sampling to make the dataset more > manageable. Adding "--since=1.year.ago" is one way, though we may want > to sample more randomly across time. > >> So far, I haven't spotted anything that would want me to disable it, >> while I've spotted several cases where I felt that readability was >> improved. It's somewhat difficult to spot though. > > I did find one case that I think is worse. Look at 857942fd1a in the > kernel. It has a pattern like this: > > ... surrounding code ... > > function_one(); > ... more surrounding code ... > > which becomes: > > ... surrounding code ... > > function_two(); > > ... more surrounding code > > Without the new heuristic, that looks like: > > -function_one(); > +function_two(); > + > > but with it, it becomes: > > + > +function_two(); > > -function_one(); > > which is kind of weird. Having the two directly next to each other reads > better to me. This is a pretty unusual diff, though, in that it did > change the surrounding whitespace (and if you look further in the diff, > the identical change is made elsewhere _without_ touching the > whitespace). So this is kind of an anomaly. And IMHO the weirdness here > is outweighed by the vast number of improvements elsewhere. The new implementation supports the flags for ignoring white space, too. > > -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html