[your original probably didn't make it to the list because of its 5MB attachment; the list has a 100K limit; I'll try to quote liberally] On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote: > I ran this version of the patch against the entire Linux kernel > history, as I figured this has a large batch of C code to try and spot > any issues. > > I ran something like the following command in bash > > $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git > show --format="commit %H" --no-compaction-heuristic $rev) <(git show > --format="commit %H" --compaction-heuristic $rev); done > > heuristic.patch My earlier tests with the perl script were all done with "git log -p", which will not show anything at all for merges (and my script wouldn't know how to deal with combined diffs anyway). But I think this new patch _will_ kick in for combined diffs (because it is built on individual diffs). It will be interesting to see if this has any effect there, and what it looks like. We should be able to see it (on a small enough repository) with: git log --format='commit %H' --cc --merges and comparing the before/after. > I've attached the file that I generated for the Linux history, it's > rather large so hopefully I can get some help to spot any differences. > The above approach will work for pretty much any repository, and works > better than trying to generate the entire thing first and then diff > (since that runs out of memory pretty fast). I don't think there is much point in generating a complete diff between the patches for every commit, when nobody can look at the whole thing. Unless we have automated tooling to find "interesting" bits (and certainly a tool to remove the boring "a comment got shifted by one" lines would help; those are all known improvements, but it's the _other_ stuff we want to look). But if we are not using automated tooling to find the needle in the haystack, we might as well using sampling to make the dataset more manageable. Adding "--since=1.year.ago" is one way, though we may want to sample more randomly across time. > So far, I haven't spotted anything that would want me to disable it, > while I've spotted several cases where I felt that readability was > improved. It's somewhat difficult to spot though. I did find one case that I think is worse. Look at 857942fd1a in the kernel. It has a pattern like this: ... surrounding code ... function_one(); ... more surrounding code ... which becomes: ... surrounding code ... function_two(); ... more surrounding code Without the new heuristic, that looks like: -function_one(); +function_two(); + but with it, it becomes: + +function_two(); -function_one(); which is kind of weird. Having the two directly next to each other reads better to me. This is a pretty unusual diff, though, in that it did change the surrounding whitespace (and if you look further in the diff, the identical change is made elsewhere _without_ touching the whitespace). So this is kind of an anomaly. And IMHO the weirdness here is outweighed by the vast number of improvements elsewhere. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html