Re: [BUG-ish] diff compaction heuristic false positive

Jeff King <peff@xxxxxxxx> · Fri, 10 Jun 2016 04:41:50 -0400

On Fri, Jun 10, 2016 at 10:31:13AM +0200, Michael Haggerty wrote:

> I've often thought that indentation would be a good, fairly universal
> signal for diff to use when deciding how to slide hunks around. Most
> source code is indented in a way that shows its structure.
> 
> I propose the following heuristic:
> 
> * Prefer to start and end hunks following lines with the least
>   indentation.
> 
> * Define the "indentation" of a blank line to be the indentation of
>   the previous non-blank line minus epsilon.
> 
> * In the case of a tie, prefer to slide the hunk down as far as
>   possible.
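A minimal sketch of the three rules quoted above, written in Ruby to match the thread's example. Several details are assumptions made for illustration. The helper names (`indentations`, `slide_positions`, `cost`, `best_position`) are hypothetical. So are the value of epsilon, counting only leading spaces, and treating the start of the file as indentation -1. The phrase "start and end hunks following lines" is also interpreted loosely: the score adds the indentation of the line just before the hunk to that of the hunk's last line. This is not git's xdiff code.

```ruby
# Hypothetical sketch of the proposed indentation heuristic; not git's code.

EPSILON = 0.5 # assumption: any value in (0, 1) puts a blank line just below its predecessor

# Indentation of each line (leading spaces only; tabs are not handled here).
# A blank line gets the previous non-blank line's indentation minus EPSILON.
def indentations(lines)
  prev = 0
  lines.map do |line|
    if line.strip.empty?
      prev - EPSILON
    else
      prev = line[/\A */].length
    end
  end
end

# Every [start, stop) range the block of added lines can occupy in the new
# file. The block can slide whenever the line leaving it equals the line
# entering it.
def slide_positions(lines, start, stop)
  while start > 0 && lines[start - 1] == lines[stop - 1]
    start -= 1
    stop -= 1
  end
  positions = [[start, stop]]
  while stop < lines.length && lines[start] == lines[stop]
    start += 1
    stop += 1
    positions << [start, stop]
  end
  positions
end

# Lower is better: the line the hunk starts after plus the line it ends after.
# The start of the file is assumed to count as indentation -1.
def cost(ind, start, stop)
  before = start.zero? ? -1 : ind[start - 1]
  before + ind[stop - 1]
end

# Lowest cost wins; on a tie, prefer the position slid furthest down.
def best_position(lines, start, stop)
  ind = indentations(lines)
  slide_positions(lines, start, stop).min_by { |s, e| [cost(ind, s, e), -s] }
end

new_file = ["def a", "  x", "end", "", "def b", "  y", "end", "", "def c", "  z", "end"]
# A diff reporting lines 2...6 ("end", "", "def b", "  y") as added:
p best_position(new_file, 2, 6) # => [4, 8]
```

In the usage line, the block slides down so that the added lines are `def b` through the following blank line, which is the lowest-cost position. The `-s` in the sort key implements the third rule: among equally good positions, it picks the one furthest down.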

Hmm. That might help this case, but the original motivation for this
heuristic was something like:

  ##
  # foo
  def foo
    something
  end

  ##
  # bar
  def bar
    something_else
  end

where we add the first function above the second. Without the heuristic, we end up with:

diff --git a/file.rb b/file.rb
index 1f9b151..f991c76 100644
--- a/file.rb
+++ b/file.rb
@@ -1,4 +1,10 @@
 ##
+# foo
+def foo
+  something
+end
+
+##
 # bar
 def bar
   something_else

I.e., crediting the "##" to the wrong spot (or in C, the "/*"). I don't
think indentation helps us there (sliding-up would, but like
sliding-down, it just depends on the order of the hunks).

So I agree that adding indentation to the mix might help, but I don't
think it can replace this heuristic.

-Peff