On 06/16/2016 07:46 PM, Stefan Beller wrote: > The compaction heuristic for diffs was deemed quite good, but in 5580b27 > we have an example demonstrates a common case, that fails with the existing > heuristic. That is why we disabled the heuristic in the v2.9.0 release. > > With this patch a diff looks like: > > def bar > do_bar_stuff() > > common_ending() > end > > + def bal > + do_bal_stuff() > + > + common_ending() > + end > + > def baz > do_baz_stuff() > > common_ending() > end > > whereas before we had: > > WIP (TODO: ask peff to provide an example that actually triggers, I seem to be > unable to write one, though I thought the above was one) > > > The way we do it, is by inspecting the neighboring lines and see how > much indent there is and we choose the line that has the shortest amount > of blanks in the neighboring lines. > > (TODO: update comments in the file) > > Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx> > --- > > Hi Jeff, hi Michael, > > thanks for pointing out the flaw in this ruby code base before the 2.9 release. > I think this patch would fix your finding, though I cannot reproduce it. > Can you provide an example repo/patch that looks ugly on current origin/master, > so I can be sure this fixes the issue? > > This can also be a start of a discussion if this is a too short-sighted enhancement > of the heuristic. Essentially we'd want to prefer > > + end > + > + def bal > > over: > > + do_bal_stuff() > + > + common_ending() > > because there is less space between line start and {end, def bal} > than for {do_bal_stuff, common_ending}. It's really cool that you are trying to implement the indentation heuristic. I'm curious how it works in practice (something we can probably only determine by applying it to a corpus of diffs to see how often it outperforms/underperforms the other possible approaches). The heuristic I proposed was * Prefer to start and end hunks *following* lines with the least indentation. * Define the "indentation" of a blank line to be the indentation of the previous non-blank line minus epsilon. * In the case of a tie, prefer to slide the hunk down as far as possible. If that's what you are trying to implement, then notice that the first rule says that the hunk should start *following* the line with the least indentation. So the "score" for > + end > + > + def bal would be the indentation of the line preceding "end", which is larger than the indentation of "end". And the score for > + do_bal_stuff() > + > + common_ending() would come from the line preceding "do_bal_stuff()", namely "def bal()", which is indented the same as "end". So the latter would be preferred. But actually, I don't understand how your example is meaningful. I think this heuristic is only used to slide hunks around; i.e., when the line following the hunk is the same as the first lines of the hunk, or when the line preceding the hunk is the same as the last line of the hunk. Right? So your snippets would never compete against each other. Let's take the full example. The five possible placements for the `+` characters are marked with the digits 1 through 5 here: > def bar > do_bar_stuff() > 1 > 12 common_ending() > 123 end > 1234 > 12345 def bal > 12345 do_bal_stuff() > 2345 > 345 common_ending() > 45 end > 5 > def baz > do_baz_stuff() > > common_ending() > end Of the lines preceding the candidate hunks, the blank line preceding the hunk marked "5" has the smallest indent, namely "epsilon" less than the indentation of the "end" line preceding it. So placement "5" would be chosen. I don't know enough about this area of the code to review your patch in detail, but I did note below a typo that jumped out at me. Michael > xdiff/xdiffi.c | 41 +++++++++++++++++++++++++++++++++++++++-- > 1 file changed, 39 insertions(+), 2 deletions(-) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index b3c6848..58adc74 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > [...] > @@ -485,7 +515,13 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > * the group. > */ > while (ix < nrec && recs_match(recs, ixs, ix, flags)) { > - blank_lines += is_blank_line(recs, ix, flags); > + if (is_blank_line(recs, ix, flags)) { > + unsigned int bl_neigh_indent = > + surrounding_leading_blank(recs, ix, flags, nrec); > + if (min_bl_neigh_indent > bl_neigh_indent) > + min_bl_neigh_indent = min_bl_neigh_indent; The line above has no effect (same variable on both sides of the =). > + blank_lines++; > + } > > rchg[ixs++] = 0; > rchg[ix++] = 1; > @@ -525,6 +561,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { > while (ixs > 0 && > !is_blank_line(recs, ix - 1, flags) && > + surrounding_leading_blank(recs, ix - 1, flags, nrec) > min_bl_neigh_indent && > recs_match(recs, ixs - 1, ix - 1, flags)) { > rchg[--ixs] = 1; > rchg[--ix] = 0; > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html