On Wed, Nov 22, 2017 at 3:41 PM, Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
> Teach the patience diff to attempt preventing user-specified lines from
> appearing as a deletion or addition in the end result. The end user can
> use this by specifying "--anchor=<text>" one or more times when using
> Git commands like "diff" and "show".
>
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
> Actual patch instead of RFC.
>
> One thing that might help is to warn if --anchor is used without
> --patience, but I couldn't find a good place to put that warning. Let me
> know if you know of a good place.

Would it make sense to have `--anchor` imply patience?
(Not necessarily in this patch; it might be a
"yes, let's do it in a year when users complain".)

> Replying to Stefan's and Junio's comments:
>
>> The solution you provide is a good thing to experiment with, but
>> longer term, I would want to have huge record of configs in which
>> humans selected the best diff, such that we can use that data
>> to reason about better automatic diff generation.
>> The diff heuristic was based on a lot of human generated data,
>> that was generated by Michael at the time. I wonder if we want to
>> permanently store the anchor so the data collection will happen
>> automatically over time.
>
> I think machine learning is beyond the scope of this patch :-)

Agreed; I just wanted to share what I think we could do in the future
to select sane defaults. For that we'd want to collect some
"most useful" configurations.

When I proposed separate flags for the move detection regarding
ignoring whitespace, the question "how does the user sanely select
from so many flags?" came up. In that spirit, I think that adding
this rather fundamental flag now, and then machine learning
(e.g. the weights in traversing the diff matrix) off of the
collected data later, might be a viable approach.

>> or rather: "c is not moved, we don't care how the diff actually looks
>> like",
>> so maybe
>>     ! grep "+c" diff
>
> I think it's less error-prone to show "a" moving. With this, if the
> command somehow prints nothing, the test would still pass.

Makes sense.

> diff --git a/t/t4033-diff-patience.sh b/t/t4033-diff-patience.sh
> index 113304dc5..2d00d1056 100755
> --- a/t/t4033-diff-patience.sh
> +++ b/t/t4033-diff-patience.sh

I was waiting for

    test_expect_success 'one --anchor anchors many lines' '
        printf "a\nb\na\nc\na\n" >file &&
        # many 'a's
        ....
        --anchor=a
        ...

Thanks for writing this patch. I hope we can eventually make a lot
of use of this addition. :)

Stefan