On Fri, Nov 16, 2018 at 3:04 AM Phillip Wood <phillip.wood@xxxxxxxxxxxx> wrote: > > From: Phillip Wood <phillip.wood@xxxxxxxxxxxxx> > > Currently diff --color-moved-ws=allow-indentation-change does not > support indentation that contains a mix of tabs and spaces. For > example in commit 546f70f377 ("convert.h: drop 'extern' from function > declaration", 2018-06-30) the function parameters in the following > lines are not colored as moved [1]. > > -extern int stream_filter(struct stream_filter *, > - const char *input, size_t *isize_p, > - char *output, size_t *osize_p); > +int stream_filter(struct stream_filter *, > + const char *input, size_t *isize_p, > + char *output, size_t *osize_p); > > This commit changes the way the indentation is handled to track the > visual size of the indentation rather than the characters in the > indentation. This has they benefit that any whitespace errors do not s/they/the/ > interfer with the move detection (the whitespace errors will still be > highlighted according to --ws-error-highlight). During the discussion > of this feature there were concerns about the correct detection of > indentation for python. However those concerns apply whether or not > we're detecting moved lines so no attempt is made to determine if the > indentation is 'pythonic'. > > [1] Note that before the commit to fix the erroneous coloring of moved > lines each line was colored as a different block, since that commit > they are uncolored. > > Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx> > --- > > Notes: > Changes since rfc: > - It now replaces the existing implementation rather than adding a new > mode. > - The indentation deltas are now calculated once for each line and > cached. > - Optimized the whitespace delta comparison to compare string lengths > before comparing the actual strings. > - Modified the calculation of tabs as suggested by Stefan. > - Split out the blank line handling into a separate commit as suggest > by Stefan. > - Fixed some comments pointed out by Stefan. > > diff.c | 130 +++++++++++++++++++++---------------- > t/t4015-diff-whitespace.sh | 56 ++++++++++++++++ > 2 files changed, 129 insertions(+), 57 deletions(-) > > diff --git a/diff.c b/diff.c > index c378ce3daf..89559293e7 100644 > --- a/diff.c > +++ b/diff.c > @@ -750,6 +750,8 @@ struct emitted_diff_symbol { > const char *line; > int len; > int flags; > + int indent_off; > + int indent_width; So this is the trick how we compute the ws related data only once per line. :-) On the other hand, we do not save memory by disabling the ws detection, but I guess that is not a problem for now. Would it make sense to have the new variables be unsigned? (Also a comment on what they are, I needed to read the code to understand off to be offset into the line, where the content starts, and width to be the visual width, as I did not recall the RFC.) > +static void fill_es_indent_data(struct emitted_diff_symbol *es) > [...] > + if (o->color_moved_ws_handling & > + COLOR_MOVED_WS_ALLOW_INDENTATION_CHANGE) > + fill_es_indent_data(&o->emitted_symbols->buf[n]); Nice. By reducing the information kept around to ints, we also do not need to alloc/free memory for each line. > +++ b/t/t4015-diff-whitespace.sh > @@ -1901,4 +1901,60 @@ test_expect_success 'compare whitespace delta incompatible with other space opti > test_i18ngrep allow-indentation-change err > ' > > +test_expect_success 'compare mixed whitespace delta across moved blocks' ' Looks good, Thanks! Stefan