[PATCH] word diff: handle zero length matches

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>

If find_word_boundaries() encounters a zero length match (which can be
caused by matching a newline or using '*' instead of '+' in the regex)
we stop splitting the input into words which generates an inaccurate
diff. To fix this increment the start point when there is a zero
length match and try a new match. This is safe as posix regular
expressions always return the longest available match so a zero length
match means there are no longer matches available from the current
position.

Commit bf82940dbf1 (color-words: enable REG_NEWLINE to help user,
2009-01-17) prevented matching newlines in negated character classes
but it is still possible for the user to have an explicit newline
match in the regex which could cause a zero length match.

One could argue that having explicit newline matches or using '*'
rather than '+' are user errors but it seems to be better to work
round them than produce inaccurate diffs.

Signed-off-by: Phillip Wood <phillip.wood@xxxxxxxxxxxxx>
---
    word diff: handle zero length matches
    
    If find_word_boundaries() encounters a zero length match (which can be
    caused by matching a newline or using '*' instead of '+' in the regex)
    we stop splitting the input into words which generates an inaccurate
    diff. To fix this increment the start point when there is a zero length
    match and try a new match.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-947%2Fphillipwood%2Fwip%2Fword-diff-zero-length-matches-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-947/phillipwood/wip/word-diff-zero-length-matches-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/947

 diff.c                | 10 +++++++---
 t/t4034-diff-words.sh |  5 +++++
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 4acccd9d7edb..c8b1d724349c 100644
--- a/diff.c
+++ b/diff.c
@@ -2053,7 +2053,7 @@ static void fn_out_diff_words_aux(void *priv,
 static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
 		int *begin, int *end)
 {
-	if (word_regex && *begin < buffer->size) {
+	while (word_regex && *begin < buffer->size) {
 		regmatch_t match[1];
 		if (!regexec_buf(word_regex, buffer->ptr + *begin,
 				 buffer->size - *begin, 1, match, 0)) {
@@ -2061,9 +2061,13 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t *word_regex,
 					'\n', match[0].rm_eo - match[0].rm_so);
 			*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
 			*begin += match[0].rm_so;
-			return *begin >= *end;
+			if (*begin == *end)
+				(*begin)++;
+			else
+				return *begin > *end;
+		} else {
+			return -1;
 		}
-		return -1;
 	}
 
 	/* find the next word */
diff --git a/t/t4034-diff-words.sh b/t/t4034-diff-words.sh
index ee7721ab9135..561c582d1615 100755
--- a/t/t4034-diff-words.sh
+++ b/t/t4034-diff-words.sh
@@ -184,6 +184,11 @@ test_expect_success 'word diff with a regular expression' '
 	word_diff --color-words="[a-z]+"
 '
 
+test_expect_success 'word diff with zero length matches' '
+	cp expect.letter-runs-are-words expect &&
+	word_diff --color-words="[a-z${LF}]*"
+'
+
 test_expect_success 'set up a diff driver' '
 	git config diff.testdriver.wordRegex "[^[:space:]]" &&
 	cat <<-\EOF >.gitattributes

base-commit: 7e391989789db82983665667013a46eabc6fc570
-- 
gitgitgadget



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux