[PATCH RFC] Rename detection and whitespace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I recently had an issue with a merge where there was a branch of some
code made to do some format standardisation e.g. re-indenting,
normalising of whitespace etc. etc. and another branch to do with
tidying up the codebase (moving some files around etc.)

When we brought the two branches back together git struggled (failed)
to apply the formatting changes to the files that were moved (-s
recursive -X ignore-all-space).  Now in terms of best practice
obviously what we did was stupid, and we should never have
parallelised the two pieces of work and I imagine that this may well
be considered 'known behaviour / won't fix' but I wanted to raise it
and get told that rather than assume it :)

After a minimal amount of investigation I found that with a small
patch I could make my local git instance perform the merge flawlessly
(in my case).

For me I tracked the behaviour down to the blob similarity calculation
that takes place in the diffcore-delta.c#hash_chars method.  In our
case the problem was we were adjusting the whitespace at the front of
each line which meant that the 64 byte segment hashes were
different/mis-aligned between the 2 equivalent files.   This code
already 'normalises' out CRLF/LF differences by skipping any CR
characters when followed by LF so my question is that would it be
considered wrong/evil to ignore *all* whitespace characters when -X
ignore-all-space has been passed.

The behaviour is trivial to reproduce without a merge:
  i) Create a text file with a few lines in it.
 ii) Add the file and commit it.
iii) Insert a space at the front of each line and rename the file
iv) git add -A .
v) git status will now show a 'new file' and a 'deleted file'

With the patch inlined below (not suitable for inclusion, naive in so
many ways (what is whitespace, and should be optional/configurable to
name just two!) git status will report a rename instead.  This
behaviour carries across to merging.

Presumably I'm missing something important, but the improved rename
detection for us was outstanding!

---
 diffcore-delta.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/diffcore-delta.c b/diffcore-delta.c
index 7cf431d..5429b8d 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -146,6 +146,9 @@ static struct spanhash_top *hash_chars(struct
diff_filespec *one)
 		if (is_text && c == '\r' && sz && *buf == '\n')
 			continue;

+		if (is_text && ( c == '\t' || c ==' ' ) )
+			continue;
+
 		accum1 = (accum1 << 7) ^ (accum2 >> 25);
 		accum2 = (accum2 << 7) ^ (old_1 >> 25);
 		accum1 += c;
-- 
1.7.4.1
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]