Re: [PATCH] t3910: show failure of core.precomposeunicode with decomposed filenames

On 29.04.14 20:02, Jeff King wrote:
> On Tue, Apr 29, 2014 at 10:12:52AM -0700, Junio C Hamano wrote:
> 
>> Jeff King <peff@xxxxxxxx> writes:
>>
>>> This patch just adds a test to demonstrate the breakage.
>>> Some possible fixes are:
>>>
>>>   1. Tell everyone that NFD in the git repo is wrong, and
>>>      they should make a new commit to normalize all their
>>>      in-repo files to be precomposed.
>>>
>>>      This is probably not the right thing to do, because it
>>>      still doesn't fix checkouts of old history. And it
>>>      spreads the problem to people on byte-preserving
>>>      filesystems (like ext4), because now they have to start
>>>      precomposing their filenames as they are added to git.
>>
>> Hmm, have we taught the "compare precomposed" for codepaths that
>> compare two trees and a tree and the index, too?  Otherwise, we
>> would have the same issue with commits in the old history.
> 
> Ugh, yeah, I didn't think about that codepath. I think we would not want
> to precompose in that case. IOW, git works byte-wise internally, but it
> is only at the filesystem layer that we do such munging. The index
> straddles the line between the filesystem and git's internal
> representations.
>
[snip]
Please allow me to reply to this post.
I have pushed a suggested fix here:
https://github.com/tboegi/git/commit/85305ce306cb88a07dad6350d6ba8c5f2f817af6
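
For anyone who wants to see the underlying mismatch without a Mac at hand, here is a minimal sketch in plain C. It is not part of the patch. It assumes the test name uses "Ä" in its two UTF-8 spellings (C3 84 precomposed, 41 CC 88 decomposed), and the helper names any_non_ascii() and same_bytes() are made up for illustration; they are not git's has_non_ascii() or its name comparison.

```c
#include <stddef.h>
#include <string.h>

/* "Ä" precomposed (NFC): U+00C4, UTF-8 bytes C3 84. */
const char adiar_nfc[] = "\xc3\x84";
/* "Ä" decomposed (NFD): "A" followed by U+0308, UTF-8 bytes 41 CC 88. */
const char adiar_nfd[] = "A\xcc\x88";

/*
 * Hypothetical helper, not git's has_non_ascii():
 * returns 1 if any of the first len bytes has the high bit set.
 */
int any_non_ascii(const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if ((unsigned char)s[i] & 0x80)
			return 1;
	return 0;
}

/*
 * Hypothetical byte-wise name comparison: returns 1 only if both
 * names consist of exactly the same bytes.
 */
int same_bytes(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	return la == lb && !memcmp(a, b, la);
}
```

same_bytes(adiar_nfc, adiar_nfd) returns 0 although both strings render as the same character, which is why a decomposed index entry and a precomposed name returned by readdir() do not match. any_non_ascii() shows the kind of cheap check that lets pure-ASCII names skip the conversion.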

The new test in t3910 passes, but the test suite hangs somewhere, and there is a
whitespace problem in precompose_utf8.c, so I can't say much more yet.
In case someone wants to review the code anyway:


commit 85305ce306cb88a07dad6350d6ba8c5f2f817af6
Author: Torsten Bögershausen <tboegi@xxxxxx>
Date:   Wed Apr 30 10:30:04 2014 +0200

    core.precomposeunicode with decomposed filenames
    
    Commit 750b2e4785e shows a failure of core.precomposeunicode
    when decomposed filenames are in the index.
    
    When decomposed file names are in the index and readdir()
    converts them into the precomposed form, "git status" will report
    the precomposed file name as untracked.
    
    Solution:
    Precompose file names when reading the index file from disk into memory.

diff --git a/compat/precompose_utf8.c b/compat/precompose_utf8.c
index 95fe849..40ebc2e 100644
--- a/compat/precompose_utf8.c
+++ b/compat/precompose_utf8.c
@@ -57,6 +57,19 @@ void probe_utf8_pathname_composition(char *path, int len)
 }
 
 
+char *precompose_str_len(const char *in, size_t insz, int *outsz)
+{
+	char *prec_str = NULL;
+	if (precomposed_unicode != 1)
+		return NULL;
+
+	if (has_non_ascii(in, insz, NULL))
+		prec_str = reencode_string_len(in, insz, repo_encoding, path_encoding, outsz);
+
+	return prec_str;
+}
+
+
 void precompose_argv(int argc, const char **argv)
 {
 	int i = 0;
diff --git a/compat/precompose_utf8.h b/compat/precompose_utf8.h
index 3b73585..28f6595 100644
--- a/compat/precompose_utf8.h
+++ b/compat/precompose_utf8.h
@@ -26,6 +26,7 @@ typedef struct {
 	struct dirent_prec_psx *dirent_nfc;
 } PREC_DIR;
 
+char *precompose_str_len(const char *in, size_t insz, int *outsz);
 void precompose_argv(int argc, const char **argv);
 void probe_utf8_pathname_composition(char *, int);
 
diff --git a/git-compat-util.h b/git-compat-util.h
index d493a8c..de117d1 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -180,7 +180,7 @@ typedef unsigned long uintptr_t;
 #ifdef PRECOMPOSE_UNICODE
 #include "compat/precompose_utf8.h"
 #else
-#define precompose_str(in,i_nfd2nfc)
+#define precompose_str_len(s,i,o) NULL
 #define precompose_argv(c,v)
 #define probe_utf8_pathname_composition(a,b)
 #endif
diff --git a/read-cache.c b/read-cache.c
index 4b4effd..0887835 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1330,7 +1330,7 @@ static inline uint32_t ntoh_l_force_align(void *p)
 #define ntoh_l(var) ntoh_l_force_align(&(var))
 #endif
 
-static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
+static struct cache_entry *cache_entry_from_ondisk_int(struct ondisk_cache_entry *ondisk,
 						   unsigned int flags,
 						   const char *name,
 						   size_t len)
@@ -1355,6 +1355,22 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on
 	return ce;
 }
 
+static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *ondisk,
+							 unsigned int flags,
+							 const char *name,
+							 size_t len)
+{
+	int prec_len;
+	char *prec_name = precompose_str_len(name, len, &prec_len);
+	if (prec_name) {
+		struct cache_entry *ce;
+		ce = cache_entry_from_ondisk_int(ondisk, flags, prec_name, prec_len);
+		free(prec_name);
+		return ce;
+	}
+	return cache_entry_from_ondisk_int(ondisk, flags, name, len);
+}
+
 /*
  * Adjacent cache entries tend to share the leading paths, so it makes
  * sense to only store the differences in later entries.  In the v4
diff --git a/t/t3910-mac-os-precompose.sh b/t/t3910-mac-os-precompose.sh
index 23aa61e..d27c018 100755
--- a/t/t3910-mac-os-precompose.sh
+++ b/t/t3910-mac-os-precompose.sh
@@ -141,7 +141,7 @@ test_expect_success "Add long precomposed filename" '
 	git commit -m "Long filename"
 '
 
-test_expect_failure 'handle existing decomposed filenames' '
+test_expect_success 'handle existing decomposed filenames' '
 	echo content >"verbatim.$Adiarnfd" &&
 	git -c core.precomposeunicode=false add "verbatim.$Adiarnfd" &&
 	git commit -m "existing decomposed file" &&
 
