On Tue, Apr 29, 2014 at 10:12:52AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > This patch just adds a test to demonstrate the breakage.
> > Some possible fixes are:
> >
> >   1. Tell everyone that NFD in the git repo is wrong, and
> >      they should make a new commit to normalize all their
> >      in-repo files to be precomposed.
> >
> >      This is probably not the right thing to do, because it
> >      still doesn't fix checkouts of old history. And it
> >      spreads the problem to people on byte-preserving
> >      filesystems (like ext4), because now they have to start
> >      precomposing their filenames as they are added to git.
>
> Hmm, have we taught the "compare precomposed" for codepaths that
> compare two trees and a tree and the index, too? Otherwise, we
> would have the same issue with commits in the old history.

Ugh, yeah, I didn't think about that codepath. I think we would not want
to precompose in that case. IOW, git works byte-wise internally, but it
is only at the filesystem layer that we do such munging.

The index straddles the line between the filesystem and git's internal
representations. I think my "keep the normalized names alongside index
entries" approach might still work there. But it means that we compare
against the "real" byte-wise names on the tree side, and against the
normalized names on the path side. But that means having two
comparison/lookup functions for the index, and always using the right
one. And algorithms that rely on traversing two sorted lists cannot work
in both directions.

> Do we have a similar issue for older commit in a history under
> "ignore-case" as well?

I don't think so, because we handle ignorecase completely differently.
There we use the name-hash with a case-insensitive hash and a
case-insensitive comparison function. And we use strcasecmp liberally
throughout the code. I don't think we have a "str_utf8_cmp" that ignores
normalizations (or maybe strcoll will do this?).
But in theory we could use it everywhere we use strcasecmp for
ignore_case. And then we would not need to have our readdir wrapper,
maybe?

I admit I haven't thought that much about _either_ approach. But aside
from some bugs in the hash system, I do not recall seeing any design
problems in the ignorecase code.

-Peff
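
To make the "str_utf8_cmp" idea above a little more concrete, here is a
minimal sketch of a comparison that treats precomposed (NFC) and
decomposed (NFD) spellings of a name as equal. Everything in it is
hypothetical and only for illustration: toy_table, toy_decompose() and
toy_str_utf8_cmp() are not existing git functions, and the three-entry
table stands in for real Unicode decomposition data, which a usable
version would need in full.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical toy table: a few precomposed UTF-8 sequences and their
 * canonically decomposed equivalents. A real implementation would need
 * the complete Unicode decomposition data, not three entries.
 */
struct toy_decomp {
	const char *nfc;
	const char *nfd;
};

static const struct toy_decomp toy_table[] = {
	{ "\xc3\xa9", "e\xcc\x81" }, /* U+00E9 -> 'e' + U+0301 */
	{ "\xc3\xa4", "a\xcc\x88" }, /* U+00E4 -> 'a' + U+0308 */
	{ "\xc3\xb1", "n\xcc\x83" }, /* U+00F1 -> 'n' + U+0303 */
};

/*
 * Copy src into dst, replacing any precomposed sequence found in
 * toy_table with its decomposed form. Returns 0 on success, or -1 if
 * dst is too small to hold the result.
 */
static int toy_decompose(const char *src, char *dst, size_t dstlen)
{
	size_t out = 0;

	assert(dstlen > 0);
	while (*src) {
		const char *piece = src; /* default: copy one byte as-is */
		size_t in_len = 1, piece_len = 1, i;

		for (i = 0; i < sizeof(toy_table) / sizeof(toy_table[0]); i++) {
			size_t n = strlen(toy_table[i].nfc);
			if (!strncmp(src, toy_table[i].nfc, n)) {
				piece = toy_table[i].nfd;
				in_len = n;
				piece_len = strlen(piece);
				break;
			}
		}
		if (out + piece_len >= dstlen)
			return -1; /* no room left for the trailing NUL */
		memcpy(dst + out, piece, piece_len);
		out += piece_len;
		src += in_len;
	}
	dst[out] = '\0';
	return 0;
}

/*
 * Compare two names after decomposing both sides, so the NFC and NFD
 * spellings covered by toy_table compare equal. Falls back to a plain
 * byte-wise strcmp() if either name does not fit the scratch buffer.
 */
static int toy_str_utf8_cmp(const char *a, const char *b)
{
	char da[1024], db[1024];

	if (toy_decompose(a, da, sizeof(da)) ||
	    toy_decompose(b, db, sizeof(db)))
		return strcmp(a, b);
	return strcmp(da, db);
}
```

With this, toy_str_utf8_cmp("caf\xc3\xa9", "cafe\xcc\x81") returns 0,
while strcmp() on the same pair does not, since the two spellings are
different byte sequences. Note that the sketch only covers equality: it
orders names by their decomposed bytes, which is not the byte-wise order
used on the tree side, so it does nothing for the sorted-list traversal
problem mentioned earlier.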