On Wednesday 25 November 2009 14:47:25, Marc Strapetz wrote:
> I have noticed that jgit converts file paths to UTF-8 when querying the
> repository. In particular,
> org.eclipse.jgit.treewalk.filter.PathFilter#PathFilter performs this
> conversion:
>
> private PathFilter(final String s) {
>     pathStr = s;
>     pathRaw = Constants.encode(pathStr);
> }
>
> Because of this conversion, a TreeWalk fails to identify a file with
> German umlauts. When using the platform encoding to convert the file path
> to bytes:
>
> private PathFilter(final String s) {
>     pathStr = s;
>     pathRaw = s.getBytes();
> }
>
> the TreeWalk works as expected. Actually, the file path seems to be
> stored in the platform encoding in the repository.
>
> Is this a bug or a misconfiguration of my repository? I'm using jgit
> (commit e16af839e8a0cc01c52d3648d2d28e4cb915f80f) on Windows.

A bug. The problem here is that we need to allow multiple encodings, since
no reliable encoding for path names is specified anywhere.

The approach I advocate is the one we use for handling encoding in
general, i.e. if it looks like UTF-8, treat it as UTF-8, otherwise fall
back to another encoding. This is expensive, however, and then we have all
the other issues: case-insensitive names, and the funny property of Unicode
that it allows characters to be encoded using multiple sequences of code
points, as employed by Apple.

-- robin
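The "if it looks like UTF-8, treat it as UTF-8, otherwise fall back" heuristic described above could look roughly like the following. This is a minimal sketch, not jgit code: the class `PathDecoding`, the method `decodePath`, and the choice of ISO-8859-1 as the fallback charset are all hypothetical, chosen for illustration. It relies only on the standard `java.nio.charset` API, asking a strict UTF-8 decoder to report malformed input instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PathDecoding {
    // Hypothetical helper: decode raw path bytes as UTF-8 if they form valid
    // UTF-8, otherwise decode them with the given fallback charset
    // (for example the platform's legacy encoding).
    static String decodePath(byte[] raw, Charset fallback) {
        // A strict decoder: malformed or unmappable input raises an
        // exception instead of being replaced with U+FFFD.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the bytes are in the fallback encoding.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        Charset fallback = StandardCharsets.ISO_8859_1; // assumed legacy encoding
        String name = "\u00DCbung.txt"; // "Übung.txt", written with an escape
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] latin1Bytes = name.getBytes(fallback);
        // Both byte sequences decode back to the same file name.
        System.out.println(decodePath(utf8Bytes, fallback));
        System.out.println(decodePath(latin1Bytes, fallback));
    }
}
```

The check is only a heuristic: a legacy-encoded name whose bytes happen to form valid UTF-8 would be misread, and every path pays for a trial decode, which is part of the cost mentioned above. The separate Apple issue (precomposed vs. decomposed code point sequences) would need normalization on top of this, e.g. with `java.text.Normalizer`.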