Jonathan Nieder <jrnieder@xxxxxxxxx> writes:

> Drew Northup wrote:
>
> > Please forgive me for being offended that UTF-16 text is not "generic"
> > enough.
>
> First some words of explanation.
>
> By "generic" I did not mean ubiquitous, unbranded, popular, or some
> other almost-synonym.  What I actually meant is that it is not obvious
> what to do with UTF-16.  Should it be converted to UTF-8 for output?
> Should it always be normalized when added to the index, so that
> switching between canonically equivalent sequences does not result
> in spurious diffs?  Should the byte-for-byte representation be
> faithfully preserved, even when it is not valid UTF-16?
>
> When in such a situation, often a good approach is the following:
> take care of mechanism first, then policy.  So the first thing to do
> is to make sure that the code is _capable_ of what people are trying
> to do; then one can try various configurations and see what is most
> convenient; and finally, one can make sure the program behaves in an
> intuitive way by setting a reasonable default.
>
> So by "generic" I meant those mechanisms that can be used in the
> context of multiple policies.

It would be nice if there were a way (perhaps steerable via
gitattributes) to change whether Git treats a file as a sequence of
bytes (as it does now) or as a sequence of characters (probably like
Perl 6, i.e. as a sequence of graphemes), though this would require
specifying the encoding (and normalization) used.

Wishful thinking

-- 
Jakub Narebski
Poland
ShadeHawk on #git
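
To make the bytes-versus-characters distinction concrete, here is a minimal sketch in Python (not Git code). The helper names are hypothetical, and the choice of UTF-16LE without a BOM and of NFC normalization are assumptions made only for illustration. It shows two canonically equivalent strings that differ byte-for-byte in UTF-16 but compare equal once normalized, which is the kind of spurious difference mentioned above.

```python
import unicodedata


def utf16_bytes(text):
    """Encode text as UTF-16 little-endian, without a byte-order mark."""
    return text.encode("utf-16-le")


def same_as_bytes(a, b):
    """Byte-for-byte comparison of the UTF-16 encodings."""
    return utf16_bytes(a) == utf16_bytes(b)


def same_as_characters(a, b, form="NFC"):
    """Comparison after Unicode normalization (NFC is an assumed policy)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "caf\u00e9"      # "é" as a single precomposed code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(same_as_bytes(composed, decomposed))       # False
print(same_as_characters(composed, decomposed))  # True
```

Which of the two comparisons a tool should use is exactly the policy question; the sketch only shows that both mechanisms are cheap to express once the encoding and normalization form are known.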