On Mon, Aug 30, 2010 at 14:00, Marcin Cieslak <saper@xxxxxxxxxx> wrote:
> On Mon, 30 Aug 2010, Ævar Arnfjörð Bjarmason wrote:
>> On Sun, Aug 29, 2010 at 20:45, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
>> A would be preferred for correctness, and with a fallback BSD printf()
>> we can avoid the GNU libc bug, however that'll mean using LC_CTYPE,
>> which'll have some small side-effects for the rest of the code.
>
> The real problem is that you are probably using the same
> (locale-enabled) functions for the user-facing side as well as for the
> backend (talking to the repository). Some projects decided to use
> some special encoding internally (like UCS-2 in case of Java
> and Python 2.x, Unicode ordinals in Python 3.x). Otherwise
> you may end up with some incompatibilities in the on-disk or on-network
> format. I don't think you want to keep telling all bug reporters for a few
> years - "Can you try that again with env LANG=C,
> please?" :)

Yeah, those programs can probably get away with it too because they
either implement their own string functions, or don't use setlocale()
at all for their localizations.

> Bringing Unicode onboard means that simple strlen() is no longer
> what you normally think it does.

I'm pretty sure strlen() always gives you the number of bytes before
the terminating null byte, regardless of locale settings. wcslen() is
the wide-character equivalent.
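
To make the strlen() vs. wcslen() point concrete, here is a minimal
sketch. It assumes the text is stored as UTF-8; the helper
utf8_codepoints(), the sample strings and check_lengths() are made up
for illustration and are not anything from git. It deliberately never
calls setlocale(), since strlen() doesn't look at the locale anyway.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count UTF-8 code points by skipping continuation
 * bytes (those of the form 10xxxxxx). Assumes the input is valid UTF-8. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* "Ævar" encoded as UTF-8: U+00C6 takes two bytes (0xC3 0x86). */
static const char name_utf8[] = "\xC3\x86var";

/* The same text as a wide string, one wchar_t per character here. */
static const wchar_t name_wide[] = L"\u00C6var";

/* Sanity check of the three counts for the sample text. */
static void check_lengths(void)
{
    assert(strlen(name_utf8) == 5);          /* bytes before the NUL */
    assert(utf8_codepoints(name_utf8) == 4); /* code points */
    assert(wcslen(name_wide) == 4);          /* wide characters */
}
```

Calling check_lengths() from a main() should pass under any locale.
So strlen() keeps doing what it always did; it's just that a byte
count stops being a character count once multibyte text shows up.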