On Sat, Jan 26, 2013 at 09:30:00PM -0800, Junio C Hamano wrote: > Michael Haggerty <mhagger@xxxxxxxxxxxx> writes: > > > This will still fail under Python 2.x if repo.path is a byte string that > > contains non-ASCII characters. And it will fail under Python 3.1 and > > later if repo.path contains characters using the surrogateescape > > encoding option [1],... > > Here you don't really need byte-for-byte correctness; it would be enough > > to get *some* byte string that is unique for a given input ... > > Yeek. > > As we do not care about the actual value at all, how about doing > something like this instead? > > + hasher.update(".".join([str(ord(c)) for c in repo.path])) This doesn't solve the original problem since we're still ending up with a Unicode string. If we wanted something like this it would need to be: hasher.update(b'.'.join([b'%X' % ord(c) for c in repo.path])) which limits us to Python 2.6 and later and seems to me to be less clear than introducing an "encode_filepath" helper function using Michael's suggestion. John -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html