Re: [PATCH v3 6/8] git-remote-testpy: hash bytes explicitly

John Keeping <john@xxxxxxxxxxxxx> · Sun, 27 Jan 2013 14:21:54 +0000

On Sat, Jan 26, 2013 at 09:30:00PM -0800, Junio C Hamano wrote:
> Michael Haggerty <mhagger@xxxxxxxxxxxx> writes:
> 
> > This will still fail under Python 2.x if repo.path is a byte string that
> > contains non-ASCII characters.  And it will fail under Python 3.1 and
> > later if repo.path contains characters using the surrogateescape
> > encoding option [1],...
> > Here you don't really need byte-for-byte correctness; it would be enough
> > to get *some* byte string that is unique for a given input ...
> 
> Yeek.
> 
> As we do not care about the actual value at all, how about doing
> something like this instead?
> 
> +    hasher.update(".".join([str(ord(c)) for c in repo.path]))

This doesn't solve the original problem since we're still ending up with
a Unicode string.  If we wanted something like this it would need to be:

    hasher.update(b'.'.join([b'%X' % ord(c) for c in repo.path]))

which limits us to Python 2.6 and later and seems to me to be less clear
than introducing an "encode_filepath" helper function using Michael's
suggestion.

John
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html