Re: [RFC PATCH 4/4] git-p4: use utf-8 encoding for file paths throughout

On Wed, Nov 27, 2019 at 5:32 PM Yang Zhao <yang.zhao@xxxxxxxxxxxxxx> wrote:
>
> Try to decode file paths in responses from p4 as soon as possible so
> that we are working with unicode string throughout the rest of the flow.
> This makes python 3 a lot happier.
>
> Signed-off-by: Yang Zhao <yang.zhao@xxxxxxxxxxxxxx>
> ---
>
> This is probably the most risky patch out of the set. It's very likely
> that I've neglected to consider certain corner cases with decoding of
> path data.

Yes, this does seem somewhat risky to me.  It may go well on platforms
that require all filenames to be unicode.  And it may work for users
who happen to restrict their filenames to valid utf-8.  But this
abstraction doesn't fit the general problem, so some users may be left
out in the cold.

I tried multiple times while switching git-filter-repo from python2 to
python3, at different levels of pervasiveness, to use unicode more
generally.  But I mostly gave up: everyone knows file contents won't
necessarily be unicode, but you can't assume that filenames, commit
messages, or branch and tag names (and perhaps a few other things I'm
forgetting) are either.  I ended up using bytestrings everywhere
except messages displayed to the user, and I only decode at that
point.
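
To make that concrete, here is a minimal sketch of the bytes-everywhere
approach, not code taken from git-filter-repo or git-p4.  The helper
names (join_path, display_name) and the example depot path are
hypothetical, and replacing undecodable bytes with U+FFFD at display
time is just one possible choice.

```python
def join_path(directory: bytes, name: bytes) -> bytes:
    """Hypothetical helper: build a path without ever leaving bytes."""
    return directory.rstrip(b"/") + b"/" + name


def display_name(raw: bytes) -> str:
    """Hypothetical helper: decode for display only.

    Bytes that are not valid UTF-8 are shown as U+FFFD, so this is lossy
    and must never be used to reconstruct the real path.
    """
    return raw.decode("utf-8", errors="replace")


# The same visible name in two encodings; only the second is valid UTF-8.
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'
utf8_name = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'

# Internal processing stays in bytes, so both names survive unchanged.
paths = [join_path(b"depot/src", n) for n in (latin1_name, utf8_name)]

# Decoding strictly as UTF-8 early on would reject the first name outright.
try:
    latin1_name.decode("utf-8")
    strict_decode_ok = True
except UnicodeDecodeError:
    strict_decode_ok = False

# Decoding happens only at the point where something is shown to the user.
shown = [display_name(p) for p in paths]
```

The point of the sketch is that the original bytes in paths are never
altered, so whatever is written back out matches what came in, while
the lossy decode is confined to user-facing output.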


Of course, if perforce happens to only work with unicode filenames
then you'll be fine.  And perhaps you don't want or need to be as
paranoid as I was about what people could do.  So I don't know if my
experience applies in your case (I've never used perforce myself), but
I just thought I'd mention it in case it's useful.


