Re: git archive --format zip utf-8 issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



René Scharfe <rene.scharfe@xxxxxxxxxxxxxx> writes:

> ... A more interesting question is: What's supported by
> which programs?

Yes, that is the most interesting question.

>> Of course, "git archive --format=zip --path-reencode=utf8-to-latin1"
>> would be the most generic way to do this.
>
> I really hope we can make do without additional options.

We need to at least know the path encoding used in the tree objects,
and I'd be OK with a solution that assumes a single encoding is used
for the entire tree.

We would eventually need to also know the encoding used on the local
working tree (i.e. in what encoding paths are returned from
readdir() and the pathspec the user gives us from the command line),
and iconv it to the tree objects encoding for the project when
creating a cache_entry object to be fed to add_to_index(), and iconv
it back from the tree objects encoding to the working tree encoding
in write_entry(), but that is a longer term direction.  For now, in
order to address the immediate issue, we only need the tree object
encoding, which should default to UTF-8 for interoperability.

So "git archive --format=zip --in-object-path-encoding=big5" for a
project whose tree object pathnames are in that encoding (and we
always record paths in UTF-8 when writing zipfiles) should be the
minimal that we need for now.

Optionally, with a configuration variable i18n.inObjectPathEncoding
(as opposed to the eventual i18n.worktreePathEncoding) set to big5,
users of such a project can say "git archive --format=zip" without
the "--in-object-path-encoding" option.

Considering that zip is a format meant for exchange, I'd think we
would be fine to always write in UTF-8 and leaving the readers
responsible for converting the pathname while extracting.  If a
major zip extractor is incapable of handling UTF-8 (or even if
capable it is cumbersome, for that matter), we may end up having to
add "--in-archive-path-encoding=UTF-8" option to "git archive", with
associated "zip.archivePathEncoding" variable, though.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]