Re: [PATCH v1] convert: add support for 'encoding' attribute

Lars Schneider <larsxschneider@xxxxxxxxx> · Tue, 12 Dec 2017 00:42:23 +0100

On 11 Dec 2017, at 21:47, Johannes Sixt <j6t@xxxxxxxx> wrote:

> Am 11.12.2017 um 16:50 schrieb lars.schneider@xxxxxxxxxxxx:
>> From: Lars Schneider <larsxschneider@xxxxxxxxx>
>> Git and its tools (e.g. git diff) expect all text files in UTF-8
>> encoding. Git will happily accept content in all other encodings, too,
>> but it might not be able to process the text (e.g. viewing diffs or
>> changing line endings).
>> Add an attribute to tell Git what encoding the user has defined for a
>> given file. If the content is added to the index, then Git converts the
>> content to a canonical UTF-8 representation. On checkout Git will
>> reverse the conversion.
>> Reviewed-by: Patrick Lühne <patrick@xxxxxxxxx>
>> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
>> ---
>> Hi,
>> here is a WIP patch to add text encoding support for files encoded with
>> something other than UTF-8 [RFC].
>> The 'encoding' attribute is already used to view blobs in gitk. That
>> could be a problem as the content is stored in Git with the defined
>> encoding. This patch would interpret the content as UTF-8 encoded and
> 
> This will be a major drawback for me because my code base stores text files that are not UTF-8 encoded. And I do use the existing 'encoding' attribute to view the text in git-gui and gitk. Repurposing this attribute name is not an option, IMO.

I understand your point of view, and I kind of expected that reply.
Thanks for the feedback!

Question is: Given that "encoding" is not available, how could I name
             the attribute without confusing the user?

I contemplated:
  - "enc" or "encode" because "eol" and "ident" use abbreviations, too
    ("enc" could be confused with encryption. Plus, a user might ask
     what the difference between the "enc" and "encoding" attributes is :-)
  - "wte", "wtenc", or "worktree-encoding" to emphasize that this is
    the encoding used in the worktree
    (I fear that users might think this is related to git-worktree, the command)

I think my favorite is "worktree-encoding".
What do you think?
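
To make the intended behavior concrete, here is a minimal Python sketch of
the round trip described in the patch: convert to UTF-8 when content goes
into the index, and reverse the conversion on checkout. This is not Git's
code. The function names and the `worktree_encoding` parameter are
hypothetical and only mirror the attribute name proposed above.

```python
# Illustration only: mimics the described round trip outside of Git.
# "worktree_encoding" is a hypothetical parameter standing in for the
# value a user would set via the proposed attribute.

def to_index(worktree_bytes: bytes, worktree_encoding: str) -> bytes:
    """Convert working-tree content to the canonical UTF-8 form."""
    return worktree_bytes.decode(worktree_encoding).encode("utf-8")


def to_worktree(index_bytes: bytes, worktree_encoding: str) -> bytes:
    """Reverse the conversion when the content is checked out."""
    return index_bytes.decode("utf-8").encode(worktree_encoding)


# A UTF-16 file as it might sit in the working tree (Python adds a BOM).
original = "Grüße\n".encode("utf-16")

stored = to_index(original, "utf-16")      # what would be stored as UTF-8
restored = to_worktree(stored, "utf-16")   # what checkout would write back
```

The stored form is plain UTF-8, so tools that expect UTF-8 (e.g. diff) can
process it, while the working-tree file keeps the encoding the user chose.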

Thanks,
Lars 

BTW: I am curious, can you share what encoding you use?
My main use case is UTF-16 and I was surprised that I haven't
found a single public repo on github.com with "encoding=utf-16"