Re: [PATCH v1] convert: add support for 'encoding' attribute

Johannes Sixt <j6t@xxxxxxxx> · Mon, 11 Dec 2017 21:47:24 +0100

On 11.12.2017 at 16:50, lars.schneider@xxxxxxxxxxxx wrote:
> From: Lars Schneider <larsxschneider@xxxxxxxxx>
>
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).
>
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.
>
> Reviewed-by: Patrick Lühne <patrick@xxxxxxxxx>
> Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
> ---
>
> Hi,
>
> here is a WIP patch to add text encoding support for files encoded with
> something other than UTF-8 [RFC].
>
> The 'encoding' attribute is already used to view blobs in gitk. That
> could be a problem as the content is stored in Git with the defined
> encoding. This patch would interpret the content as UTF-8 encoded and
> it would try to reencode it to the defined encoding on checkout.
> Plus, many repos define the attribute very broadly (e.g.
> "* encoding=cp1251"). These folks would see errors like these with my
> patch:
>
>      error: failed to encode 'foo.bar' from utf-8 to cp1251

This will be a major drawback for me because my code base stores text
files that are not UTF-8 encoded. And I do use the existing 'encoding'
attribute to view the text in git-gui and gitk. Repurposing this
attribute name is not an option, IMO.
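To illustrate the failure mode described above, here is a minimal Python sketch. It is a hypothetical model of the conversion the patch proposes (decode to UTF-8 on add, re-encode on checkout), not Git's actual convert.c code; the helper names `to_index`, `to_worktree` and `EncodingError` are made up for illustration. A blob committed before the patch still holds cp1251 bytes, and treating it as UTF-8 on checkout fails:

```python
# Hypothetical model of the conversion proposed in the patch; this is NOT
# Git's convert.c implementation, only an illustration of the idea.


class EncodingError(Exception):
    """Raised when checkout-side conversion fails (mimics the quoted error)."""


def to_index(worktree_bytes: bytes, encoding: str) -> bytes:
    """On 'git add': decode the worktree file from `encoding`, store UTF-8."""
    return worktree_bytes.decode(encoding).encode("utf-8")


def to_worktree(path: str, blob: bytes, encoding: str) -> bytes:
    """On checkout: assume the blob is UTF-8 and re-encode it to `encoding`."""
    try:
        return blob.decode("utf-8").encode(encoding)
    except UnicodeError as exc:  # covers both decode and encode failures
        raise EncodingError(
            f"failed to encode '{path}' from utf-8 to {encoding}"
        ) from exc


if __name__ == "__main__":
    legacy = "Привет".encode("cp1251")  # bytes as stored in an existing repo

    # A file added *after* the patch round-trips cleanly.
    blob = to_index(legacy, "cp1251")
    assert to_worktree("new.txt", blob, "cp1251") == legacy

    # A blob committed *before* the patch still holds cp1251 bytes, which
    # are not valid UTF-8, so checkout fails.
    try:
        to_worktree("foo.bar", legacy, "cp1251")
    except EncodingError as err:
        # prints: error: failed to encode 'foo.bar' from utf-8 to cp1251
        print("error:", err)
```

Note that pure-ASCII files pass in both directions, since ASCII bytes are valid in both UTF-8 and cp1251; the breakage only shows up for files containing non-ASCII characters.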

-- Hannes