Re: [PATCH v1] convert: add support for 'encoding' attribute

Lars Schneider <larsxschneider@xxxxxxxxx> · Thu, 14 Dec 2017 00:02:32 +0100

> On 13 Dec 2017, at 19:11, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
> 
>> ... In a perfect world I think I would store
>> the encoding of a file in the tree object. I didn't pursue that solution
>> as this would change the Git data model which would open a can of worms
>> for a problem that not that many people have (almost everyone is on
>> UTF-8 anyways).
> 
> Having that "encoding" trailt recorded in the tree that contains the
> blob would mean that the same blob can be recorded with one
> "encoding" trait in a tree, and in a different tree it can be
> recorded with a different "encoding" trait.  I doubt it really makes
> sense.

The "blob object" would store the text data encoded in a canonical 
UTF-8 form. The "tree object" would store the encoding. On checkin 
Git would convert the text from the stored encoding to UTF-8 and on 
checkout it would do the reverse.

That way you could control the encoding for a text file specific
for each path similar to the "mode bits". That also means you could
change the encoding of a file while the blob content stays the same.

Changing the encoding of a file with the .gitattributes approach
can be difficult if you have configured the attribute with a very 
broad pattern (e.g. *.foo). You would either need to rename the
file or limit the scope of your pattern.

- Lars