On Mon, Dec 11, 2017 at 09:47:24PM +0100, Johannes Sixt wrote:
> On 11.12.2017 at 16:50, lars.schneider@xxxxxxxxxxxx wrote:
> >From: Lars Schneider <larsxschneider@xxxxxxxxx>
> >
> >Git and its tools (e.g. git diff) expect all text files in UTF-8
> >encoding. Git will happily accept content in all other encodings, too,
> >but it might not be able to process the text (e.g. viewing diffs or
> >changing line endings).
> >
> >Add an attribute to tell Git what encoding the user has defined for a
> >given file. If the content is added to the index, then Git converts the
> >content to a canonical UTF-8 representation. On checkout Git will
> >reverse the conversion.
> >
> >Reviewed-by: Patrick Lühne <patrick@xxxxxxxxx>
> >Signed-off-by: Lars Schneider <larsxschneider@xxxxxxxxx>
> >---
> >
> >Hi,
> >
> >here is a WIP patch to add text encoding support for files encoded with
> >something other than UTF-8 [RFC].
> >
> >The 'encoding' attribute is already used to view blobs in gitk. That
> >could be a problem as the content is stored in Git with the defined
> >encoding. This patch would interpret the content as UTF-8 encoded and
>
> This will be a major drawback for me because my code base stores text files
> that are not UTF-8 encoded. And I do use the existing 'encoding' attribute
> to view the text in git-gui and gitk. Repurposing this attribute name is not
> an option, IMO.

Just to check my understanding here: does this mean that git-gui and gitk
can decode/re-encode the content of a file/blob when .gitattributes says so?

If yes, would it make sense to enhance "git diff" instead?
"git diff --encoding" would pick up the committed encoding from
.gitattributes, convert the content into UTF-8, and then run the diff.
We could even extend the "git diff" output with a single line such as
"Git index-encoding=cp1251", which could be picked up by "git apply".
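As a rough illustration of the "git diff --encoding" idea, here is a minimal Python sketch. It assumes the encoding name has already been looked up from .gitattributes (not shown). Both `diff_with_encoding` and the "Git index-encoding=..." marker line are hypothetical: they model the proposal in this thread, not an existing Git option, and Python's `difflib` stands in for Git's own diff machinery.

```python
import difflib


def diff_with_encoding(old: bytes, new: bytes, encoding: str, path: str) -> str:
    """Hypothetical model of the proposed "git diff --encoding".

    Both blob versions are assumed to be stored in the repository in
    `encoding` (e.g. the value of an 'encoding' attribute). They are
    decoded to Unicode text, diffed line by line, and the result is
    returned as a str that can be written out as UTF-8.
    """
    old_lines = old.decode(encoding).splitlines(keepends=True)
    new_lines = new.decode(encoding).splitlines(keepends=True)
    # Hypothetical marker line, so that a patch consumer (e.g. "git apply")
    # could re-encode the patched result back into `encoding`.
    header = f"Git index-encoding={encoding}\n"
    body = "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=f"a/{path}", tofile=f"b/{path}"
        )
    )
    return header + body


if __name__ == "__main__":
    # Two UTF-16 versions of the same file, as they might be committed today.
    old = "hello\nworld\n".encode("utf-16")
    new = "hello\nWelt\n".encode("utf-16")
    print(diff_with_encoding(old, new, "utf-16", "foo.txt"), end="")
```

The blobs themselves stay in their original encoding (UTF-16 here); only the diff is produced as readable text, with the marker line recording which encoding to convert back to.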
The advantage would be that we could continue to commit in UTF-16 as
before and avoid the glitches with .gitattributes that Peff pointed out.
Does this make sense?

> >it would try to reencode it to the defined encoding on checkout.
> >Plus, many repos define the attribute very broad (e.g. "*
> >encoding=cp1251").

Is this a user mistake?

> >These folks would see errors like these with my patch:
> >    error: failed to encode 'foo.bar' from utf-8 to cp1251
>
> -- Hannes
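The error quoted above can be reproduced with a minimal Python sketch, assuming the patch's checkout step boils down to "decode the stored blob as UTF-8, then encode it into the attribute's encoding". The helper `checkout_convert` is hypothetical, and Python's codecs stand in for whatever conversion routine Git actually uses. A blob committed as cp1251 before the patch (the situation of repositories already using 'encoding' for gitk/git-gui) is generally not valid UTF-8, so the first step already fails.

```python
def checkout_convert(blob: bytes, encoding: str, path: str) -> bytes:
    """Hypothetical checkout step under the proposed patch.

    The stored blob is assumed to be canonical UTF-8 and is re-encoded
    into the encoding named by the attribute.
    """
    try:
        return blob.decode("utf-8").encode(encoding)
    except UnicodeError as exc:
        # Covers both invalid UTF-8 input (UnicodeDecodeError) and characters
        # the target encoding cannot represent (UnicodeEncodeError).
        # The message mirrors the shape of the error quoted above.
        raise RuntimeError(
            f"failed to encode '{path}' from utf-8 to {encoding}"
        ) from exc


if __name__ == "__main__":
    # Content added after the patch is stored as UTF-8 and converts cleanly.
    print(checkout_convert("Привет\n".encode("utf-8"), "cp1251", "foo.bar"))

    # A legacy blob that was committed as raw cp1251 bytes is not valid UTF-8.
    legacy = "Привет\n".encode("cp1251")
    try:
        checkout_convert(legacy, "cp1251", "foo.bar")
    except RuntimeError as err:
        print("error:", err)
```

Content that went through the UTF-8 conversion on the way in round-trips cleanly; legacy cp1251 blobs do not, which is why a broad "* encoding=cp1251" rule would hit existing repositories.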