Re: Use of new .gitattributes working-tree-encoding attribute across different platform types

Jeff King <peff@xxxxxxxx> · Mon, 2 Jul 2018 14:17:42 -0400

On Sun, Jul 01, 2018 at 05:56:58PM +0000, brian m. carlson wrote:

> On Thu, Jun 28, 2018 at 01:27:07PM -0400, Jeff King wrote:
> > Yeah, that was along the lines that I was thinking. I wonder if anybody
> > would ever need two such auto-encodings, though. Probably not. But
> > another way to think about it would be to allow something like:
> > 
> >   working-tree-encoding=foo
> > 
> > and then in your config "foo" to map to some encoding.
> > 
> > But that may be over-engineering, I dunno. utf8 has always been enough
> > for me. :)
> 
> I had a thought the other day about why this solution might be valuable.
> Different platforms encode different values for iconv character sets.
> So, for example, one may have platforms supporting some disjoint sets of
> the following:
> 
> * LATIN-1
> * LATIN1
> * ISO8859-1
> * ISO-8859-1
> * ISO_8859-1
> * ISO_8859-1:1987
> * some lowercase variants of these
> 
> Therefore, specifying a working-tree-encoding value that works across a
> wide variety of systems may be non-trivial.  This is less of a problem
> with UTF-8, but having the ability to pick an encoding and remap it to a
> supported value may be useful nevertheless.

One thing I almost did in the example I gave above was to use a "real"
encoding name as the attribute value, i.e.:

  echo '*.txt working-tree-encoding=iso-8859-1' >.gitattributes
  git config encoding.iso-8859-1.replace latin1

or something. But I wondered if it was a little crazy as a practice,
since mapping "iso-8859-1" to "utf-8" is probably going to lead to
headaches.
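
To make the lookup concrete, here is a rough shell sketch of what the
remapping would amount to. Note that "encoding.<name>.replace" is only the
hypothetical key from the example above, not something Git reads today, and
resolve_encoding is a made-up helper name:

```shell
# Sketch only: "encoding.<name>.replace" is a hypothetical config key,
# and resolve_encoding is an illustrative helper, not part of Git.
resolve_encoding() {
  # Print the configured replacement if there is one; otherwise fall
  # back to the name exactly as written in .gitattributes.
  git config --get "encoding.$1.replace" 2>/dev/null || printf '%s\n' "$1"
}

resolve_encoding iso-8859-1
```

With the config line from the example in place this would print "latin1";
without it, the name passes through unchanged.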

But your example above of semantically equivalent variants with
different spellings would be a good use of that trick.
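
If you want to see which of those spellings a particular platform accepts,
something like the following should do as a quick probe. It assumes an
iconv(1) binary on the PATH that knows "UTF-8" as a target name, and
probe_latin1_names is just an illustrative function name:

```shell
# Sketch: report which spellings of Latin-1 the local iconv accepts.
# Assumes iconv(1) is installed and accepts "UTF-8" as a target.
probe_latin1_names() {
  for name in LATIN-1 LATIN1 ISO8859-1 ISO-8859-1 ISO_8859-1 ISO_8859-1:1987
  do
    # Converting one ASCII byte should only succeed when iconv
    # recognizes the source encoding name.
    if printf 'x' | iconv -f "$name" -t UTF-8 >/dev/null 2>&1
    then
      echo "ok   $name"
    else
      echo "fail $name"
    fi
  done
}

probe_latin1_names
```

The output will differ from system to system, which is rather the point.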

It also makes me wonder if there's another layer of indirection
somewhere in the iconv machinery we could be taking advantage of to
accomplish the same thing.  Probably not conveniently or portably, I
guess.

-Peff