Re: [PATCH v1] convert: add support for 'encoding' attribute

Jeff King <peff@xxxxxxxx> · Mon, 18 Dec 2017 07:59:08 -0500

On Mon, Dec 18, 2017 at 11:54:32AM +0100, Lars Schneider wrote:

> >  warning: failed to encode 'file' from utf-8 to utf16
> > 
> > At least it figured out that it couldn't convert the content. It's
> > slightly troubling that it would try in the first place, though; are
> > there encoding pairs where we might accidentally generate nonsense?
> 
> At this point we interpret utf-16 content as utf-8 and try to convert
> it to utf-16. That of course fails because utf-16 content is no valid
> utf-8. How could we stop trying that? How could Git possibly know what
> kind of encoding is used (apart from our new hint in gitattributes)?

Yeah, sorry if I wasn't clear: I don't really have an answer to those
questions either. So this is probably the best we can do. I was mostly
just trying to think through the worst case, and what could go wrong.

> > It may make sense to die() during "git add ." (since we're actually
> > changing the index entry, and we don't want to put nonsense into a
> > tree). But I'm not sure it's the best thing for operations which just
> > want to read the content. For them, perhaps it would be more appropriate
> > to issue a warning and return the untouched content.
> 
> Absolutely! Thanks for spotting this. I will try to run die() only on
> "git add" in v2.

Great, thanks!

-Peff