Re: [RFC/PATCH] git-gui: Use gitattribute "encoding" for file content display

On Wed, 23 Jan 2008, Steffen Prohaska wrote:
> On Jan 23, 2008, at 6:55 AM, Junio C Hamano wrote:
>> "Shawn O. Pearce" <spearce@xxxxxxxxxxx> writes:
>>
>>> git-gui: Use gitattribute "encoding" for file content display
>>>
>>> Most folks using git-gui on internationalized files have complained
>>> that it doesn't recognize UTF-8 correctly.  In the past we have just
>>> ignored the problem and showed the file contents as binary/US-ASCII,
>>> which is wrong no matter how you look at it.
>>
>> Hmmm.
>>
>> At least for now in 1.5.4, I'd prefer the way gitk shows UTF-8
>> (if I recall correctly latin-1 or other legacy encoding, as long
>> as LANG/LC_* is given appropriately, as well) contents without
>> per-path configuration without introducing new attributes.
> 
> Shouldn't we first try harder to get things right without adding
> an attribute?  Maybe we could continue a good tradition and look
> at the content first: we could look for hints in the
> file about the encoding.  XML and many text files contain such
> hints already to help editors.  For example,  Python source can
> explicitly contain the encoding [1]; and I guess there are many
> other examples.

For example, LaTeX files either use the inputenc package to set the
encoding (e.g. \usepackage[latin2]{inputenc}) or use a magic first line
to specify a TCX (TeX character translation) file
(e.g. %& -translate-file=il2-t1).

Emacs encourages the use of file variables, either in the form of a
magic first line or as file variables at the end of the file; I think
the same is true for Vim.
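
To make the idea of in-file hints concrete, here is a rough sketch in
Python (only for brevity; git-gui itself is Tcl). The function name,
the pattern list and the max_lines limit are all made up for
illustration, and it only scans the beginning of the file, so it would
miss Emacs or Vim variables placed at the end of the file and does not
handle the TCX first line at all.

```python
import re

# Hypothetical patterns, tried in order on each of the first lines.
HINT_PATTERNS = [
    # XML declaration: <?xml version="1.0" encoding="ISO-8859-2"?>
    re.compile(rb'<\?xml[^>]*\bencoding=["\']([-\w.]+)["\']'),
    # Emacs / Python / Vim style: "coding: latin-1", "fileencoding=utf-8"
    re.compile(rb'coding[:=]\s*([-\w.]+)'),
    # LaTeX: \usepackage[latin2]{inputenc}
    re.compile(rb'\\usepackage\[([-\w.]+)\]\{inputenc\}'),
]


def find_encoding_hint(data, max_lines=20):
    """Return the encoding name declared near the top of data, or None.

    data is the raw file content as bytes.  The name is returned as
    written in the file; mapping it to a real codec name is a separate
    step that this sketch leaves out.
    """
    for line in data.splitlines()[:max_lines]:
        for pattern in HINT_PATTERNS:
            match = pattern.search(line)
            if match:
                return match.group(1).decode('ascii')
    return None
```

A caller would try this first and only move on to auto-detection when
it returns None.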


I'd then like it to be at least as configurable as diff.*.funcname
is for diff.

> If we don't find a direct hint, we could have 
> some magic auto-detection similar to what we do for autocrlf.

We can at least try to check for the UTF-16 magic first two bytes (the
byte order mark), and detect whether we have a byte sequence which is
invalid in UTF-8 (for performance, I guess, checking only the beginning
of the file)...
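
A minimal sketch of that check, again in Python and again with made-up
names (sniff_encoding, probe_size). It assumes that a file without a
BOM which decodes cleanly as UTF-8 can be shown as UTF-8; plain ASCII
falls into that case too. It does not try to tell a UTF-32 BOM apart
from a UTF-16 one.

```python
import codecs


def sniff_encoding(data, probe_size=4096):
    """Guess 'utf-16', 'utf-8' or None (unknown) from the start of data."""
    # UTF-16 byte order mark: FF FE (little endian) or FE FF (big endian).
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'

    # Validate only the first probe_size bytes.  The incremental decoder
    # is told the input is final only when the whole file fits in the
    # probe, so a multi-byte sequence cut off at the probe boundary is
    # not reported as invalid.
    decoder = codecs.getincrementaldecoder('utf-8')()
    try:
        decoder.decode(data[:probe_size], final=len(data) <= probe_size)
    except UnicodeDecodeError:
        return None
    return 'utf-8'
```

Returning None here means "no idea", which is where a hint from the
file or a user-configured default would have to take over.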

> As a fallback the user could specify a default encoding.  But only
> as a last resort, I'd use explicit attributes.

...and then fall back to a fallback encoding, like gitweb does.
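
Roughly, the behaviour I have in mind is: try strict UTF-8 first, and
only if that fails decode with the fallback. A sketch in Python, where
decode_for_display and its fallback parameter are names invented for
this example, and 'latin-1' is just an assumed default:

```python
def decode_for_display(data, fallback='latin-1'):
    """Decode bytes as strict UTF-8, else with the fallback encoding."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        # Bytes the fallback codec cannot map are replaced rather than
        # raising, so something is always displayed.
        return data.decode(fallback, errors='replace')
```

The fallback would come from user configuration, so that, for example,
a project with ISO-8859-2 files could set it once instead of marking
every path.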

-- 
Jakub Narebski
Poland