Re: [RFD] Handling of non-UTF8 data in gitweb

Jakub Narebski <jnareb@xxxxxxxxx> writes:

> On Wed, 7 Dec 2011, Junio C Hamano wrote:
>> I think we added and you acked 00f429a (gitweb: Handle non UTF-8 text
>> better, 2007-06-03) for a good reason, and I think the above argues that
>> whatever issue the commit tried to address is a non-issue. Is it really
>> true?
>
> I think that UTF-8 is a much more prevalent character encoding in operating
> systems, programming languages, APIs, and software applications than it
> was 4 years ago.

Yeah, that was the kind of "reasoning behind it" explanation I was hoping
to see spelled out for people to agree or disagree.

But then won't the updated gitweb have trouble showing the history of
projects that have 4 years or more of history (hopefully Git itself is
not among them)?

> The proposed
>
>   use open qw(:std :utf8);
>
> and removal of to_utf8 and $fallback_encoding would be a regression compared
> to post-00f429a... but the tradeoff of more robust UTF-8 handling might be
> worth it.
>
>> > ... I guess
>> > it could be emulated by defining our own 'utf-8-with-fallback'
>> > encoding, or by defining our own PerlIO layer with PerlIO::via.
>> > But it would no longer be a simple solution (though still automatic).
>> 
>> Between the current "everybody who read from the input must remember to
>> call to_utf8" and "everybody gets to read utf8 automatically for internal
>> processing", even though the latter may involve one-time pain to set up
>> the framework to do so, the pros-and-cons feels obvious to me.
>
> There is also a matter of performance (':utf8' and ':encoding(UTF-8)'
> are AFAIK implemented in C, both the Encode part and the PerlIO part).
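For reference, the post-00f429a behaviour being weighed here is roughly: code that reads input passes each string through to_utf8, which keeps valid UTF-8 as-is and otherwise decodes the bytes with $fallback_encoding. The Perl below is a minimal sketch of that idea, not gitweb's actual code; the name to_utf8_with_fallback and the 'latin1' fallback are assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed fallback encoding (gitweb keeps this in $fallback_encoding).
our $fallback_encoding = 'latin1';

# Hypothetical helper approximating to_utf8: keep valid UTF-8,
# otherwise decode the raw bytes with the fallback encoding.
sub to_utf8_with_fallback {
	my $str = shift;
	return undef unless defined $str;
	# Already a decoded character string: nothing to do.
	return $str if utf8::is_utf8($str);
	# utf8::decode works in place and returns false on malformed input,
	# leaving $str unchanged in that case.
	return $str if utf8::decode($str);
	# Not valid UTF-8: reinterpret the bytes in the fallback encoding.
	return decode($fallback_encoding, $str, Encode::FB_DEFAULT());
}

# UTF-8 bytes and Latin-1 bytes for "café" both yield the same characters.
my $from_utf8   = to_utf8_with_fallback("caf\xc3\xa9");
my $from_latin1 = to_utf8_with_fallback("caf\xe9");
```

A plain `:utf8` layer gives no such per-string fallback, which is what makes the proposed `use open qw(:std :utf8)` a regression for repositories holding Latin-1 commit messages.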

Would a reasonable first step be to replace the calls to bare "open" with
a wrapper that simulates the "open" interface (e.g. "sub git_open"), while
still keeping the post-00f429a behaviour, even if that is much slower than
the native layers?  Then a separate patch can build a "regression but
uses native and much faster" alternative on top, no?
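One possible shape for such a wrapper, sketched in Perl under stated assumptions: git_open, FallbackDecodingHandle and to_utf8_with_fallback are hypothetical names, 'latin1' is an assumed fallback, and the tied handle is just one way to keep the open() calling convention while decoding every line with the post-00f429a logic. It is not gitweb's code.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Symbol qw(gensym);

our $fallback_encoding = 'latin1';   # assumed fallback encoding

# Keep valid UTF-8; otherwise decode the bytes with the fallback encoding.
sub to_utf8_with_fallback {
	my $str = shift;
	return undef unless defined $str;
	return $str if utf8::is_utf8($str) || utf8::decode($str);
	return decode($fallback_encoding, $str, Encode::FB_DEFAULT());
}

# Tied-handle class: every line read is decoded with the fallback logic,
# so callers no longer have to remember to call to_utf8 themselves.
package FallbackDecodingHandle;

sub TIEHANDLE {
	my ($class, $raw) = @_;
	return bless { raw => $raw }, $class;
}

sub READLINE {
	my $self = shift;
	my $raw  = $self->{raw};
	# List context (my @lines = <$fh>): decode all remaining lines.
	return map { main::to_utf8_with_fallback($_) } <$raw> if wantarray;
	# Scalar context: decode one line (undef at end of input).
	return main::to_utf8_with_fallback(scalar <$raw>);
}

sub EOF   { my $self = shift; return eof($self->{raw}); }
sub CLOSE { my $self = shift; return close($self->{raw}); }

package main;

# Hypothetical git_open: same calling convention as open(FH, MODE, LIST),
# e.g. git_open(my $fd, '-|', 'git', 'log'), but the handle it returns
# yields decoded character strings.
sub git_open {
	my (undef, $mode, @args) = @_;
	open(my $raw, $mode, @args) or return;
	binmode($raw, ':raw');
	my $fh = gensym();
	tie *$fh, 'FallbackDecodingHandle', $raw;
	$_[0] = $fh;    # mimic open(): set the caller's handle variable
	return 1;
}
```

Because call sites keep the open() calling convention, a follow-up patch could replace the body of git_open with a native `:encoding(UTF-8)` layer without touching them, which is the two-step staging proposed above.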
