Re: [PATCH] t/perf: correctly align non-ASCII descriptions in output

On Fri, Apr 21, 2017 at 10:41 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Fri, Apr 21, 2017 at 07:44:28PM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the test descriptions from being treated as binary blobs by
>> perl to being treated as UTF-8. This ensures that e.g. a test
>> description like "æ" is counted as 1 character, not 2.
>>
>> I have WIP performance tests for non-ASCII grep patterns on another
>> topic that are affected by this.
>
> Makes sense. As this is purely about test titles in our project,
> choosing utf8 as the only encoding is quite sensible.

*Nod*

>> diff --git a/t/perf/aggregate.perl b/t/perf/aggregate.perl
>> index 924b19dab4..1dbc85b214 100755
>> --- a/t/perf/aggregate.perl
>> +++ b/t/perf/aggregate.perl
>> @@ -88,6 +88,7 @@ for my $t (@tests) {
>>  sub read_descr {
>>       my $name = shift;
>>       open my $fh, "<", $name or return "<error reading description>";
>> +     binmode $fh, ":utf8" or die "PANIC on binmode: $!";
>
> I thought there was some "use" flag we could set to just make all of our
> handles utf8. But all I could come up with was stuff like PERLIO and
> "perl -C". Using binmode isn't too bad, though (I think you could
> just do it as part of the open, too, but I'm not sure if antique
> versions of perl support that).

[Debugging perl encoding issues is one of the many perks of my day job]

Using binmode like this is about as straightforward as you can get.
The former occurrence could equivalently be replaced by:

    utf8::decode(my $line = <$fh>);

But better just to mark the handle as utf8. There's a fancier way to
do it as part of the three-arg-open syntax, but I couldn't remember
whether all the perl versions we support have it.

About the "use" flag, you're probably thinking of the confusingly
named "use utf8", but that's to set your source code to utf8, not your
handles, e.g.:

$ perl -CA -MDevel::Peek -wE 'use utf8; my $日本語 = shift; Dump $日本語' æ
SV = PV(0x12cc090) at 0x12cded8
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x12de460 "\303\246"\0 [UTF8 "\x{e6}"]
  CUR = 2
  LEN = 16

As you can see, people got a bit overexcited about Unicode in the 90s.



