Re: [PATCH/RFC] parse-options: report invalid UTF-8 switches

On Mon, Feb 11, 2013 at 6:19 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Mon, Feb 11, 2013 at 09:07:53AM -0800, Junio C Hamano wrote:
>
>> Erik Faye-Lund <kusmabite@xxxxxxxxx> writes:
>>
> >> > However, since git only looks at one byte at a time for
>> > short-options, it ends up reporting a partial UTF-8 sequence
>> > in such cases, leading to corruption of the output.
>>
>> Isn't it a workable, easier and more robust alternative to punt and
>> use the entire ctx.argv[0] as unrecognized?
>
> Yes, but it regresses the usability:
>
>   [before]
>   $ git foobar -qrxs
>   unknown switch: x
>
>   [after]
>   $ git foobar -qrxs
>   unknown switch: -qrxs
>
> One is much more informative than the other, and you are punishing the
> common ASCII case for the extremely uncommon case of UTF-8. Maybe:
>
>   if (isascii(*ctx.opt))
>           error("unknown option `%c'", *ctx.opt);
>   else
>           error("unknown multi-byte short option in string: `%s'", ctx.argv[0]);
>
> which only kicks in in the uncommon case (and extends the error message
> to make it more clear why we are showing the whole string).

Yes. This is IMO a much better approach, and it doesn't involve trying
to figure out what encoding the string is in. Thanks!
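
For illustration, a self-contained sketch of that approach might look
like the following. It is not git's actual parse-options code: the
helper names is_ascii_byte() and format_unknown_switch() are
hypothetical, and the message is written into a caller-supplied buffer
instead of going through error() so the snippet stands alone. The
check tests the high bit directly rather than calling isascii(), which
is a POSIX extension and not part of ISO C.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not in git): true if the byte is 7-bit ASCII. */
static int is_ascii_byte(char c)
{
	return ((unsigned char)c & 0x80) == 0;
}

/*
 * Hypothetical helper (not in git): format the diagnostic into buf.
 * "opt" points at the offending byte inside "arg", e.g. the 'x' in
 * "-qrxs". An ASCII byte is reported on its own; for anything else
 * the whole argument is shown, so a multi-byte sequence is never cut
 * in the middle.
 */
static int format_unknown_switch(char *buf, size_t len,
				 const char *opt, const char *arg)
{
	if (is_ascii_byte(*opt))
		return snprintf(buf, len, "unknown switch `%c'", *opt);
	return snprintf(buf, len,
			"unknown multi-byte short option in string: `%s'",
			arg);
}
```

With arg = "-qrxs" and opt pointing at the 'x', only the single
character is reported, as before. With an argument such as "-q"
followed by a UTF-8 encoded character, the whole argument is printed,
and no guess about the encoding is needed.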