Re: [PATCH/RFC] parse-options: report invalid UTF-8 switches

On Mon, Feb 11, 2013 at 2:43 PM, Matthieu Moy
<Matthieu.Moy@xxxxxxxxxxxxxxx> wrote:
> Erik Faye-Lund <kusmabite@xxxxxxxxx> writes:
>
>> --- a/parse-options.c
>> +++ b/parse-options.c
>> @@ -3,6 +3,7 @@
>>  #include "cache.h"
>>  #include "commit.h"
>>  #include "color.h"
>> +#include "utf8.h"
>>
>>  static int parse_options_usage(struct parse_opt_ctx_t *ctx,
>>                              const char * const *usagestr,
>> @@ -462,7 +463,9 @@ int parse_options(int argc, const char **argv, const char *prefix,
>>               if (ctx.argv[0][1] == '-') {
>>                       error("unknown option `%s'", ctx.argv[0] + 2);
>>               } else {
>> -                     error("unknown switch `%c'", *ctx.opt);
>> +                     const char *next = ctx.opt;
>> +                     utf8_width(&next, NULL);
>> +                     error("unknown switch `%.*s'", (int)(next - ctx.opt), ctx.opt);
>>               }
>>               usage_with_options(usagestr, options);
>>       }
>
> You should be careful with the case where the user has a non-UTF8
> environment, and entered a non-ascii sequence. I can see two cases:
>
> 1) The non-ascii sequence is valid UTF-8, then I guess your patch would
>    show two characters instead of one. Not really correct, but not really
>    serious either.

Hm. So we would end up trading one form of corruption for another.
Not the biggest problem in the world, but perhaps there's a way of
fixing it?

I'm not entirely sure how to reliably determine what encoding stdin is
supposed to be in. On Windows, that's easy: it's UTF-16, and we
re-encode it to UTF-8 on startup in Git for Windows. But on other
platforms, I have no clue.

But isn't UTF-8 constructed so that non-ASCII text in other encodings
is very unlikely to be valid UTF-8? If so, I could add a case for bytes
that are non-ASCII and not valid UTF-8, which simply writes the byte as
a hex-tuple?
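
Something along these lines, perhaps. This is only a sketch of the
fallback, not a tested patch: utf8_seq_len() and format_switch() are
hypothetical helpers that do not exist in Git, and the "\xNN" output
format is just one possible way to write the byte. It prints the switch
as-is when it is ASCII or a well-formed UTF-8 sequence, and as a hex
escape of the first byte otherwise.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of Git): return the byte length of the
 * well-formed UTF-8 sequence starting at s, or 0 if there is none.
 * Overlong forms, surrogates and values above U+10FFFF are rejected.
 */
static size_t utf8_seq_len(const char *s)
{
	const unsigned char *p = (const unsigned char *)s;
	unsigned long cp, min;
	size_t len, i;

	if (p[0] < 0x80)
		return 1;		/* plain ASCII */
	if ((p[0] & 0xe0) == 0xc0) {
		len = 2; cp = p[0] & 0x1f; min = 0x80;
	} else if ((p[0] & 0xf0) == 0xe0) {
		len = 3; cp = p[0] & 0x0f; min = 0x800;
	} else if ((p[0] & 0xf8) == 0xf0) {
		len = 4; cp = p[0] & 0x07; min = 0x10000;
	} else {
		return 0;		/* stray continuation or invalid lead byte */
	}

	for (i = 1; i < len; i++) {
		/* a NUL terminator fails this check too, so we never overrun */
		if ((p[i] & 0xc0) != 0x80)
			return 0;
		cp = (cp << 6) | (p[i] & 0x3f);
	}
	if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
		return 0;
	return len;
}

/*
 * Hypothetical helper: write a printable form of the switch at opt into
 * buf. Valid sequences are copied verbatim; anything else becomes a hex
 * escape of its first byte.
 */
static void format_switch(char *buf, size_t size, const char *opt)
{
	size_t len = utf8_seq_len(opt);

	if (len)
		snprintf(buf, size, "%.*s", (int)len, opt);
	else
		snprintf(buf, size, "\\x%02x", (unsigned char)*opt);
}
```

The error() call in the patch could then pass the formatted buffer with
a plain %s. Since the helper never yields a NULL pointer, the pointer
subtraction in the patch goes away as well.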

> 2) The non-ascii sequence is NOT valid UTF-8, then if I read correctly
>    (I didn't test) utf8_width would set next to NULL, and then you are
>    in big trouble.

Ouch. Yeah, you are right; this is not good at all :)

But I guess the solution above should fix this as well, no?