On 11.02.13 17:36, Erik Faye-Lund wrote:
> On Mon, Feb 11, 2013 at 5:28 PM, Torsten Bögershausen <tboegi@xxxxxx> wrote:
>> On 11.02.13 14:34, Erik Faye-Lund wrote:
>>> Even though parse-options doesn't support UTF-8 switches (which
>>> makes sense; non-ASCII switches would be difficult to enter on
>>> some keyboard layouts), it can be useful to report incorrectly
>>> entered UTF-8 switches to make the output somewhat less ugly
>>> for those of us with keyboard layouts with UTF-8 characters on
>>> it.
>>>
>>> Make the reporting code grok UTF-8 in the option sequence, and
>>> write a variable-width output sequence.
>>>
>>> Signed-off-by: Erik Faye-Lund <kusmabite@xxxxxxxxx>
>>> ---
>>> Being both clumsy and Norwegian, I sometimes happen to enter the
>>> Norwegian bizarro-letters ('æ', 'ø' and 'å') instead of the
>>> correct ones when entering command-line options.
>>>
>>> However, since git only looks at one byte at a time for
>>> short options, it ends up reporting a partial UTF-8 sequence
>>> in such cases, leading to corruption of the output.
>>>
>>> The "real fix" would probably be to add proper multi-byte
>>> support to the short-option parser, but this serves little
>>> purpose in Git; we don't internationalize the command-line
>>> switches.
>>>
>>> So perhaps this is a suitable band-aid instead?
>>>
>>>  parse-options.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/parse-options.c b/parse-options.c
>>> index 67e98a6..20dc742 100644
>>> --- a/parse-options.c
>>> +++ b/parse-options.c
>>> @@ -3,6 +3,7 @@
>>>  #include "cache.h"
>>>  #include "commit.h"
>>>  #include "color.h"
>>> +#include "utf8.h"
>>>  
>>>  static int parse_options_usage(struct parse_opt_ctx_t *ctx,
>>>  			       const char * const *usagestr,
>>> @@ -462,7 +463,9 @@ int parse_options(int argc, const char **argv, const char *prefix,
>>>  		if (ctx.argv[0][1] == '-') {
>>>  			error("unknown option `%s'", ctx.argv[0] + 2);
>>>  		} else {
>>> -			error("unknown switch `%c'", *ctx.opt);
>>> +			const char *next = ctx.opt;
>>> +			utf8_width(&next, NULL);
>>> +			error("unknown switch `%.*s'", (int)(next - ctx.opt), ctx.opt);
>>>  		}
>>>  		usage_with_options(usagestr, options);
>>>  	}
>>>
>> Would the following do the trick?
>>
>> diff --git a/parse-options.c b/parse-options.c
>> index c1c66bd..f800552 100644
>> --- a/parse-options.c
>> +++ b/parse-options.c
>> @@ -471,7 +471,7 @@ int parse_options(int argc, const char **argv, const char *prefix,
>>  		if (ctx.argv[0][1] == '-') {
>>  			error("unknown option `%s'", ctx.argv[0] + 2);
>>  		} else {
>> -			error("unknown switch `%c'", *ctx.opt);
>> +			error("unknown switch `%s'", ctx.opt);
>>  		}
>>
>>
> Nope; that would print the rest of the option-string, in cases of "git
> <command> -abcd".

OK, maybe pick_one_utf8_char() is a better choice than simply assuming ASCII.
We can make a guess: if it is UTF-8, we use it; if not, we assume ASCII.
Just thinking out loud (the "if" could be written more compactly with the "?" operator):

	} else {
		const char *start = ctx.opt;
		unsigned c = pick_one_utf8_char(&start, NULL);
		if (!c)
			c = *ctx.opt;
		error("unknown switch `%c'", c);
	}
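The whole thread turns on one detail: how many bytes belong to the first character of a short-option cluster such as "-æbc". The standalone C sketch below illustrates this without Git's utf8.h. The names `utf8_seq_len()` and `format_unknown_switch()` are hypothetical and are not Git functions. It follows the "if it is UTF-8, use it; if not, assume ASCII" idea, but prints the byte span with `%.*s` as Erik's patch does. Passing a decoded code point such as U+00E6 to `%c` would still emit a single byte rather than the two-byte UTF-8 sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not Git's utf8.h API): return the byte length of
 * the UTF-8 sequence that starts at s, or 0 if s does not start with a
 * well-formed lead byte followed by enough continuation bytes. Overlong
 * forms and surrogates are not rejected; this only decides how many
 * bytes to print.
 */
static size_t utf8_seq_len(const char *s)
{
	const unsigned char *u = (const unsigned char *)s;
	size_t len, i;

	assert(s != NULL);
	if (u[0] == 0)
		return 0;                 /* empty string: nothing to report */
	if (u[0] < 0x80)
		return 1;                 /* plain ASCII */
	else if ((u[0] & 0xe0) == 0xc0)
		len = 2;                  /* 110xxxxx */
	else if ((u[0] & 0xf0) == 0xe0)
		len = 3;                  /* 1110xxxx */
	else if ((u[0] & 0xf8) == 0xf0)
		len = 4;                  /* 11110xxx */
	else
		return 0;                 /* stray continuation or invalid lead byte */

	/* Each following byte must be 10xxxxxx; a NUL fails this test, so a
	 * truncated sequence is never read past its terminator. */
	for (i = 1; i < len; i++)
		if ((u[i] & 0xc0) != 0x80)
			return 0;
	return len;
}

/*
 * Format the error for the first character of a short-option cluster.
 * Valid UTF-8 is printed whole; anything else falls back to one byte,
 * like the original `%c' code. Returns the snprintf() result.
 */
static int format_unknown_switch(char *buf, size_t size, const char *opt)
{
	size_t len = utf8_seq_len(opt);

	if (!len)
		len = 1;
	/* %.*s prints at most len bytes, so the rest of the cluster
	 * ("bc" in "-æbc") is not echoed, unlike a plain %s. */
	return snprintf(buf, size, "unknown switch `%.*s'", (int)len, opt);
}
```

For example, `format_unknown_switch(buf, sizeof buf, "\xc3\xa6" "bc")` produces unknown switch `æ' instead of a lone 0xC3 byte, while "xyz" still reports just `x'. The string literal is split after `\xa6` because a C hex escape would otherwise swallow the following "bc".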