Re: bug report: symbolic-ref --short command echos the wrong text while use Chinese language

Jeff King <peff@xxxxxxxx> · Mon, 13 Feb 2023 15:18:28 -0500

On Mon, Feb 13, 2023 at 02:38:08PM +0800, 孟子易 wrote:

> System: Mac Os (Ventura 13.2)
> Language: Chinese simplified
> Preconditions:
> # git checkout -b 测试-加-增加-加-增加
> # git symbolic-ref --short HEAD
> Wrong Echo (Current Echo):
> 测试-�
> Correct Echo:
> // I Don't know, may be "测试-加" ?

Hmm, I can't reproduce here on Linux:

  $ git init
  $ git commit --allow-empty -m foo
  $ git checkout -b 测试-加-增加-加-增加
  $ git symbolic-ref --short HEAD
  测试-加-增加-加-增加

I wonder if it is related to using macOS. The refs are stored as
individual files in the filesystem, and HFS+ will do some unicode
normalization. So I get:

  $ ls .git/refs/heads/ | xxd
  00000000: 6d61 696e 0ae6 b58b e8af 952d e58a a02d  main.......-...-
  00000010: e5a2 9ee5 8aa0 2de5 8aa0 2de5 a29e e58a  ......-...-.....
  00000020: a00a          

Are your on-disk bytes different?

My instinct was that this might be related to the shortening code
treating the names as bytes, rather than characters. But looking at
shorten_unambiguous_ref(), it is really operating at the level of path
components, and should never split a partial string.

Another possibility: the shortening is done by applying our usual
ref-resolving rules one by one via scanf(). There's an assumption in the
code that the resulting string can never be longer than the input:

	/* buffer for scanf result, at most refname must fit */
	short_name = xstrdup(refname);

	...
        for (i = nr_rules - 1; i > 0 ; --i) {
		...
                if (1 != sscanf(refname, scanf_fmts[i], short_name))
                        continue;

Is it possible that this assumption is violated based on some particular
combination of unicode normalization and locale? That seems unlikely to
me, but it wouldn't be the first time I've been surprised by subtle
unicode implications.

Is it possible for you to run Git in a debugger and check the
intermediate steps happening in refs_shorten_unambiguous_ref()?

-Peff