On Mon, Feb 13, 2023 at 03:18:28PM -0500, Jeff King wrote:
> On Mon, Feb 13, 2023 at 02:38:08PM +0800, 孟子易 wrote:
>
> > System: Mac Os (Ventura 13.2)
> > Language: Chinese simplified
> > Preconditions:
> > # git checkout -b 测试-加-增加-加-增加
> > # git symbolic-ref --short HEAD
> > Wrong Echo (Current Echo):
> > 测试-�
> > Correct Echo:
> > // I Don't know, may be "测试-加" ?
>
> Hmm, I can't reproduce here on Linux:
>
>   $ git init
>   $ git commit --allow-empty -m foo
>   $ git checkout -b 测试-加-增加-加-增加
>   $ git symbolic-ref --short HEAD
>   测试-加-增加-加-增加

Neither can I (macOS, pre-Ventura) ;-)

>
> I wonder if it is related to using macOS. The refs are stored as
> individual files in the filesystem, and HFS+ will do some unicode
> normalization. So I get:
>
>   $ ls .git/refs/heads/ | xxd
>   00000000: 6d61 696e 0ae6 b58b e8af 952d e58a a02d  main.......-...-
>   00000010: e5a2 9ee5 8aa0 2de5 8aa0 2de5 a29e e58a  ......-...-.....
>   00000020: a00a
>
> Are your on-disk bytes different?

In my case they are the same.
Converting the name from UTF-8 to UTF-8-MAC didn't change any bytes here.
Side note: macOS Ventura is probably using APFS rather than HFS+, and
APFS doesn't do Unicode decomposition at the file system level.

It would be helpful to pipe the output into xxd:

  git symbolic-ref --short HEAD | xxd

That would show whether the bytes Git prints are already garbled, or
whether the garbling happens outside of Git (e.g. in the terminal).

>
> My instinct was that this might be related to the shortening code
> treating the names as bytes, rather than characters. But looking at
> shorten_unambiguous_ref(), it is really operating at the level of path
> components, and should never split a partial string.
>
> Another possibility: the shortening is done by applying our usual
> ref-resolving rules one by one via scanf(). There's an assumption in the
> code that the resulting string can never be longer than the input:
>
>           /* buffer for scanf result, at most refname must fit */
>           short_name = xstrdup(refname);
>
>           ...
>           for (i = nr_rules - 1; i > 0 ; --i) {
>                   ...
>                   if (1 != sscanf(refname, scanf_fmts[i], short_name))
>                           continue;
>
> Is it possible that this assumption is violated based on some particular
> combination of unicode normalization and locale? That seems unlikely to
> me, but it wouldn't be the first time I've been surprised by subtle
> unicode implications.
>
> Is it possible for you to run Git in a debugger and check the
> intermediate steps happening in refs_shorten_unambiguous_ref()?
>
> -Peff