Re: [PATCH 0/8] Add multi-byte support

On 28/04/2024 01:49, Herbert Xu wrote:
> On Sat, Apr 27, 2024 at 11:31:43PM +0200, Christoph Anton Mitterer wrote:
>>
>> Long story short:
>> The recommended way was to add a sentinel character '.' at the end of
>> the output within the command substitution and strip that off later
>> with parameter expansion.
>> But despite of the very special properties[0] of '.', it's apparently
>> still required to set LC_ALL=C when stripping the sentinel, because the
>> pattern matching notation in ${foo%.} is defined only on strings of
>> characters, not on strings of bytes.
>
> Are you talking about a theoretical undefined condition, or an
> actual one?  Which shell doesn't deal with ${foo%.} correctly?

The way you are implementing it, once you get to pmatch(), arguably you will not handle ${foo%.} correctly.

Consider a UTF-8 locale, where '\303' is not a valid multibyte character. In this locale, consider

  foo=$(printf '\303.')
  foo=${foo%.}

This is something I expect to set foo to '\303', and it does in all shells I know of, even though POSIX does not say this needs to work. The way you are implementing multibyte character support, if I am reading it right, until a full multibyte character has been read, the next byte is taken as part of that multibyte character, meaning you will take '\303.' as a single invalid multibyte character.
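
To check what a given shell actually does here, the remaining bytes can be dumped in octal. This is only a minimal sketch: octal_bytes is a hypothetical helper name, it assumes od is available, and it should be run in a UTF-8 locale to exercise the case described above.

```shell
# Hypothetical helper: print the bytes of $1 as space-separated octal values.
octal_bytes() {
    # Unquoted on purpose: word splitting normalises od's column padding.
    set -- $(printf '%s' "$1" | od -A n -t o1)
    printf '%s\n' "$*"
}

# '\303' is an incomplete UTF-8 sequence, followed by the sentinel '.'.
foo=$(printf '\303.')
# Strip the sentinel; the question is whether the lone '\303' byte survives.
foo=${foo%.}

result=$(octal_bytes "$foo")
printf 'remaining bytes: %s\n' "$result"
```

If the sentinel is stripped as a separate character, only the '\303' byte should remain. A shell that treated '\303.' as one invalid character would instead leave both bytes in place.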

At the same time, '\303\251' is a valid multibyte character, and '\251' is not. So also consider

  foo=$(printf '\303\251')
  foo=${foo%$(printf '\251')}

Here, it is not clear what the correct result is, and indeed, shells disagree. bosh, ksh, zsh, and my shell do not break up characters, which I believe to be the most sensible behaviour. bash and mksh do.
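
The same kind of check shows which group a given shell falls into. Again this is only a sketch under the same assumptions (hypothetical octal_bytes helper, od available, UTF-8 locale), and the result is expected to differ between shells.

```shell
# Hypothetical helper: print the bytes of $1 as space-separated octal values.
octal_bytes() {
    # Unquoted on purpose: word splitting normalises od's column padding.
    set -- $(printf '%s' "$1" | od -A n -t o1)
    printf '%s\n' "$*"
}

# '\303\251' is one valid UTF-8 character; '\251' alone is not valid.
foo=$(printf '\303\251')
# Try to strip only the trailing byte of that character.
foo=${foo%$(printf '\251')}

result=$(octal_bytes "$foo")
case $result in
    '303 251') verdict='character kept intact' ;;
    '303')     verdict='character broken up' ;;
    *)         verdict='unexpected result' ;;
esac
printf '%s: %s\n' "$verdict" "$result"
```

A shell that does not break up characters should leave both bytes in place, while one that matches the invalid byte on its own should leave only '\303'.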

The corner cases need to be carefully considered in order to figure out how to write the multibyte character support core functionality.

Cheers,
Harald van Dijk