Hey.

On Sat, 2024-04-27 at 19:03 +0800, Herbert Xu wrote:
> This patch series adds multi-byte support to dash. For now only
> fnmatch is supported as the native pmatch function has not been
> modified to support multi-byte characters.

Nothing against the functionality per se, but I think this may cause
quite some subtle issues for all scripts that assumed dash's (and thus,
on many systems, /bin/sh's) current behaviour of being C locale only,
even without explicitly setting LC_ALL=C.

AFAIU, in the C locale every byte is a character, and thus in particular
the pattern matching notation is defined for every possible outcome of
command substitution, respectively every possible content of variables
(which, in every(!) locale, can be any sequence of bytes other than NUL).

For example:
************
A while ago I asked on the Austin Group mailing list for a portable
way to get command substitution without stripping of trailing newlines.

Long story short:
The recommended way was to add a sentinel character '.' at the end of
the output within the command substitution and strip that off later with
parameter expansion.

But despite the very special properties[0] of '.', it's apparently
still required to set LC_ALL=C when stripping the sentinel, because the
pattern matching notation in ${foo%.} is defined only on strings of
characters, not on strings of bytes.

Back then, Harald van Dijk had some ideas on how that might be resolved
for good, but IIRC none of the shell implementors seemed to be really
interested.

My goal was to make a portable function like
   command_subst_with_newlines "eval-ed-command-string" "target-variable-name"
which, given the requirement of setting LC_ALL, proved more or less
impossible if the function is to have no side effects (like leaving
LC_ALL overridden, or possibly overriding some existing variable like
OLD_LC_ALL).

Anyway... I could imagine that if dash becomes multi-byte aware, there
might be more or less subtle surprises.

Cheers,
Chris.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html "The encoded values associated with <period>, <slash>, <newline>, and <carriage-return> shall be invariant across all locales supported by the implementation."
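
For illustration only, here is a minimal sketch of the sentinel approach described above. It is not the code discussed on the Austin Group list. The function body and the _cswn_* helper variable names are assumptions made up for this example, and those helper variables are exactly the kind of side effect the message complains about.

```shell
# Hypothetical helper: the name and argument order follow the description
# in the message; the body is an illustrative assumption.
command_subst_with_newlines() {
    # $1: command string passed to eval
    # $2: name of the variable that receives the output
    # The '.' sentinel keeps $(...) from stripping trailing newlines.
    _cswn_out=$(eval "$1"; _cswn_rc=$?; printf '.'; exit "$_cswn_rc")
    _cswn_rc=$?

    # Remember whether LC_ALL was set so it can be restored afterwards.
    # The _cswn_* variables leak into the caller's environment.
    if [ "${LC_ALL+set}" = set ]; then
        _cswn_lc_set=1
        _cswn_lc_old=$LC_ALL
    else
        _cswn_lc_set=0
    fi

    # Strip the sentinel in the C locale, where every byte is a character.
    LC_ALL=C
    _cswn_out=${_cswn_out%.}

    # Restore the previous LC_ALL state.
    if [ "$_cswn_lc_set" = 1 ]; then
        LC_ALL=$_cswn_lc_old
    else
        unset LC_ALL
    fi

    # Assign to the caller's variable; $2 must be a valid variable name.
    eval "$2=\$_cswn_out"
    return "$_cswn_rc"
}
```

Called as command_subst_with_newlines 'printf "a\n\n"' result, this should leave both trailing newlines in $result. The sketch assumes the caller passes a valid variable name and does not itself use any _cswn_* variables.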