On 12/01/2023 15:56, Michael Greenberg wrote:
I've seen this issue in interop with modernish, where dash doesn't
properly support multibyte character encodings. It would be great to
have proper support for more interesting character sets.
Agreed, but personally, I see that as a separate issue. Non-ASCII bytes
in alias names are potentially useful even without multibyte character
support, and multibyte character support is definitely useful regardless
of anything that aliases do.
It's worth looking at how other shells implement this, but a natural
choice is a more sophisticated string representation, in which control
characters are represented out-of-band rather than by 'unused' byte
values (unused by ASCII, that is, but not by other encodings).
Unfortunately, this rope-like representation is not very convenient to
work with.
There are a few shells that store words in the exact form that they
appear in the script. This has interesting consequences: it shows the
same effect, but in an arguably less problematic form. In bosh:
$ alias '\a=echo what'
$ a
a: not found
$ \a
what
Other shells that take this approach restrict alias names so that shell
special characters, including '\', cannot appear, but other characters can.
Although this representation is not without its problems, it would
handle this transparently, and has the arguable benefit of automatically
handling things like
$ cat <<`this is problematic`
hello world
`this is problematic`
as well. This has come up on the list before. It works in bash, ksh,
yash, and zsh, and POSIX places no restrictions on which words can be
used as heredoc delimiters, so I think shells are required to accept
this. Technically it is a bug that dash (along with several other
shells) does not, even if no one would ever make use of it.
Adopting this representation would require massive changes, though, and
is almost certainly not appropriate for dash.
But we can take inspiration from it and think of a more limited fix, one
that works to handle the alias issue, but nothing else: it should be
possible to change the representation of CTLESC and other control
characters. If their representation is changed to that of characters
that cannot appear in alias names without quoting anyway, e.g. by
changing the value of CTLESC to \ and similarly for other control
characters, the second problem (the CTLESC collision) is avoided. By
changing the parser to set quoteflag when processing command
substitutions etc., the first problem (the CTLBACKQ collision) is
avoided. (This may involve renaming quoteflag, since it would no
longer reflect simply whether any part of the word was quoted. Or a new
variable may be added to track this.) This should be enough to make it work.
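
To make the quoteflag half of that idea concrete, here is a rough model
in plain sh. It is only a sketch, not dash source: parse_word and
alias_candidate are hypothetical names, bytes are written as octal
numbers, and 204 stands for CTLBACKQ. The parser reports a flag next to
the encoded word, and alias lookup is skipped whenever the flag is set.

```shell
# Hypothetical model of the proposed fix, not dash source.
# Bytes are written as octal numbers; 204 stands for CTLBACKQ.

# parse_word: print "<flag> <encoded word>". The flag is 1 when the word
# contained a command substitution, 0 otherwise.
parse_word() {
    case $1 in
        '$('*')') printf '1 204\n' ;;
        *)        printf '0 %s\n' "$1" ;;
    esac
}

# alias_candidate: succeed only if the flag permits alias lookup and the
# encoded word equals the stored alias name.
alias_candidate() {
    parsed=$(parse_word "$1")
    flag=${parsed%% *}
    encoded=${parsed#* }
    [ "$flag" = 0 ] && [ "$encoded" = "$2" ]
}

if alias_candidate '$(:)' 204; then
    printf '%s\n' 'alias fires for $(:)'
else
    printf '%s\n' 'alias skipped for $(:)'
fi
```

In this model the word $(:) still encodes to 204, but the flag keeps it
from ever being compared against alias names, while ordinary unquoted
words are looked up as before.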
Cheers,
Harald van Dijk
Cheers,
Michael
On 2023-01-11 at 02:01:03 AM, Harald van Dijk wrote:
Hi,
Please consider
alias $(printf "\204")="exit 2"
$(:)
echo ok
This is a perfectly valid shell script. The alias command is permitted
to either succeed or fail, and if it succeeds, it defines an alias whose
name is the single byte '\204'. No command by that name is ever
executed, so this script is required to then print 'ok' and exit
successfully.
This is not what happens. Internally, '\204' is the value of CTLBACKQ,
and the word $(:) gets translated to an internal representation of just
that -- coupled with a pointer to the parsed command. Since the word
$(:) contains zero quote characters, it is subjected to alias expansion,
and picks up this alias definition that makes the command expand to
exit 2.
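
A toy model of this collision in plain sh, for illustration only. It is
not dash source: encode_token and expands_alias are made-up names, and
bytes are written as octal numbers so the script stays pure ASCII. The
only fact taken from above is that CTLBACKQ has the value '\204'.

```shell
# Hypothetical model, not dash source. Bytes are written as octal numbers.
CTLBACKQ=204   # value of CTLBACKQ as stated above

# encode_token: return the modelled internal form of a word.
encode_token() {
    case $1 in
        '$(:)') printf '%s\n' "$CTLBACKQ" ;;  # substitution collapses to one marker byte
        *)      printf '%s\n' "$1" ;;         # other words are assumed to be octal already
    esac
}

# expands_alias: succeed if the encoded word equals the stored alias name.
expands_alias() {
    [ "$(encode_token "$1")" = "$2" ]
}

if expands_alias '$(:)' 204; then
    printf '%s\n' 'alias 204 fires for $(:)'
fi
```

The lookup compares the internal form of the word against the raw alias
name, so a marker byte and a literal byte of the same value are
indistinguishable.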
Consider also:
alias $(printf "\201")="echo ok" $(printf "\201\201")="echo bad" &&
eval $(printf "\201")
This should either print "ok", or reject the aliases. Instead, it prints
"bad". This happens because '\201' is the internal representation of
CTLESC, and a literal byte of that value is represented by escaping it
with CTLESC. Therefore, it triggers the expansion of the \201\201 alias.
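
The same effect in a toy model, again plain sh and not dash source.
encode_word and lookup_alias are hypothetical names, bytes are written
as space-separated octal numbers, and only the CTLESC escaping described
above is modelled. Alias names are assumed to be stored in raw form.

```shell
# Hypothetical model, not dash source. Bytes are space-separated octal numbers.
CTLESC=201   # value of CTLESC as stated above

# encode_word: prefix every literal byte equal to CTLESC with CTLESC.
encode_word() {
    out=
    for byte in $1; do
        case $byte in
            "$CTLESC") out="$out $CTLESC $byte" ;;
            *)         out="$out $byte" ;;
        esac
    done
    printf '%s\n' "${out# }"
}

# lookup_alias: encode the word ($1), then compare it with each raw
# alias name given in the remaining arguments; print the first match.
lookup_alias() {
    encoded=$(encode_word "$1")
    shift
    for name in "$@"; do
        if [ "$encoded" = "$name" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# The word is the single byte 201; aliases "201" and "201 201" exist.
lookup_alias 201 201 "201 201"
```

In this model the command word 201 is encoded as "201 201" before the
lookup, so it matches the two-byte alias instead of the one-byte alias
that was actually typed.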
Supporting alias names containing non-ASCII characters, while not
required by POSIX, seems desirable, and almost all other shells (mksh
being the exception) do appear to support this. I am not yet seeing a
good way of solving this.
Cheers,
Harald van Dijk