Re: dash bug: double-quoted "\" breaks glob protection for next char

Harald van Dijk <harald@xxxxxxxxxxx> · Fri, 2 Mar 2018 11:58:41 +0100

On 02/03/2018 08:49, Herbert Xu wrote:
> On Thu, Mar 01, 2018 at 08:24:22PM +0100, Harald van Dijk wrote:
>> On 01/03/2018 00:04, Harald van Dijk wrote:
>>> $ bash -c 'x=yz; echo "${x#'"'y'"'}"'
>>> z
>>>
>>> $ dash -c 'x=yz; echo "${x#'"'y'"'}"'
>>> yz
>>>
>>> (That is, they are executing x=yz; echo "${x#'y'}".)
>>>
>>> POSIX says that in "${var#pattern}" (and the same for ##, % and %%), the
>>> pattern is considered unquoted regardless of the outer quotation marks.
>>> Because of that, the single quote characters should not be taken
>>> literally, but should be taken as quoting the y. ksh, posh and zsh agree
>>> with bash.
>>
>> Unfortunately, this causes another problem with all of the backslash
>> approaches so far:
>>
>>    x='\\\\'; printf "%s\n" "${x#'\\\\'}"
>>
>> This should print a blank line. (bash, ksh, posh and zsh agree.)
>>
>> Here, dash's parser stores '$\$\', where $ is a control character. preglob
>> would need to turn this into \\\\\\\\. The problem is again that preglob
>> cannot increase the string length. Perhaps the parser needs to store this as
>> '$\$\$\$\', $ being either CTLESC or that new CTLBACK? Either way, it
>> requires some more invasive changes.
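
To make the length problem concrete, here is a minimal C sketch. glob_escape
is a hypothetical helper written for this illustration, not dash's preglob.
It assumes only that every literal backslash must be preceded by another
backslash before the pattern reaches the matcher. The four literal
backslashes from the example then need eight bytes, more than the six bytes
of the stored form quoted above, so the rewrite cannot happen in place.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not dash's preglob(): copy `lit` to `out` as a glob
 * pattern in which every backslash is escaped so that it matches literally.
 * Returns the number of bytes the escaped form needs (excluding the NUL),
 * even when `out` is too small, so the caller can detect the overflow.
 */
size_t glob_escape(const char *lit, char *out, size_t outsz)
{
    size_t need = 0;

    for (; *lit != '\0'; lit++) {
        if (*lit == '\\') {
            /* Emit the escaping backslash if there is still room. */
            if (need + 1 < outsz)
                out[need] = '\\';
            need++;
        }
        /* Emit the character itself if there is still room. */
        if (need + 1 < outsz)
            out[need] = *lit;
        need++;
    }
    /* Always NUL-terminate when the buffer has any space at all. */
    if (outsz > 0)
        out[need < outsz ? need : outsz - 1] = '\0';
    return need;
}
```

Calling glob_escape with a string of four backslashes returns 8, whatever
the size of the output buffer.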

> These are different issues.  dash's parser currently does not
> understand nested quoting in patterns at all.  That is, if your
> parameter expansion is within double quotes, then dash at the
> parser level will consider the pattern to be double-quoted.  Thus
> any nested single-quotes will be literals instead of actual quotes.

That's the same thing though. The problem with the backslashes is also 
that dash sees them as double-quoted when they should be seen as 
unquoted, and the approach taken in commit 
7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c that lasts to this day was 
specifically to *not* fix this in the parser, but to simply have the 
parser record enough information so that quote status can be determined 
and patched up during expansion. It's just that in the case of single 
quotes, expansion was never modified to recognise them. Thinking some 
more, I don't think the parser actually records enough information to 
let that work.

> If we fix this in the parser then everything should just work.

Right, that's the approach FreeBSD sh has taken that I referred to in my 
message from Feb 18, that I'd personally prefer as well. It basically 
involves reverting 7cfd8be0dc83342b4a71f3a8e5b7efab4670e50c, setting 
syntax to BASESYNTAX/DQSYNTAX (whichever is appropriate) when the parse 
of a variable expansion starts, and finding a sensible way to change the 
syntax back to BASESYNTAX/DQSYNTAX/ARISYNTAX when it ends. In FreeBSD 
sh, an explicit stack of syntaxes is created for this, but that might be 
avoidable: with slight modifications to what gets stored in the byte 
after CTLVAR/CTLARI, it might be possible to go back through the parser 
output to determine the syntax to revert to. I'll see if I can get that 
working.
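
As a rough illustration of the explicit-stack idea, here is a toy scanner in
C. It is not FreeBSD sh or dash code: mark_quoted, the enum names and the
'Q'/'.' output are all made up for this sketch, and it only recognises the
${name#pat}, ${name##pat}, ${name%pat} and ${name%%pat} forms. On entering
such a pattern it pushes the current syntax and switches to the base syntax;
on the closing brace it pops the saved syntax again.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum syntax { BASE, DQ, SQ };   /* stand-ins for BASESYNTAX, DQSYNTAX, SQSYNTAX */

#define MAXDEPTH 16

/*
 * Toy scanner (hypothetical). For each byte of `word`, write 'Q' to `out` if
 * the byte is a quoted literal and '.' otherwise, then NUL-terminate `out`,
 * which must hold strlen(word) + 1 bytes. Returns 0 on success and -1 on
 * unsupported, unbalanced or too deeply nested input (`out` is then
 * unspecified).
 */
int mark_quoted(const char *word, char *out)
{
    enum syntax stack[MAXDEPTH];   /* syntax to restore at each closing '}' */
    int depth = 0;
    enum syntax cur = BASE;
    size_t i = 0;

    while (word[i] != '\0') {
        char c = word[i];

        if (cur == SQ) {
            /* Inside single quotes everything up to the next ' is literal. */
            if (c == '\'')
                cur = BASE;
            out[i++] = (c == '\'') ? '.' : 'Q';
        } else if (c == '\\' && word[i + 1] != '\0' &&
                   (cur == BASE || strchr("$`\"\\", word[i + 1]) != NULL)) {
            /* A backslash that quotes the following byte. */
            out[i++] = '.';
            out[i++] = 'Q';
        } else if (c == '$' && word[i + 1] == '{') {
            out[i++] = '.';
            out[i++] = '.';
            while (isalnum((unsigned char)word[i]) || word[i] == '_')
                out[i++] = '.';
            if (word[i] != '#' && word[i] != '%')
                return -1;              /* form not handled by this toy */
            char op = word[i];
            out[i++] = '.';
            if (word[i] == op)          /* ## or %% */
                out[i++] = '.';
            if (depth == MAXDEPTH)
                return -1;
            stack[depth++] = cur;       /* remember the outer syntax */
            cur = BASE;                 /* the pattern starts out unquoted */
        } else if (c == '\'' && cur == BASE) {
            cur = SQ;
            out[i++] = '.';
        } else if (c == '"') {
            cur = (cur == DQ) ? BASE : DQ;
            out[i++] = '.';
        } else if (c == '}' && cur == BASE && depth > 0) {
            cur = stack[--depth];       /* back to the outer syntax */
            out[i++] = '.';
        } else {
            out[i++] = (cur == DQ) ? 'Q' : '.';
        }
    }
    out[i] = '\0';
    return (cur == BASE && depth == 0) ? 0 : -1;
}
```

For the word "${x#'y'}" (including the double quotes) this marks only the y
as quoted, giving ......Q... as output, whereas the bare word "'y'" gives
.QQQ. because the single quotes are literal inside plain double quotes.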

Cheers,