Re: Backslashes in unquoted parameter expansions

On 25-03-18 at 22:56, Harald van Dijk wrote:
>   case /dev in $pat) echo why ;; esac
> 
> Now, bash and dash say that the pattern does match -- they take the
> backslash as unquoted, allowing it to escape the v. Most other shells
> (bosh, ksh93, mksh, pdksh, posh, yash, zsh) still take the backslash as
> quoted.
> 
> This doesn't make sense to me, and doesn't match historic practice:
[...]

POSIX says:
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_05
| In order from the beginning to the end of the case statement, each
| pattern that labels a compound-list shall be subjected to tilde
| expansion, parameter expansion, command substitution, and arithmetic
| expansion, and the result of these expansions shall be compared
| against the expansion of word, according to the rules described in
| Pattern Matching Notation (which also describes the effect of quoting
| parts of the pattern).

The way I read this, this clearly says that quoting in a pattern
(particularly backslash quoting, which is the only kind specified in
"Pattern Matching Notation") still needs to have the usual effect even
if the pattern results from one or more expansions. But I understand
there are differences of opinion about this. <shrug>

It's certainly true that few shells actually act this way, but dash is
one that does, as is Busybox ash -- and so is bash (for the most part;
see further on).

I think *not* acting this way is illogical. Why should 'case' parse glob
characters resulting from expansions, but not the backslashes that could
quote those glob characters? I can see no reason for that.

Note that quoting the expansion, as in
    case /dev in "$pat") echo why ;; esac
does what you would expect: the pattern resulting from the expansion is
fully quoted. So with dash and bash you can easily and cleanly have it
either way, unlike with other shells.
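To make the quoted/unquoted distinction concrete, here is a minimal sketch. The value of pat is a hypothetical one chosen for illustration (it is not the pattern from the elided part of the quoted message), and it deliberately avoids backslashes so the result should be the same on any POSIX shell:

```shell
# Hypothetical pattern for illustration only.
pat='/d*'

# Unquoted expansion: the '*' resulting from the expansion is a glob character.
case /dev in
( $pat ) unquoted=match ;;
( * )    unquoted=nomatch ;;
esac

# Quoted expansion: the whole pattern is literal, so /dev does not match...
case /dev in
( "$pat" ) quoted=match ;;
( * )      quoted=nomatch ;;
esac

# ...and only the literal string '/d*' does.
case '/d*' in
( "$pat" ) literal=match ;;
( * )      literal=nomatch ;;
esac

echo "unquoted: $unquoted, quoted: $quoted, literal: $literal"
# prints: unquoted: match, quoted: nomatch, literal: match
```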

(Note that yash, ksh93 and zsh-as-sh act half-baked: backslashes in
patterns resulting from expansions are accepted to quote glob characters
and backslashes themselves, but not any other character. AFAICT, that
behaviour doesn't conform to POSIX no matter which way you slice it.)

[...]
> or are there scenarios where it's important to treat an expanded
> backslash as unquoted?

Consider this function from modernish (simplified version):

match() {
	case $1 in
	( $2 ) ;;
	( * ) return 1 ;;
	esac
}

This allows doing:

	if match STRING GLOBPATTERN; then

on every POSIX shell. Very convenient. Easier than 'case', especially if
you want to combine it like: command1 && match foo bar && command3, etc.
And the syntax is not an eyesore, finger-twister and spacing pitfall,
unlike that of '[['.
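A small usage sketch of the above, restricted to plain glob patterns without backslashes so that it should behave the same everywhere. The file name and the variable names are made up for the example:

```shell
# Simplified match() as shown above.
match() {
	case $1 in
	( $2 ) ;;
	( * ) return 1 ;;
	esac
}

file=notes.txt    # hypothetical value

# Used as an 'if' condition:
if match "$file" '*.txt'; then
	kind=text
else
	kind=other
fi

# Chained with other commands:
[ -n "$file" ] && match "$file" '*.txt' && echo "$file is a $kind file"
```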

But consider this:

	match 'a\bcd' 'a\?c*'

The '?' is escaped so shouldn't match. This correctly returns a negative
on dash, bash, ksh93, and zsh. It returns a false positive on yash and
mksh. (I haven't tested other shells like FreeBSD sh lately.) This means
on those shells you can't use a backslash to escape a glob character in
a pattern passed as a parameter.

And how about this:

	match 'a\bcd' 'a\\bcd'

Same string as above, but this time the pattern escapes the backslash
itself. This correctly returns a positive on dash, bash, ksh93, and
zsh-as-sh; a false negative on the rest.

However, this:

	match '? *xy' '??\*\x\y'

only correctly returns a positive on bash and dash. That's because ksh93
and zsh-as-sh, for patterns resulting from expansions, only parse
backslash quoting for glob characters and the backslash itself, but not
for other characters.

On bash, there is a bug that breaks backslash quoting on match() if the
pattern contains a ^A (\001). So bash can't robustly use the simple
match() for arbitrary patterns. This is *mostly* fixed in the
development version; the fix is good enough for the simple match() to work.

Bottom line is, dash and Busybox ash (but not FreeBSD sh), as well as
the upcoming release version of bash, are currently the only shells that
can reliably use the plain, simple and fast match() above for arbitrary
patterns.

When running on other shells (as determined by an init-time feature test
using a simple match()), modernish match() detects one or more
backslashes in the pattern, and if it finds any, quotes the pattern
except for glob characters and backslashes, so it can safely be
'eval'-ed as a literal pattern. This workaround is effective, but was a
bitch to get right and is not exactly a performance winner.
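For illustration, an init-time feature test along these lines could be built from the three probes in this message. This is only a sketch under that assumption, not the actual modernish code, and the variable name simple_match_ok is hypothetical:

```shell
# Simplified match() as shown earlier in this message.
match() {
	case $1 in
	( $2 ) ;;
	( * ) return 1 ;;
	esac
}

# The simple match() is considered usable only if backslashes resulting
# from the expansion quote glob characters, the backslash itself, and
# ordinary characters alike.
if match '? *xy' '??\*\x\y' &&
   match 'a\bcd' 'a\\bcd' &&
   ! match 'a\bcd' 'a\?c*'
then
	simple_match_ok=yes
else
	simple_match_ok=no	# a fallback implementation would be needed here
fi

echo "simple match() usable: $simple_match_ok"
```

The result depends on the shell running it, so no expected output is given. Note that these probes do not cover the bash ^A issue mentioned above.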

So yeah, I'd like to keep dash the way it is, please :)

- Martijn