Re: dash bug: double-quoted "\" breaks glob protection for next char

On 3/4/18 7:05 PM, Denys Vlasenko wrote:
On Fri, Mar 2, 2018 at 7:03 PM, Harald van Dijk <harald@xxxxxxxxxxx> wrote:
On 02/03/2018 18:00, Denys Vlasenko wrote:

On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk <harald@xxxxxxxxxxx>
wrote:

Currently:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<>

This is what I expect, and also what bash, ksh and posh do.

With your patch:

$ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
<a>

I was looking into this specific example and I believe it is a _bash_ bug.

The [a\]] is misinterpreted by it (and probably by many people).
The gist is: \] is not a valid escape for ] in set glob expression.
Glob sets have no escaping at all, ] can be in a set
if it is the first char: []abc],
dash can be in a set if it is first or last: [abc-],
[ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.

Therefore, "[a\]]" glob pattern means "a or \, then ]".
Since that does not match "a", the result of ${foo#[a\]]} should be "a".
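
The two placement rules stated above ("]" first in the set, "-" first or last) can be checked with a minimal sketch using case, which applies the shell's pattern matching without touching the filesystem. The function names are made up for illustration, and the backslash question is deliberately left out, since that is the point in dispute.

```shell
# Hypothetical helpers: each one tests its argument against one bracket set.

# "]" placed first in the set is an ordinary member: []abc]
bracket_first() {
    case $1 in
        []abc]) echo match ;;
        *)      echo nomatch ;;
    esac
}

# "-" placed last in the set is an ordinary member: [abc-]
dash_last() {
    case $1 in
        [abc-]) echo match ;;
        *)      echo nomatch ;;
    esac
}

bracket_first ']'   # match
bracket_first 'd'   # nomatch
dash_last '-'       # match
dash_last 'b'       # match
```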

Are you sure about this? "Patterns Matching a Single Character"'s first
paragraph contains "A <backslash> character shall escape the following
character. The escaping <backslash> shall be discarded." The shell does this
first.

I have problems with "The shell does this first" statement.

It's useful to view the entire discussion of glob pattern matching
as a discussion of how fnmatch(pattern, string, flags) should behave
(even if a particular shell implementation chose to not use
C library's fnmatch() to implement its globbing).

Okay. dash has an option to use fnmatch() for pattern matching.

But POSIX says fnmatch() performs pattern matching the way the shell does, not the other way around. And there are a few differences that follow from it, because character quote status cannot be represented the same way in a C string as in the description of the shell.

Otherwise (IOW: if you allow globbing to depend on shell's quoting),
rules for globbing for different applications will not be consistent.
Which would be bad.

That's already the case, and cannot be helped. The shell pattern "[a]"[a] is supposed to match a file named "[a]a", and does. But if that file exists, and I run find . -name '"[a]"[a]', I won't get any results. I could use find . -name '\[a\][a]' for that though. And dash, if built with fnmatch() support, translates it to that in order to pass it to fnmatch().

This matches what POSIX says: POSIX doesn't make the removal of " part of pattern matching, and find -name and fnmatch() *only* implement the pattern matching part of the shell.
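
The "[a]"[a] example can be reproduced without creating any files. This is only a sketch: it uses case instead of pathname expansion, on the assumption that quoting affects the pattern the same way in both contexts, and the function names are hypothetical.

```shell
# Hypothetical helper: the double-quoted "[a]" is literal text,
# the unquoted [a] is a bracket expression.
quoted_prefix() {
    case $1 in
        "[a]"[a]) echo match ;;
        *)        echo nomatch ;;
    esac
}

# The same pattern written with backslashes instead of double quotes,
# i.e. the form that can be handed to find -name or fnmatch().
escaped_prefix() {
    case $1 in
        \[a\][a]) echo match ;;
        *)        echo nomatch ;;
    esac
}

quoted_prefix '[a]a'    # match
quoted_prefix 'aa'      # nomatch
escaped_prefix '[a]a'   # match
escaped_prefix 'aa'     # nomatch
```

Both functions accept the same strings, which is why the backslash form is a usable translation of the quoted form.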

As I see it, shell should massage input according to shell rules
(quote/bkslash removal et al), then use fnmatch() or glob(), or its own
internal implementations of them.

bash seems to not do it. It probably has a "combined" routine
which does both in one step, which allows quote removal
to interfere with globbing. Here's the proof:

$ x='a]'; echo _${x#[a\]]}_
_]_

In the above code, what pattern should be fed to fnmatch(),
assuming shell uses fnmatch() to implement ${x#pattern}?
Pattern should be "[a]]" because by shell rules "\]" in
an unquoted string is "]".

By the normal shell rules, there are two places backslashes get removed. The first is during pattern matching. Not before, during.

If fnmatch() is used to implement shell pathname expansion and other pattern matching, the string to pass here is literally [a\]]. Just like how for \*, the string to pass is literally \*, and it would be horribly wrong to pass * here as if it were unquoted. fnmatch() will remove the backslashes and take the following character literally.

The second place where the shell removes backslashes is during quote removal. But this takes place after pathname expansion.
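
The \* case mentioned above can be sketched the same way, again with case and made-up function names. If the backslash were dropped before matching, the first function would behave like the second.

```shell
# Hypothetical helper: \* must match only a literal asterisk.
star_literal() {
    case $1 in
        \*) echo match ;;
        *)  echo nomatch ;;
    esac
}

# For comparison: an unquoted * matches any string.
star_any() {
    case $1 in
        *) echo match ;;
    esac
}

star_literal '*'     # match
star_literal 'abc'   # nomatch
star_any 'abc'       # match
```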

But try this:

$ x='a]'; echo _${x#[a]]}_
__

Here, pattern should be "[a]]" as well - it literally is.

Here, the pattern should indeed be [a]].

But the results are different!

Evidently, bash does _not_ perform quote removal (more precisely,
backslash removal) on pattern string. Somehow, globbing code
knows \ was there.

(And this globbing code, in my opinion, also misinterprets [a\]]
as "set of 'a' or ']'", but (a) I might be wrong on this, and
(b) this is a bit offtopic, we discuss ${x#pattern} handling here).

To me, it looks like bash behavior is buggy regardless of what \]
means in glob patterns. These two should be equivalent:

x='a]'; echo _${x#[a\]]}_
x='a]'; echo _${x#[a]]}_

because they should use the same pattern for globbing match.

See above.

An alternative possibility is that the pattern in ${x#pattern} is not handled
by the usual shell rules: backslashes are not removed.
This would be VERY ugly as soon as nested variable expansions are considered.

There are and there are supposed to be a few differences in how ${x#pattern} is treated vs. pattern, described in Patterns Used for Filename Expansion. But backslash handling is not one of those differences.

Cheers,
Harald van Dijk