Re: dash bug: double-quoted "\" breaks glob protection for next char

On Fri, Mar 2, 2018 at 7:03 PM, Harald van Dijk <harald@xxxxxxxxxxx> wrote:
> On 02/03/2018 18:00, Denys Vlasenko wrote:
>>
>> On Wed, Feb 14, 2018 at 9:03 PM, Harald van Dijk <harald@xxxxxxxxxxx>
>> wrote:
>>>
>>> Currently:
>>>
>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>> <>
>>>
>>> This is what I expect, and also what bash, ksh and posh do.
>>>
>>> With your patch:
>>>
>>> $ dash -c 'foo=a; echo "<${foo#[a\]]}>"'
>>> <a>
>>
>> I was looking into this specific example and I believe it is a _bash_ bug.
>>
>> The [a\]] is misinterpreted by it (and probably by many people).
>> The gist is: \] is not a valid escape for ] in set glob expression.
>> Glob sets have no escaping at all, ] can be in a set
>> if it is the first char: []abc],
>> dash can be in a set if it is first or last: [abc-],
>> [ and \ need no protections at all: [a[b\c] is a valid set of 5 chars.
>>
>> Therefore, "[a\]]" glob pattern means "a or \, then ]".
>> Since that does not match "a", the result of ${foo#[a\]]} should be "a".
>
> Are you sure about this? "Patterns Matching a Single Character"'s first
> paragraph contains "A <backslash> character shall escape the following
> character. The escaping <backslash> shall be discarded." The shell does this
> first.

I have a problem with the statement "The shell does this first".

It's useful to view the entire discussion of glob pattern matching
as a discussion of how fnmatch(pattern, string, flags) should behave
(even if a particular shell implementation chose to not use
C library's fnmatch() to implement its globbing).

Otherwise (in other words, if you allow globbing to depend on the shell's
quoting), the rules for globbing will not be consistent across different
applications, which would be bad.

As I see it, the shell should massage its input according to shell rules
(quote/backslash removal et al.), then use fnmatch() or glob(), or its own
internal implementations of them.
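
A minimal sketch of that two-step model in POSIX sh. glob_match is a
hypothetical helper name, not something taken from dash or bash. It assumes
the pattern has already been through quote removal and arrives as a plain
string, so the matcher cannot know how the pattern was originally quoted:

```shell
# Hypothetical helper: succeeds if string $1 matches glob pattern $2.
# $2 is expanded unquoted in the case label, so its contents act as a pattern.
glob_match() {
    case $1 in
        $2) return 0 ;;
    esac
    return 1
}

pat='[a]]'    # the pattern string exactly as the matcher would receive it

for s in 'a]' 'a' ']'; do
    if glob_match "$s" "$pat"; then
        printf '%s matches %s\n' "$s" "$pat"
    else
        printf '%s does not match %s\n' "$s" "$pat"
    fi
done
```

Of the three strings, only 'a]' should be reported as a match, since "[a]]"
reads as the set [a] followed by a literal ].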

bash does not seem to do this. It probably has a "combined" routine
which does both in one step, which allows quote removal
to interfere with globbing. Here's the proof:

$ x='a]'; echo _${x#[a\]]}_
_]_

In the above code, what pattern should be fed to fnmatch(),
assuming the shell uses fnmatch() to implement ${x#pattern}?
The pattern should be "[a]]" because by shell rules "\]" in
an unquoted string is "]".

But try this:

$ x='a]'; echo _${x#[a]]}_
__

Here, the pattern should be "[a]]" as well - it literally is.

But the results are different!

Evidently, bash does _not_ perform quote removal (more precisely,
backslash removal) on the pattern string. Somehow, the globbing code
knows that the \ was there.

(And this globbing code, in my opinion, also misinterprets [a\]]
as "set of 'a' or ']'", but (a) I might be wrong on this, and
(b) this is a bit off-topic; we are discussing ${x#pattern} handling here.)

To me, it looks like bash's behavior is buggy regardless of what \]
means in glob patterns. These two should be equivalent:

x='a]'; echo _${x#[a\]]}_
x='a]'; echo _${x#[a]]}_

because they should use the same pattern for the glob match.
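
For comparison, here is a small sketch of what prefix removal gives when the
pattern really is the string "[a]]" by the time matching happens. strip_prefix
is a hypothetical helper of mine; passing the pattern in a variable is just a
way to make sure no backslash is left for the matcher to see:

```shell
# Hypothetical helper: print $1 with the shortest prefix matching glob $2 removed.
strip_prefix() {
    printf '%s\n' "${1#$2}"
}

x='a]'
pat='[a]]'    # what both spellings reduce to if backslash removal runs first
printf '_%s_\n' "$(strip_prefix "$x" "$pat")"
```

This should print __, the same as the literal [a]] case. If backslash removal
ran before matching, the [a\]] spelling would have to give that result too.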

An alternative possibility is that the pattern in ${x#pattern} is not handled
by the usual shell rules: backslashes are not removed.
This would be VERY ugly as soon as nested variable expansions are considered.