Replied inline.

--
Anthony Ramine

On 29 May 2013 at 15:52, Duy Nguyen wrote:

> On Wed, May 29, 2013 at 8:37 PM, Anthony Ramine <n.oxyde@xxxxxxxxx> wrote:
>> On 29 May 2013 at 15:22, Duy Nguyen wrote:
>>
>>> On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@xxxxxxxxx> wrote:
>>>> Case folding is not done correctly when matching against the [:upper:]
>>>> character class and uppercased character ranges (e.g. A-Z).
>>>> Specifically, an uppercase letter fails to match against any of them
>>>> when case folding is requested, because plain characters in the pattern
>>>> and the whole string are preemptively lowercased to handle the base case
>>>> fast.
>>>
>>> I did a little test with glibc fnmatch and also checked the source
>>> code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a
>>> correct behavior or a bug in glibc. The spec is not clear (I think) on
>>> this. I guess we should just assume that 'a' should match '[:upper:]'?
>>
>> I don't know. In my opinion, if case folding is enabled, [:upper:], [:lower:] and [:alpha:] should be equivalent.
>>
>> This opinion is shared by GNU Flex [1]:
>>
>>> • If your scanner is case-insensitive (the ‘-i’ flag), then ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’.
>>
>> [1] http://flex.sourceforge.net/manual/Patterns.html
>
> Then we should do it too because of this precedent, I think.
>
>>>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>>>  }
>>>>  if (t_ch <= p_ch && t_ch >= prev_ch)
>>>>  matched = 1;
>>>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
>>>> + uchar t_ch_upper = toupper(t_ch);
>>>> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>>>> + matched = 1;
>>>> + }
>>>
>>> Or we could stick with tolower.
>>> Something like this:
>>>
>>>     if ((t_ch <= p_ch && t_ch >= prev_ch) ||
>>>         ((flags & WM_CASEFOLD) &&
>>>          t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch)))
>>>             matched = 1;
>>>
>>> I think it's easier to read if we either downcase all or upcase all, not both.
>>
>> If the range to match against is [A-_], it will become [a-_], which is an empty range since ord('a') > ord('_'). I think it is simpler to reuse toupper() after the fact, as I did.
>>
>> Anyway, maybe I should add a test for that corner case?
>
> Yeah, I was thinking about such a case, but I saw glibc do it... I
> guess we just found another bug, at least in compat/fnmatch.c. Yes, a
> test for it would be great, in case I change my mind 2 years from now
> and decide to turn it the other way ;)

Should I patch compat/fnmatch.c too? That would make it different from glibc's version.

>>
>>>>  p_ch = 0; /* This makes "prev_ch" get set to 0. */
>>>>  } else if (p_ch == '[' && p[1] == ':') {
>>>>  const uchar *s;
>>>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>>>  } else if (CC_EQ(s,i, "upper")) {
>>>>  if (ISUPPER(t_ch))
>>>>  matched = 1;
>>>> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
>>>> + matched = 1;
>>>>  } else if (CC_EQ(s,i, "xdigit")) {
>>>>  if (ISXDIGIT(t_ch))
>>>>  matched = 1;
>>>
>>> If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then?
>>
>> Yes, isalpha() is enough, but I wanted to keep the two cases separate. I can amend that if you want.
>
> Either way is fine. I don't think this code is performance critical. Your call.
> --
> Duy