Hello Junio, Replied inline. Regards, -- Anthony Ramine Le 2 juin 2013 à 23:53, Junio C Hamano a écrit : > Anthony Ramine <n.oxyde@xxxxxxxxx> writes: > >> ase folding is not done correctly when matching against the [:upper:] >> character class and uppercased character ranges (e.g. A-Z). >> Specifically, an uppercase letter fails to match against any of them >> when case folding is requested because plain characters in the pattern >> and the whole string are preemptively lowercased to handle the base case >> fast. >> >> That optimization is kept and ISLOWER() is used in the [:upper:] case >> when case folding is requested, while matching against a character range >> is retried with toupper() if the character was lowercase, as the bounds >> of the range itself cannot be modified (in a case-insensitive context, >> [A-_] is not equivalent to [a-_]). >> >> Signed-off-by: Anthony Ramine <n.oxyde@xxxxxxxxx> >> Reviewed-by: Duy Nguyen <pclouds@xxxxxxxxx> >> --- > > Thanks. > >> t/t3070-wildmatch.sh | 55 ++++++++++++++++++++++++++++++++++++++++++++++------ >> wildmatch.c | 7 +++++++ >> 2 files changed, 56 insertions(+), 6 deletions(-) >> >> I added Duy as reviewer and fixed a typo in the commit message reported by >> Eric Sunshine. >> >> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh >> index 4c37057..38446a0 100755 >> --- a/t/t3070-wildmatch.sh >> +++ b/t/t3070-wildmatch.sh >> @@ -6,20 +6,20 @@ test_description='wildmatch tests' >> >> match() { >> if [ $1 = 1 ]; then >> - test_expect_success "wildmatch: match '$3' '$4'" " >> + test_expect_success "wildmatch: match '$3' '$4'" " >> test-wildmatch wildmatch '$3' '$4' >> " >> else >> - test_expect_success "wildmatch: no match '$3' '$4'" " >> + test_expect_success "wildmatch: no match '$3' '$4'" " >> ! test-wildmatch wildmatch '$3' '$4' >> " >> fi >> if [ $2 = 1 ]; then >> - test_expect_success "fnmatch: match '$3' '$4'" " >> + test_expect_success "fnmatch: match '$3' '$4'" " >> test-wildmatch fnmatch '$3' '$4' >> " >> elif [ $2 = 0 ]; then >> - test_expect_success "fnmatch: no match '$3' '$4'" " >> + test_expect_success "fnmatch: no match '$3' '$4'" " >> ! test-wildmatch fnmatch '$3' '$4' >> " >> # else > > Is the above about aligning $3/$4 across checks of different types > (i.e. purely cosmetic)? I am not complaining; just making sure if > there is nothing deeper going on. Yes purely cosmetic. > It is outside the scope of this change, but the shell style of this > script (most notably use of [] instead of test) needs to be fixed > someday, preferrably soon, including the commented out else clause > at the end of the hunk. > >> @@ -235,4 +247,35 @@ pathmatch 1 abcXdefXghi '*X*i' >> pathmatch 1 ab/cXd/efXg/hi '*/*X*/*/*i' >> pathmatch 1 ab/cXd/efXg/hi '*Xg*i' >> >> +# Case-sensitivy features >> +match 0 x 'a' '[A-Z]' >> +match 1 x 'A' '[A-Z]' >> +match 0 x 'A' '[a-z]' >> +match 1 x 'a' '[a-z]' >> +match 0 x 'a' '[[:upper:]]' >> +match 1 x 'A' '[[:upper:]]' >> +match 0 x 'A' '[[:lower:]]' >> +match 1 x 'a' '[[:lower:]]' >> +match 0 x 'A' '[B-Za]' >> +match 1 x 'a' '[B-Za]' >> +match 0 x 'A' '[B-a]' >> +match 1 x 'a' '[B-a]' >> +match 0 x 'z' '[Z-y]' >> +match 1 x 'Z' '[Z-y]' >> + >> +imatch 1 'a' '[A-Z]' > > Do we want "# Case-insensitivity features" commment here as well? I don't particularly care, some sections have titles, some don't. >> +imatch 1 'A' '[A-Z]' >> +imatch 1 'A' '[a-z]' >> +imatch 1 'a' '[a-z]' >> +imatch 1 'a' '[[:upper:]]' >> +imatch 1 'A' '[[:upper:]]' >> +imatch 1 'A' '[[:lower:]]' >> +imatch 1 'a' '[[:lower:]]' >> +imatch 1 'A' '[B-Za]' >> +imatch 1 'a' '[B-Za]' >> +imatch 1 'A' '[B-a]' >> +imatch 1 'a' '[B-a]' >> +imatch 1 'z' '[Z-y]' >> +imatch 1 'Z' '[Z-y]' > >> + >> test_done >> diff --git a/wildmatch.c b/wildmatch.c >> index 7192bdc..f91ba99 100644 >> --- a/wildmatch.c >> +++ b/wildmatch.c >> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } >> if (t_ch <= p_ch && t_ch >= prev_ch) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) { >> + uchar t_ch_upper = toupper(t_ch); >> + if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch) >> + matched = 1; >> + } >> p_ch = 0; /* This makes "prev_ch" get set to 0. */ > > Hmm, this looks somewhat strange. > > * At the beginning of the outermost "per characters in the text" > loop, we seem to downcase t_ch when WM_CASEFOLD is set. > * Also at the same place, we also seem to downcase p_ch under the > same condition. > > which makes me wonder why the fix is not like this: > > + if (flags & WM_CASEFOLD) { > + if (ISUPPER(p_ch)) > + p_ch = tolower(p_ch); > + if (prev_ch && ISUPPER(prev_ch)) > + prev_ch = tolower(prev_ch); > + } > if (t_ch <= p_ch && t_ch >= prev_ch) > matched = 1; > p_ch = 0; /* This sets "prev_ch" to 0 */ > > > Ahh, OK, the "seemingly strange" construct is about handling a range > like "[Z-y]"; we do not want to upcase or downcase the p_ch/prev_ch > make the range "[z-y]" (empty) which would exclude bytes like "^", > "_" or even "Z". > > And it is also OK to downcase p_ch in a single-letter case, not the > characters in a range, at the beginning of the outermost loop, > because we always compare for equality against t_ch (which is > downcased) in that case. Yeah I tried to explain that in the commit message but it is surely too dense. >> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) >> } else if (CC_EQ(s,i, "upper")) { >> if (ISUPPER(t_ch)) >> matched = 1; >> + else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) >> + matched = 1; > > This also looks somewhat strange but correct in that t_ch is already > downcased so we do not need a corresponding change for CC_EQ("lower") > codepath. I chose to still lowercase everything first, to keep the common path as is. > Interesting. Will apply. > > Thanks. Great. You're welcome. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html