Current isprint() incorrectly includes control characters 9, 10 and 13, which is fixed by this patch. Current isspace() lacks 11 and 12. But Git's isspace() has been designed this way since the beginning and has over 100 call sites relying on this. Instead of updating isspace() behavior (which could be tricky as patches from other topics may come in parallel that assume the old isspace()), a new isspace_posix() is introduced and used by wildmatch.c. Other part of Git can be converted to use this new function if it seems appropriate. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx> --- Sorry for the late response. I'll reply to everybody in one mail. On Wed, Nov 14, 2012 at 1:58 AM, "Jan H. Schönherr" <schnhrr@xxxxxxxxxxxxxxx> wrote: > An alternative to switching from 1-byte to 4-byte values (don't we have > a 2-byte datatype?), would be to free up GIT_CNTRL and simply do: > > #define iscntrl(x) ((x) < 0x20) No. 127 is also a control character. On Wed, Nov 14, 2012 at 2:41 AM, Johannes Sixt <j6t@xxxxxxxx> wrote: > So we have two properties that overlap: > > SSSSSSSSSS > CCCCCCCC > > You seem to generate partions: > > XXXYYYYYZZZZZ > > then assign individual bits to each partition. Now each entry in the > lookup table has only one bit set. Then you define isxxx() to check for > one of the two possible bits: > > iscntrl is X or Y > isspace is Y or Z > > But shouldn't you just assign one bit for S and another one for C, have > entries in the lookup table with more than one bit set, and check for > only one bit in the isxxx macro? > > That way you don't run out of bits as easily as you do with this patch. I need three sets of characters actually: control, spaces and printable (which contains non-control spaces). Making it (isspace(x) && (x) >= 32) is simpler and because isprint() is only used in wildmatch, I don't need to think about performance penalty (yet). On Thu, Nov 15, 2012 at 2:30 AM, René Scharfe <rene.scharfe@xxxxxxxxxxxxxx> wrote: > Nevertheless, it's unfortunate that we have an isspace() that *almost* does > what the widely known thing of the same name does. I'd shy away from > changing git's version directly, because it's used more than a hundred times > in the code, and estimating the impact of adding \v and \f to it. > Perhaps renaming it to isgitspace() is a good first step, followed by > adding a "standard" version of isspace() for wildmatch? There are just too many call sites of isspace() and there is a risk of new call sites coming in independently. So I think keeping isspace() as-is and using a different name for the standard version is probably a better choice. As the new isspace_posix() is only used by wildmatch, its performance as of now is not critical and a simple macro like in this patch is probably enough. We can optimize it later if we need to. git-compat-util.h | 4 +++- wildmatch.c | 8 ++------ 2 files changed, 5 insertions(+), 7 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index 02f48f6..d4c3fda 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -486,6 +486,7 @@ extern const unsigned char sane_ctype[256]; #define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0) #define isascii(x) (((x) & ~0x7f) == 0) #define isspace(x) sane_istest(x,GIT_SPACE) +#define isspace_posix(x) (((x) >= 9 && (x) <= 13) || (x) == 32) #define isdigit(x) sane_istest(x,GIT_DIGIT) #define isalpha(x) sane_istest(x,GIT_ALPHA) #define isalnum(x) sane_istest(x,GIT_ALPHA | GIT_DIGIT) @@ -499,7 +500,8 @@ extern const unsigned char sane_ctype[256]; #define isxdigit(x) (hexval_table[x] != -1) #define isprint(x) (sane_istest(x, GIT_ALPHA | GIT_DIGIT | GIT_SPACE | \ GIT_PUNCT | GIT_REGEX_SPECIAL | GIT_GLOB_SPECIAL | \ - GIT_PATHSPEC_MAGIC)) + GIT_PATHSPEC_MAGIC) && \ + (x) >= 32) #define tolower(x) sane_case((unsigned char)(x), 0x20) #define toupper(x) sane_case((unsigned char)(x), 0) #define is_pathspec_magic(x) sane_istest(x,GIT_PATHSPEC_MAGIC) diff --git a/wildmatch.c b/wildmatch.c index 3972e26..fd74efd 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -37,11 +37,7 @@ typedef unsigned char uchar; # define ISBLANK(c) ((c) == ' ' || (c) == '\t') #endif -#ifdef isgraph -# define ISGRAPH(c) (ISASCII(c) && isgraph(c)) -#else -# define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace(c)) -#endif +#define ISGRAPH(c) (ISASCII(c) && isprint(c) && !isspace_posix(c)) #define ISPRINT(c) (ISASCII(c) && isprint(c)) #define ISDIGIT(c) (ISASCII(c) && isdigit(c)) @@ -50,7 +46,7 @@ typedef unsigned char uchar; #define ISCNTRL(c) (ISASCII(c) && iscntrl(c)) #define ISLOWER(c) (ISASCII(c) && islower(c)) #define ISPUNCT(c) (ISASCII(c) && ispunct(c)) -#define ISSPACE(c) (ISASCII(c) && isspace(c)) +#define ISSPACE(c) (ISASCII(c) && isspace_posix(c)) #define ISUPPER(c) (ISASCII(c) && isupper(c)) #define ISXDIGIT(c) (ISASCII(c) && isxdigit(c)) -- 1.8.0.rc2.23.g1fb49df -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html