Re: [PATCH nd/wildmatch] Correct Git's version of isprint and isspace

Hi.

On 13.11.2012 11:46, Nguyễn Thái Ngọc Duy wrote:
> Git's isspace does not include 11 and 12. Git's isprint includes
> control space characters (10-13). According to glibc-2.14.1 on C
> locale on Linux, this is wrong. This patch fixes it.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>
> ---
>  I wrote a small C program to compare the result of all is* functions
>  that Git replaces against the libc version. These are the only ones that
>  differ. Which matches what Jan Schönherr commented.
> 
>  ctype.c           |  6 +++---
>  git-compat-util.h | 11 ++++++-----
>  2 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/ctype.c b/ctype.c
> index 0bfebb4..71311a3 100644
> --- a/ctype.c
> +++ b/ctype.c
> @@ -14,11 +14,11 @@ enum {
>  	P = GIT_PATHSPEC_MAGIC, /* other non-alnum, except for ] and } */
>  	X = GIT_CNTRL,
>  	U = GIT_PUNCT,
> -	Z = GIT_CNTRL | GIT_SPACE
> +	Z = GIT_CNTRL_SPACE
>  };
>  
> -const unsigned char sane_ctype[256] = {
> -	X, X, X, X, X, X, X, X, X, Z, Z, X, X, Z, X, X,		/*   0.. 15 */
> +const unsigned int sane_ctype[256] = {
> +	X, X, X, X, X, X, X, X, X, Z, Z, Z, Z, Z, X, X,		/*   0.. 15 */
>  	X, X, X, X, X, X, X, X, X, X, X, X, X, X, X, X,		/*  16.. 31 */
>  	S, P, P, P, R, P, P, P, R, R, G, R, P, P, R, P,		/*  32.. 47 */
>  	D, D, D, D, D, D, D, D, D, D, P, P, P, P, P, G,		/*  48.. 63 */

An alternative to switching from 1-byte to 4-byte values (don't we have
a 2-byte datatype?) would be to free up the GIT_CNTRL bit and simply do:

#define iscntrl(x) ((x) < 0x20)
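
To make the idea a bit more concrete, here is a minimal sketch of a
comparison-based test written as a function. The name sketch_iscntrl is
made up for illustration and is not part of Git. Two details go beyond
the one-line macro above and are my assumptions about what we would want:
the value is reduced to an unsigned char first, so a negative char does
not compare as "< 0x20", and 0x7f (DEL) is included because the C locale
treats it as a control character too.

```c
/*
 * Hypothetical comparison-based replacement for a table-driven iscntrl().
 * Not Git's code: the name and the DEL handling are assumptions.
 */
static int sketch_iscntrl(int x)
{
	/* Reduce to 0..255 so negative char values are not misclassified. */
	unsigned char c = (unsigned char)x;

	/* 0x00..0x1f are control characters; so is DEL in the C locale. */
	return c < 0x20 || c == 0x7f;
}
```

With this, GIT_CNTRL would no longer need a bit in the table, so the
table entries could stay one byte wide.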


> diff --git a/git-compat-util.h b/git-compat-util.h
> index 02f48f6..4ed3f94 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
[...]
> @@ -483,9 +483,10 @@ extern const unsigned char sane_ctype[256];
>  #define GIT_PATHSPEC_MAGIC 0x20
>  #define GIT_CNTRL 0x40
>  #define GIT_PUNCT 0x80
> -#define sane_istest(x,mask) ((sane_ctype[(unsigned char)(x)] & (mask)) != 0)
> +#define GIT_SPACE 0x100
> +#define sane_istest(x,mask) ((sane_ctype[(unsigned int)(x)] & (mask)) != 0)

Shouldn't that cast stay "(unsigned char)"? Otherwise we might read past the
end of the array, e.g. for a negative char, which converts to a huge unsigned
int.

(That said, it wasn't really correct before either, if there really is a
possibility that x >= 0x100: such values are silently truncated to their low
byte and classified as some unrelated character.)
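
The difference between the two casts can be sketched as follows. The two
helper names are hypothetical and only exist to show which index each cast
produces for a 256-entry table; the sketch assumes 8-bit chars.

```c
/* Index as computed with the "(unsigned char)" cast of the old macro. */
static unsigned int index_via_uchar(int x)
{
	/* Always ends up in 0..255, so it never leaves a 256-entry table. */
	return (unsigned char)x;
}

/* Index as computed with the "(unsigned int)" cast from the patch. */
static unsigned int index_via_uint(int x)
{
	/* Negative values wrap to very large numbers, far past index 255. */
	return (unsigned int)x;
}
```

For a byte such as 0xe9 stored in a signed char (value -23), the first
helper yields 233 while the second yields a number close to UINT_MAX. For
x = 0x141 the first helper yields 0x41, which stays in bounds but looks up
the entry for 'A'.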

Regards
Jan

PS: It looks like my isprint() version was given precedence over your
isprint() version during the merge into next. That should also be sorted out,
but I've no idea which one is actually better: two comparisons versus one
cache lookup and a bitop... (though my guess is that comparisons are cheaper,
but then we should also convert isdigit()...)
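
For reference, the two approaches compared in the PS could look roughly like
this. It is only a sketch: SKETCH_PRINT, sketch_table and the function names
are invented for illustration and are not the definitions from either patch,
and the table is filled at run time here just to keep the example short.

```c
#define SKETCH_PRINT 0x01	/* hypothetical flag bit, not one of Git's */

static unsigned char sketch_table[256];

/* Mark 0x20..0x7e, the printable ASCII characters, in the table. */
static void sketch_init_table(void)
{
	int c;

	for (c = 0x20; c <= 0x7e; c++)
		sketch_table[c] |= SKETCH_PRINT;
}

/* Variant 1: two comparisons, no memory access. */
static int isprint_cmp(int x)
{
	return x >= 0x20 && x <= 0x7e;
}

/* Variant 2: one table lookup plus a bit test. */
static int isprint_tab(int x)
{
	return (sketch_table[(unsigned char)x] & SKETCH_PRINT) != 0;
}
```

Both variants agree for every value in 0..255 once sketch_init_table() has
run; which one is cheaper would have to be measured.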