Re: [PATCH v2] grep: fix multibyte regex handling under macOS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Diomidis Spinellis via GitGitGadget" <gitgitgadget@xxxxxxxxx>
writes:

> From: Diomidis Spinellis <dds@xxxxxxx>
>
> The 2013 commit 29de20504e9 fixed t0070-fundamental.sh under Darwin
> (macOS) by adopting Git's regex library.  However, this library is
> compiled with NO_MBSUPPORT, which causes git-grep to work incorrectly
> on multibyte (e.g. UTF-8) files.  Current macOS versions pass
> t0070-fundamental.sh with the native macOS regex library, which
> also supports multibyte characters.

Very nice to see the last sentence in that paragraph.

> Signed-off-by: Diomidis Spinellis <dds@xxxxxxx>
> ---

This part, from here ...

>     grep: fix multibyte regex handling under macOS
>     
>     The 2013 commit 29de20504e9 fixed t0070-fundamental.sh under Darwin
>     (macOS) by adopting Git's regex library. However, this library is
>     compiled with NO_MBSUPPORT, which causes git-grep to work incorrectly on
>     multibyte (e.g. UTF-8) files. Current macOS versions pass
>     t0070-fundamental.sh with the native macOS regex library, which also
>     supports multibyte characters.
>     
>     Adjust the Makefile to use the native regex library, and call
>     setlocale(3) to set CTYPE according to the user's preference. The
>     setlocale(3) call is required on all platforms, but in platforms
>     supporting gettext(3), setlocale(3) is called as a side-effect of
>     initializing gettext(3).
>     
>     To avoid running the new tests on platforms still using the
>     compatibility library, which is compiled without multibyte support,
>     store the corresponding NO_REGEX setting in the GIT-BUILD-OPTIONS file.
>     This makes it available to the test scripts. In addition, adjust the
>     test-tool regex command to work with multibyte regexes to further test a
>     platform's support for them.
>     
>     Signed-off-by: Diomidis Spinellis dds@xxxxxxx

... up to here does not need a separate sign-off.  It is primarily
to describe what got changed between v1 and this version.  The
changes can be seen in range-diff, and in some cases what the
range-diff shows is sufficient to understand the difference, but it
is better to either (1) supply explanation in your own words, or (2)
omit almost duplicate description.

> diff --git a/Makefile b/Makefile
> index 04d0fd1fe60..d1a98257150 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1427,7 +1427,6 @@ ifeq ($(uname_S),Darwin)
>  		APPLE_COMMON_CRYPTO = YesPlease
>  		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
>  	endif
> -	NO_REGEX = YesPlease
>  	PTHREAD_LIBS =
>  endif

I notice that this is the ONLY conditional section in our primary
Makefile that switches upon the value of $(uname_<anything>).  After
the dust settles, we should move this whole part to config.mak.uname
as an unrelated clean-up.

Alternatively, you can choose to do that (without losing NO_REGEX)
as a preliminary clean-up patch and then build this "recent Darwin
has usable regexp library" change on top of it.  I do not have a
preference either way, other than that we do not want to see that
done as part of this change "while we are at it".

> diff --git a/grep.c b/grep.c
> index 82eb7da1022..c31657c3da3 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -10,6 +10,8 @@
>  #include "quote.h"
>  #include "help.h"
>  
> +#include <locale.h>

Don't ever include system headers in generic *.c files.  The system
headers on different platforms have subtle ordering restrictions
among include files and definitions of feature test macros, and all
those subtleties are supposed to be handled in one central place, in
the "git-compat-util.h" file.

A source file specific only to a particular platform is a different
matter; they can include system headers that exist or are needed
only on those systems directly.

So far, "grep.c" has been successfully used for everybody (except
Darwin), and if it needs touching, that means something is still
wrong with Darwin.

Is is because you are not using gettext()?  I thought our
gettext.[ch] did consider some folks/platforms do not use system
gettext() but perhaps its support for such build environment is
slightly lacking and does not make necessary setlocale() calls.

If that is why you are mucking with this file, the right place to
add changes like this is not here specific to grep.c, but to the
start-up sequence that happens before cmd_main() inside
common-main.c or somewhere around there, I think.  Let me summon the
author of 5e9637c6 (i18n: add infrastructure for translating Git
with gettext, 2011-11-18) by Cc'ing.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux