Re: [PATCH v2] grep: fix multibyte regex handling under macOS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 24-Aug-22 21:56, Junio C Hamano wrote:
"Diomidis Spinellis via GitGitGadget" <gitgitgadget@xxxxxxxxx>
writes:

From: Diomidis Spinellis <dds@xxxxxxx>

The 2013 commit 29de20504e9 fixed t0070-fundamental.sh under Darwin
(macOS) by adopting Git's regex library.  However, this library is
compiled with NO_MBSUPPORT, which causes git-grep to work incorrectly
on multibyte (e.g. UTF-8) files.  Current macOS versions pass
t0070-fundamental.sh with the native macOS regex library, which
also supports multibyte characters.

Very nice to see the last sentence in that paragraph.

Signed-off-by: Diomidis Spinellis <dds@xxxxxxx>
---

This part, from here ...

[...]

... up to here does not need a separate sign-off.

Apologies for this. I was led to believe that gitgitgadget would seamlessly interface between GitHub's pull request and the Git project's email workflow, but I don't see this happening. Not sure if this is due to a fault in gitgitgadget or due to my misuse of it. I think gitgitgadget expects users not to squash their subsequent commits. In any case, I'm switching to git-send-mail, which provides more visibility and control of the process.

diff --git a/Makefile b/Makefile
index 04d0fd1fe60..d1a98257150 100644
--- a/Makefile
+++ b/Makefile
@@ -1427,7 +1427,6 @@ ifeq ($(uname_S),Darwin)
  		APPLE_COMMON_CRYPTO = YesPlease
  		COMPAT_CFLAGS += -DAPPLE_COMMON_CRYPTO
  	endif
-	NO_REGEX = YesPlease
  	PTHREAD_LIBS =
  endif

I notice that this is the ONLY conditional section in our primary
Makefile that switches upon the value of $(uname_<anything>).  After
the dust settles, we should move this whole part to config.mak.uname
as an unrelated clean-up.

Alternatively, you can choose to do that (without losing NO_REGEX)
as a preliminary clean-up patch and then build this "recent Darwin
has usable regexp library" change on top of it.  I do not have a
preference either way, other than that we do not want to see that
done as part of this change "while we are at it".

I was also surprised to see a $(uname) conditional in the Makefile. I prefer to finish the macOS grep UTF-8 bug fix before discussing related house cleaning.

diff --git a/grep.c b/grep.c
index 82eb7da1022..c31657c3da3 100644
--- a/grep.c
+++ b/grep.c
@@ -10,6 +10,8 @@
  #include "quote.h"
  #include "help.h"
+#include <locale.h>

Don't ever include system headers in generic *.c files.  The system
headers on different platforms have subtle ordering restrictions
among include files and definitions of feature test macros, and all
those subtleties are supposed to be handled in one central place, in
the "git-compat-util.h" file.

A source file specific only to a particular platform is a different
matter; they can include system headers that exist or are needed
only on those systems directly.

So far, "grep.c" has been successfully used for everybody (except
Darwin), and if it needs touching, that means something is still
wrong with Darwin.

Thank you for clarifying the division of responsibility among files. I changed the fix accordingly.

Is is because you are not using gettext()?  I thought our
gettext.[ch] did consider some folks/platforms do not use system
gettext() but perhaps its support for such build environment is
slightly lacking and does not make necessary setlocale() calls.

I believe that grep was working by accident on other platforms, thanks to setlocale calls made in gettext.c. I don't think grep should depend on the implementation of gettext. Thankfully, according to test results, it turns out that we can now globally initialize the CTYPE locale, making its state available both to gettext.c and grep.c.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux