Re: [PATCH 1/2] [GSOC] ref-filter: add %(raw) atom

Phillip Wood <phillip.wood123@xxxxxxxxx> · Sun, 30 May 2021 14:02:17 +0100

Hi ZheNing

On 30/05/2021 07:26, ZheNing Hu wrote:
Phillip Wood <phillip.wood123@xxxxxxxxx> wrote on Sat, 29 May 2021 at 21:23:

On 27/05/2021 17:36, Felipe Contreras wrote:
ZheNing Hu via GitGitGadget wrote:
[...]
All we have to do is define the end point, and then we don't need i:

       static int memcasecmp(const char *s1, const char *s2, size_t n)
       {
               const char *end = s1 + n;
               for (; s1 < end; s1++, s2++) {
                       int diff = tolower(*s1) - tolower(*s2);
                       if (diff)
                               return diff;
               }
               return 0;
       }

(and I personally prefer lower to upper)

We should be using tolower() as that is what POSIX specifies for
strcasecmp() [1] which we are trying to emulate and there are cases[2] where
         (tolower(c1) == tolower(c2)) != (toupper(c1) == toupper(c2))

I wonder whether we overlooked something: this static `memcasecmp()`
is not a POSIX version. `tolower()` and `toupper()` are defined in
git-compat-util.h, and sane_istest('\0', GIT_ALPHA) == false. So in
`sane_case()`, both `tolower()` and `toupper()` simply return '\0' itself.
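
To illustrate the point about locale-independent case mapping, here is a
minimal sketch. The names ascii_tolower() and ascii_toupper() are
hypothetical and this is not the code in git-compat-util.h; it only assumes
that the mapping touches the ASCII letters and nothing else, so '\0' and
bytes >= 0x80 come back unchanged.

```c
/* Hypothetical ASCII-only case helpers; a sketch, not Git's actual code. */
static int ascii_tolower(int c)
{
	unsigned char uc = (unsigned char)c;

	/* Only 'A'..'Z' are mapped; setting bit 0x20 yields 'a'..'z'. */
	if (uc >= 'A' && uc <= 'Z')
		return uc | 0x20;
	/* NUL, digits, punctuation and bytes >= 0x80 pass through unchanged. */
	return uc;
}

static int ascii_toupper(int c)
{
	unsigned char uc = (unsigned char)c;

	/* Only 'a'..'z' are mapped; clearing bit 0x20 yields 'A'..'Z'. */
	if (uc >= 'a' && uc <= 'z')
		return uc & ~0x20;
	return uc;
}
```

With a mapping like this, the result never depends on the current locale,
which is where it can diverge from the libc tolower()/toupper().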

Well spotted, thanks for pointing that out. So memcasecmp() and
strcasecmp() may give different results. I'm not sure if that matters:
as I understand it, the main use for the 'raw' atom is with `git cat-file
--batch`, which does not support sorting. Also, although strcasecmp() uses
the current locale, it does a byte-by-byte comparison, so it is
effectively ASCII-only for UTF-8 anyway.
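
As a separate, locale-independent illustration of how a length-based
comparison can disagree with strcasecmp(), here is a small sketch. It reuses
the loop quoted earlier in this thread with an ASCII-only lowercase helper;
the names sketch_tolower() and sketch_memcasecmp() are hypothetical, and the
embedded-NUL case is my own example rather than something raised above.

```c
#include <stddef.h>

/* Hypothetical ASCII-only lowercase, independent of the C locale. */
static int sketch_tolower(int c)
{
	unsigned char uc = (unsigned char)c;

	return (uc >= 'A' && uc <= 'Z') ? (uc | 0x20) : uc;
}

/*
 * Case-insensitive comparison of exactly n bytes. Unlike strcasecmp(),
 * it does not stop at a NUL byte, so data after an embedded '\0' still
 * takes part in the comparison.
 */
static int sketch_memcasecmp(const char *s1, const char *s2, size_t n)
{
	const char *end = s1 + n;

	for (; s1 < end; s1++, s2++) {
		int diff = sketch_tolower(*s1) - sketch_tolower(*s2);
		if (diff)
			return diff;
	}
	return 0;
}
```

For the buffers "ab\0X" and "AB\0y", strcasecmp() stops at the NUL and
reports equality, while sketch_memcasecmp() over 4 bytes goes on to compare
'x' with 'y' and returns a negative value.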

Best Wishes

Phillip

[1] https://pubs.opengroup.org/onlinepubs/9699919799/
[2] https://en.wikipedia.org/wiki/Dotted_and_dotless_I#In_computing

Thanks.
--
ZheNing Hu