Am 11.02.23 um 14:48 schrieb René Scharfe: > Am 11.02.23 um 08:01 schrieb Masahiro Yamada: >> On Sat, Feb 11, 2023 at 7:03 AM René Scharfe <l.s.r@xxxxxx> wrote: >>> >>> We already use eight bits for the lookup table values in sane_ctype. >>> Perhaps it's time to move GIT_GLOB_SPECIAL, GIT_REGEX_SPECIAL, >>> GIT_PATHSPEC_MAGIC and GIT_PUNCT to their own table to make room for >>> flags for isprint(), isgraph() and isblank(). >> >> >> >> I think that is a good idea. >> >> If we can have more room for isupper() and islower(), >> we will be able to delete sane_iscase(). >> >> >> (I was thinking 'unsigned short', but having two tables is OK.) >> >> >> >> So, how can I proceed with this patchset? >> >> >> >> [A] After somebody refactors ctype.c table, >> I will rebase this series on top of that. >> >> [B] keep isblank() and isgraph() local to wildmatch.c >> until that happens, so I can proceed without >> depending on the ctype.c refactoring. >> >> Apparently, wildmatch.c is not using a pointer >> increment with isblank() or isgraph(). >> >> [C] If 'somebody' in [A] is supposed to me, >> my v2 will include ctype refactoring. > > [D] We need more tests first. :) I sent patches to test the current > classifiers separately. A similar test for isblank would have helped > you detect the mismatch quickly. Full test coverage also gives > confidence when tinkering with the existing classifiers. > > 1c149ab2dd (ctype: support iscntrl, ispunct, isxdigit and isprint, > 2012-10-15) added an implementation of isprint that evaluated its > argument only once, by the way, but the one from 0fcec2ce54 > (format-patch: make rfc2047 encoding more strict, 2012-10-18) > replaced it. > > Widening the element type of the lookup table would work, but might > impact performance. I guess it would be slower, but perhaps we'd > have to measure it to be sure. Splitting the table into unrelated > subsets would avoid that. > > Deciding which flags to move requires knowing the full target set, > I think. The punctuation-related flags looked to me like good > candidates until I saw the original isprint implementation which > uses them and several of the others. So I'm not so sure anymore, > but here's a patch that moves them out: Perhaps: [E] Implement isblank and isgraph as inline functions and leave the lookup table integration for later: #undef isblank #define isblank(x) sane_isblank(x) static inline int sane_isblank(int c) {return c == ' ' || c == '\t';} #undef isgraph #define isgraph(x) sane_isgraph(x) static inline int sane_isgraph(int c) {return isprint(c) && c != ' ';} The Lazy Way™ ;) René