Junio C Hamano <gitster@xxxxxxxxx> writes:

> Teach the hash function and per-line comparison logic to compare lines
> while ignoring the differences in case. It is not an ignore-whitespace
> option but still needs to trigger the inexact match logic, and that is
> why the previous step introduced XDF_INEXACT_MATCH mask.

NB: how does this compare with ignoring case in filesystem paths?

> Assign the 7th bit for this option, and move the bits to select diff
> algorithms out of the way in order to leave room for a few bits to add
> more variants of ignore-whitespace, such as --ignore-tab-expansion, if
> somebody else is inclined to do so later.

Or do a proper Unicode sorting / collation algorithm, with different
levels (UTS #10, section 4.3 "Form a sort key for each string"):

  Level 1: alphabetic ordering
  Level 2: diacritic ordering
  Level 3: case ordering
  Level 4: tie-breaking (e.g. in the case when variable is 'shifted')

> We would still need to teach the front-end to flip this bit, for this
> change to be any useful.
>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---

> +static inline int match_a_byte(char ch1, char ch2, long flags)
> +{
> +	if (ch1 == ch2)
> +		return 1;
> +	if (!(flags & XDF_IGNORE_CASE) || ((ch1 | ch2) & 0x80))
> +		return 0;
> +	if (isupper(ch1))
> +		ch1 = tolower(ch1);
> +	if (isupper(ch2))
> +		ch2 = tolower(ch2);
> +	return (ch1 == ch2);
> +}

<del>
Wouldn't a better solution be a collate algorithm rather than changing
a sorting function?  Or is it a performance hack on typical body of
text under version control (mainly lowercase)?
</del>

"(libc.info)Collation Functions" says:

  The functions `strcoll' and `wcscoll' perform this translation
  implicitly, in order to do one comparison.  By contrast, `strxfrm'
  and `wcsxfrm' perform the mapping explicitly.
  If you are making multiple comparisons using the same string or set
  of strings, it is likely to be more efficient to use `strxfrm' or
  `wcsxfrm' to transform all the strings just once, and subsequently
  compare the transformed strings with `strcmp' or `wcscmp'.

The function match_a_byte (memcoll?) defined here is similar to strcoll;
do we compare a single line with more than one other line?

> +static inline unsigned long hash_a_byte(const char ch_, long flags)
> +{
> +	unsigned long ch = ch_ & 0xFF;
> +	if ((flags & XDF_IGNORE_CASE) && !(ch & 0x80) && isupper(ch))
> +		ch = tolower(ch);
> +	return ch;
> +}
> +

Hmmm... hash_a_byte (memxfrm?) is similar to strxfrm, so you do use one
or the other...

-- 
Jakub Narebski
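To make the strcoll / strxfrm analogy above concrete, here is a minimal
standalone sketch of how I read the two helpers: a pairwise comparison
that folds case as it goes, and a one-pass "transform" that reduces a
line to a hash.  It is not the patch itself.  SKETCH_IGNORE_CASE,
fold_byte, lines_match, line_hash and check_consistent are hypothetical
names, the flag value is made up for illustration, and the hash is a
plain djb2-style one rather than whatever xdiff really uses.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical flag value, for illustration only; not taken from xdiff. */
#define SKETCH_IGNORE_CASE (1L << 7)

/*
 * Fold one byte: with the flag set, ASCII uppercase becomes lowercase.
 * Bytes with the high bit set are passed through unchanged.
 */
static unsigned char fold_byte(unsigned char ch, long flags)
{
	if ((flags & SKETCH_IGNORE_CASE) && !(ch & 0x80) && isupper(ch))
		return (unsigned char)tolower(ch);
	return ch;
}

/* "strcoll"-like: compare two lines directly, folding as we go. */
static int lines_match(const char *a, size_t alen,
		       const char *b, size_t blen, long flags)
{
	size_t i;

	if (alen != blen)
		return 0;
	for (i = 0; i < alen; i++)
		if (fold_byte((unsigned char)a[i], flags) !=
		    fold_byte((unsigned char)b[i], flags))
			return 0;
	return 1;
}

/* "strxfrm"-like: reduce a line to a key once (a djb2-style hash here). */
static unsigned long line_hash(const char *s, size_t len, long flags)
{
	unsigned long h = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		h = h * 33 + fold_byte((unsigned char)s[i], flags);
	return h;
}

/* The invariant both halves must respect: matching lines hash equal. */
static void check_consistent(const char *a, size_t alen,
			     const char *b, size_t blen, long flags)
{
	if (lines_match(a, alen, b, blen, flags))
		assert(line_hash(a, alen, flags) == line_hash(b, blen, flags));
}
```

Because both functions go through the same fold_byte, two lines that
compare equal always get the same hash, so the hash can be computed once
per line and the pairwise comparison is only needed to confirm a
candidate match.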