Karsten Blees <karsten.blees@xxxxxxxxx> writes: > With core.ignorecase=true, name-hash.c builds a case insensitive index of > all tracked directories. Currently, the existing cache entry structures are > added multiple times to the same hashtable (with different name lengths and > hash codes). However, there's only one dir_next pointer, which gets > completely messed up in case of hash collisions. In the worst case, this > causes an endless loop if ce == ce->dir_next (see t7062). > > Use a separate hashtable and separate structures for the directory index > so that each directory entry has its own next pointer. Use reference > counting to track which directory entry contains files. > > There are only slight changes to the name-hash.c API: > - new free_name_hash() used by read_cache.c::discard_index() > - remove_name_hash() takes an additional index_state parameter > - index_name_exists() for a directory (trailing '/') may return a cache > entry that has been removed (CE_UNHASHED). This is not a problem as the > return value is only used to check if the directory exists (dir.c) or to > normalize casing of directory names (read-cache.c). > > Getting rid of cache_entry.dir_next reduces memory consumption, especially > with core.ignorecase=false (which doesn't use that member at all). > > With core.ignorecase=true, building the directory index is slightly faster > as we add / check the parent directory first (instead of going through all > directory levels for each file in the index). E.g. with WebKit (~200k > files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms > to 130ms. > > Signed-off-by: Karsten Blees <blees@xxxxxxx> > --- One thing that still puzzles me is what guarantee we have on the liftime of these ce's that are borrowed by these dir_hash entries. There are a few places where we call free(ce) around "aliased" entries (only happens with ignore_case set). I do not think it is a new issue (we used to borrow a ce to represent a directory in the name_hash by using the leading prefix of its name anyway, and this patch only changes which hash table is used to hold it), and I do not think it will be an issue for case sensitive systems, so I would stop being worried about it for now, though ;-) Thanks, will replace and queue. > diff --git a/t/t7062-wtstatus-ignorecase.sh b/t/t7062-wtstatus-ignorecase.sh > new file mode 100755 > index 0000000..73709db > --- /dev/null > +++ b/t/t7062-wtstatus-ignorecase.sh > @@ -0,0 +1,20 @@ > +#!/bin/sh > + > +test_description='git-status with core.ignorecase=true' > + > +. ./test-lib.sh > + > +test_expect_success 'status with hash collisions' ' > + # note: "V/", "V/XQANY/" and "WURZAUP/" produce the same hash code > + # in name-hash.c::hash_name > + mkdir V && > + mkdir V/XQANY && > + mkdir WURZAUP && > + touch V/XQANY/test && > + git config core.ignorecase true && > + git add . && > + # test is successful if git status completes (no endless loop) > + git status > +' > + > +test_done -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html