Am 11.02.2013 04:53, schrieb Duy Nguyen: > On Sun, Feb 10, 2013 at 11:58 PM, Erik Faye-Lund <kusmabite@xxxxxxxxx> wrote: >> Karsten Blees has done something similar-ish on Windows, and he posted >> the results here: >> >> https://groups.google.com/forum/#!topic/msysgit/fL_jykUmUNE/discussion >> The new hashtable implementation in fscache [1] supports O(1) removal and has no mingw dependencies - might come in handy for anyone trying to implement an inotify daemon. [1] https://github.com/kblees/git/commit/f7eb85c2 >> I also seem to remember he doing a ReadDirectoryChangesW version, but >> I don't remember what happened with that. > > Thanks. I came across that but did not remember. For one thing, we > know the inotify alternative for Windows: ReadDirectoryChangesW. > I dropped ReadDirectoryChangesW because maintaining a 'live' file system cache became more and more complicated. For example, according to MSDN docs, ReadDirectoryChangesW *may* report short DOS 8.3 names (i.e. "PROGRA~1" instead of "Program Files"), so a correct and fast cache implementation would have to be indexed by long *and* short names... Another problem was that the 'live' cache had quite negative performance impact on mutating git commands (checkout, reset...). An inotify daemon running as a background process (not in-process as fscache) will probably affect everyone that modifies the working copy, e.g. running 'make' or the test-suite. This should be considered in the design. > I copy "git status"'s (impressive) numbers from fscache-v0 for those > who are interested in: > > preload | -u | normal | cached | gain > --------+-----+--------+--------+------ > false | all | 25.144 | 3.055 | 8.2 > false | no | 22.822 | 1.748 | 12.8 > true | all | 9.234 | 2.179 | 4.2 > true | no | 6.833 | 0.955 | 7.2 > Note that I wasn't able to reproduce such bad 'normal' values in later tests, I guess disk fragmentation and/or virus scanner must have tricked me on that day...gain factors of 2.5 - 5 are more appropriate. However, the difference between git status -uall and -uno was always about 1.3 s in all fscache versions, even though opendir/readdir/closedir was served entirely from the cache. I added a bit of performance tracing to find the cause, and I think most of the time spent in wt_status_collect_untracked can be eliminated: 1.) 0.939 s is spent in dir.c/excluded (i.e. checking .gitignore). This check is done for *every* file in the working copy, including files in the index. Checking the index first could eliminate most of that, i.e.: (Note: patches are for discussion only, I'm aware that they may have unintended side effects...) @@ -1097,6 +1097,8 @@ static enum path_treatment treat_path(struct dir_struct *dir, return path_ignored; strbuf_setlen(path, baselen); strbuf_addstr(path, de->d_name); + if (cache_name_exists(path->buf, path->len, ignore_case)) + return path_ignored; if (simplify_away(path->buf, path->len, simplify)) return path_ignored; --- 2.) 0.135 s is spent in name-hash.c/hash_index_entry_directories, reindexing the same directories over and over again. In the end, the hashtable contains 939k directory entries, even though the WebKit test repo only has 7k directories. Checking if a directory entry already exists could reduce that, i.e.: @@ -53,14 +55,23 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach unsigned int hash; void **pos; double t = ticks(); + struct cache_entry *ce2; + int len = ce_namelen(ce); - const char *ptr = ce->name; - while (*ptr) { - while (*ptr && *ptr != '/') - ++ptr; - if (*ptr == '/') { - ++ptr; - hash = hash_name(ce->name, ptr - ce->name); + while (len > 0) { + while (len > 0 && ce->name[len - 1] != '/') + len--; + if (len > 0) { + hash = hash_name(ce->name, len); + ce2 = lookup_hash(hash, &istate->name_hash); + while (ce2) { + if (same_name(ce2, ce->name, len, ignore_case)) { + add_since(t, &hash_dirs); + return; + } + ce2 = ce2->dir_next; + } + len--; pos = insert_hash(hash, ce, &istate->name_hash); if (pos) { ce->dir_next = *pos; --- Tests were done with the WebKit repo (~200k files, ~7k directories, 15 .gitignore files, ~100 entries in root .gitignore). Instrumented code can be found here: https://github.com/kblees/git/tree/kb/git-status-performance-tracing Here's the performance traces of 'git status -s -uall' Before patches: trace: at builtin/commit.c:1221, time: 0.523429 s: cmd_status/read_cache_preload trace: at builtin/commit.c:1223, time: 0.00403477 s: cmd_status/refresh_index trace: at builtin/commit.c:1231, time: 0.00318494 s: cmd_status/hold_locked_index trace: at wt-status.c:539, time: 0.00527396 s: wt_status_collect_changes_worktree trace: at wt-status.c:544, time: 0.00545771 s: wt_status_collect_changes trace: at wt-status.c:546, time: 1.286 s: wt_status_collect_untracked trace: at builtin/commit.c:1233, time: 1.29852 s: cmd_status/wt_status_collect trace: at dir.c:1540, time: 0.00170986 s: read_directory_recursive/strbuf_add trace: at dir.c:1541, time: 0.00623972 s: read_directory_recursive/opendir trace: at dir.c:1542, time: 0.00517881 s: read_directory_recursive/readdir trace: at dir.c:1543, time: 0.992936 s: read_directory_recursive/treat_path trace: at dir.c:1544, time: 0.277942 s: read_directory_recursive/dir_add_name trace: at dir.c:1545, time: 0.0014594 s: read_directory_recursive/close trace: at dir.c:1546, time: 0.939349 s: treat_one_path/excluded trace: at dir.c:1547, time: 0.0050811 s: treat_one_path/dir_add_ignored trace: at dir.c:1548, time: 0.00515875 s: treat_one_path/get_dtype trace: at dir.c:1549, time: 0.00329322 s: treat_one_path/treat_directory trace: at dir.c:1550, time: 0.222969 s: excluded/prep_exclude trace: at dir.c:1551, time: 0.00443398 s: excluded/excluded_from_list[EXC_CMDL] trace: at dir.c:1552, time: 0.699602 s: excluded/excluded_from_list[EXC_DIRS] trace: at dir.c:1553, time: 0.00475736 s: excluded/excluded_from_list[EXC_FILE] trace: at read-cache.c:460, time: 0.00967987 s: index_name_pos trace: at name-hash.c:213, time: 0.190481 s: lazy_init_name_hash trace: at name-hash.c:216, time: 0.135248 s: hash_index_entry_directories (938865 entries) trace: at name-hash.c:217, time: 0.0806647 s: index_name_exists trace: at compat/mingw.c:2137, time: 1.97424 s: command: c:\git\msysgit\git\git-status.exe -s -uall After patches: trace: at builtin/commit.c:1221, time: 0.517511 s: cmd_status/read_cache_preload trace: at builtin/commit.c:1223, time: 0.00405227 s: cmd_status/refresh_index trace: at builtin/commit.c:1231, time: 0.00322796 s: cmd_status/hold_locked_index trace: at wt-status.c:539, time: 0.00530057 s: wt_status_collect_changes_worktree trace: at wt-status.c:544, time: 0.00546062 s: wt_status_collect_changes trace: at wt-status.c:546, time: 0.322799 s: wt_status_collect_untracked trace: at builtin/commit.c:1233, time: 0.33536 s: cmd_status/wt_status_collect trace: at dir.c:1542, time: 0.00120529 s: read_directory_recursive/strbuf_add trace: at dir.c:1543, time: 0.00476647 s: read_directory_recursive/opendir trace: at dir.c:1544, time: 0.00502022 s: read_directory_recursive/readdir trace: at dir.c:1545, time: 0.310515 s: read_directory_recursive/treat_path trace: at dir.c:1546, time: 0 s: read_directory_recursive/dir_add_name trace: at dir.c:1547, time: 0.000831234 s: read_directory_recursive/close trace: at dir.c:1548, time: 0.0668582 s: treat_one_path/excluded trace: at dir.c:1549, time: 0.000173174 s: treat_one_path/dir_add_ignored trace: at dir.c:1550, time: 0.000174267 s: treat_one_path/get_dtype trace: at dir.c:1551, time: 0.00315468 s: treat_one_path/treat_directory trace: at dir.c:1552, time: 0.039733 s: excluded/prep_exclude trace: at dir.c:1553, time: 0.000185205 s: excluded/excluded_from_list[EXC_CMDL] trace: at dir.c:1554, time: 0.0264496 s: excluded/excluded_from_list[EXC_DIRS] trace: at dir.c:1555, time: 0.000170622 s: excluded/excluded_from_list[EXC_FILE] trace: at read-cache.c:460, time: 0.00260636 s: index_name_pos trace: at name-hash.c:224, time: 0.126637 s: lazy_init_name_hash trace: at name-hash.c:227, time: 0.0500866 s: hash_index_entry_directories (7152 entries) trace: at name-hash.c:228, time: 0.0790143 s: index_name_exists trace: at compat/mingw.c:2137, time: 1.00595 s: command: c:\git\msysgit\git\git-status.exe -s -uall -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html