On 22.10.2013 00:58, pro-logic wrote:
>> The trace_performance functions require manual instrumentation of
>> the code sections you want to measure
> Ahh a case of RTFM :)
>
>> Could you post details about your test setup? Are you still using
>> WebKit for your tests?
> I'm on Win7 x64, Core i5 M560, WD 7200 Laptop HDD, NTFS, no virus
> scanner, truecrypt, no defragger.
>

OK, so truecrypt and luafv may screw things up for you (according to my
measurements, luafv roughly doubles lstat times on C:).

> I've tried to be a bit smarter with the intent of my code, and this
> is what I came up with.
>
> diff --git a/cache.h b/cache.h
> index 4bf19e3..2e9fb1f 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -294,7 +294,7 @@ extern void free_name_hash(struct index_state *istate);
>  #define active_cache_changed (the_index.cache_changed)
>  #define active_cache_tree (the_index.cache_tree)
>
> -#define read_cache() read_index(&the_index)
> +#define read_cache() read_index_preload(&the_index, NULL)
>  #define read_cache_from(path) read_index_from(&the_index, (path))
>  #define read_cache_preload(pathspec) read_index_preload(&the_index, (pathspec))
>  #define is_cache_unborn() is_index_unborn(&the_index)
> diff --git a/read-cache.c b/read-cache.c
> index c3d5e35..5fb2788 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1866,7 +1866,7 @@ int read_index_unmerged(struct index_state *istate)
>  	int i;
>  	int unmerged = 0;
>
> -	read_index(istate);
> +	read_index_preload(istate, NULL);
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
>  		struct cache_entry *new_ce;
> --
>

Ahh, I thought that you had enabled fscache during the entire checkout.

> Interestingly when I run on a cleanly checked out blink repo my
> changes seem to make matters worse in terms of performance, but when
> working on a repo with ignored files in it it seems to work better.
> So for point of comparison I decided to run a comparison on a
> repo with ignored files in its working tree, in this case msysgit/git
> after a 'make install'. When I get a few hours I'll try to build blink
> and re-run the numbers on a much much larger repo.
>
> This comparison is an average of 3 cold cache runs of
> kb/fscache-v4 [a] vs kb/fscache-v4 with my above changes applied [b],
> with preloadindex and fscache set to true.
>
> For comparison
> git status -s
> [a] 3.02s
> [b] 2.92s
>
> git reset --hard head
> [a] 3.67s
> [b] 3.09s
>

These numbers look far too good, so you don't actually do a fresh
checkout, do you? I mean, delete all files except .git; killcache; git
reset --hard / git checkout -f? That would also explain your 95% lstat
times, if there's nothing to do...

> git add -u
> [a] 2.89s
> [b] 2.08s
>
>
> I noticed something interesting. Preload index uses 20 threads to do
> the work. When I was keeping an eye on them in task manager some
> threads will finish quite quickly, while others will run a lot
> longer. The way I understand the code at the moment the threads get
> equal chunks of work to perform. It's quite likely that even more
> performance could be obtained out of preload if the work splitting
> was 'smarter'. My currently best idea would be to use something like
> a lock-free queue to queue up the work and let the threads get the
> work off the queue. That way all threads are busy with work for
> longer. A candidate for the implementation would be libfds [1] queue.
> However my issue with this library and the reason I haven't tried to
> integrate is simply because the code expressly has no license.
>

As cache/cache_nr are not modified by the threads, you actually don't
need a lock-free queue. An atomic counter shared by all threads should
suffice (i.e. pthread's equivalent to
InterlockedIncrement/InterlockedAdd).
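
To illustrate the atomic-counter idea, here is a minimal sketch. It is not
the actual preload-index code. It assumes C11 <stdatomic.h> and pthreads;
struct work_pool, process_item(), run_pool(), NR_THREADS and BATCH are
made-up names, and the int array merely stands in for cache/cache_nr.
Each thread claims the next batch of indices with a single
atomic_fetch_add(), so threads that finish early simply pick up more
batches instead of idling.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_THREADS 4	/* hypothetical; the quoted text mentions 20 */
#define BATCH 16	/* entries claimed per atomic operation (assumption) */

/* Hypothetical shared state; stands in for istate->cache / cache_nr. */
struct work_pool {
	const int *items;	/* read-only input, never modified by workers */
	size_t nr;		/* number of items */
	atomic_size_t next;	/* index of the next unclaimed item */
	atomic_long sum;	/* demo result: sum over all processed items */
	atomic_size_t done;	/* demo result: number of processed items */
};

/* Placeholder for the per-entry work (an lstat() in the real case). */
static long process_item(int item)
{
	return (long)item * 2;
}

static void *worker(void *arg)
{
	struct work_pool *pool = arg;

	for (;;) {
		/* Claim the next batch; fetch_add returns the old value. */
		size_t begin = atomic_fetch_add(&pool->next, BATCH);
		size_t end, i;
		long local = 0;

		if (begin >= pool->nr)
			break;	/* nothing left to claim */
		end = begin + BATCH < pool->nr ? begin + BATCH : pool->nr;

		for (i = begin; i < end; i++)
			local += process_item(pool->items[i]);

		/* Publish this batch's results once, not per item. */
		atomic_fetch_add(&pool->sum, local);
		atomic_fetch_add(&pool->done, end - begin);
	}
	return NULL;
}

/* Runs all workers over items[0..nr) and returns the accumulated sum. */
static long run_pool(const int *items, size_t nr, size_t *done)
{
	struct work_pool pool;
	pthread_t threads[NR_THREADS];
	int i;

	pool.items = items;
	pool.nr = nr;
	atomic_init(&pool.next, 0);
	atomic_init(&pool.sum, 0);
	atomic_init(&pool.done, 0);

	for (i = 0; i < NR_THREADS; i++) {
		int err = pthread_create(&threads[i], NULL, worker, &pool);
		assert(!err);
		(void)err;
	}
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	*done = atomic_load(&pool.done);
	return atomic_load(&pool.sum);
}
```

Claiming a small batch per atomic operation (rather than one entry) keeps
contention on the counter low while still balancing the load. On
toolchains without C11 atomics, GCC's __sync_fetch_and_add builtin or
InterlockedExchangeAdd on Windows could presumably take the place of
atomic_fetch_add().
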
Karsten