On 9/15/2018 7:09 AM, Duy Nguyen wrote:
> On Sat, Sep 15, 2018 at 01:07:46PM +0200, Duy Nguyen wrote:
> > 12:50:00.084237 read-cache.c:1721 start loading index
> > 12:50:00.119941 read-cache.c:1943 performance: 0.034778758 s: loaded all extensions (1667075 bytes)
> > 12:50:00.185352 read-cache.c:2029 performance: 0.100152079 s: loaded 367110 entries
> > 12:50:00.189683 read-cache.c:2126 performance: 0.104566615 s: finished scanning all entries
> > 12:50:00.217900 read-cache.c:2029 performance: 0.082309193 s: loaded 367110 entries
> > 12:50:00.259969 read-cache.c:2029 performance: 0.070257130 s: loaded 367108 entries
> > 12:50:00.263662 read-cache.c:2278 performance: 0.179344458 s: read cache .git/index
>
> The previous mail wrapped these lines and made them a bit hard to read. Corrected now.
> --
> Duy
Interesting! Clearly the data shape makes a big difference here. I had
run a similar test, but in my case the extensions thread actually
finished last (and its cost is what drove me to move it onto a
separate thread that starts first).
Purpose                        First     Last      Duration
load_index_extensions_thread   719.40    968.50    249.10
load_cache_entries_thread      718.89    738.65     19.76
load_cache_entries_thread      730.39    753.83     23.43
load_cache_entries_thread      741.23    751.23     10.00
load_cache_entries_thread      751.93    780.88     28.95
load_cache_entries_thread      763.60    791.31     27.72
load_cache_entries_thread      773.46    783.46     10.00
load_cache_entries_thread      783.96    794.28     10.32
load_cache_entries_thread      795.61    805.52      9.91
load_cache_entries_thread      805.99    827.21     21.22
load_cache_entries_thread      816.85    826.85     10.00
load_cache_entries_thread      827.03    837.96     10.93
In my tests, the scanning thread clearly delayed the later cache-entry
threads, but because the extensions thread was so slow, that delay
didn't impact the overall time nearly as much as in your case.
I completely agree that the optimal solution would be to go back to my
original patch/design. It eliminates the overhead of the scanning
thread entirely and allows all threads to start at the same time. This
would ensure the best performance whether the extensions were the
longest thread or the cache entry threads took the longest.
I ran out of time and energy last year, so I dropped it to work on
other tasks. I appreciate your offer of help; perhaps between the two
of us we can get it through the mailing list this time. :-)
Let me go back and see what it would take to combine the current EOIE
patch with the older IEOT patch.
I'm also intrigued by your observation that overcommitting the CPU
actually results in time savings. I hadn't tested that. It could have
a positive impact on the overall time and might warrant a change to
the default nr_threads logic.