On Wed, Sep 12, 2018 at 6:18 PM Ben Peart <benpeart@xxxxxxxxxxxxx> wrote:
>
> This patch helps address the CPU cost of loading the index by creating
> multiple threads to divide the work of loading and converting the cache
> entries across all available CPU cores.
>
> It accomplishes this by having the primary thread loop across the index
> file tracking the offset and (for V4 indexes) expanding the name. It
> creates a thread to process each block of entries as it comes to them.

I added a couple of trace_printf() calls to see how the time is spent.
This is with a 1m-entry index (basically my webkit.git index repeated
4 times):

12:50:00.084237 read-cache.c:1721  start loading index
12:50:00.119941 read-cache.c:1943  performance: 0.034778758 s: loaded all extensions (1667075 bytes)
12:50:00.185352 read-cache.c:2029  performance: 0.100152079 s: loaded 367110 entries
12:50:00.189683 read-cache.c:2126  performance: 0.104566615 s: finished scanning all entries
12:50:00.217900 read-cache.c:2029  performance: 0.082309193 s: loaded 367110 entries
12:50:00.259969 read-cache.c:2029  performance: 0.070257130 s: loaded 367108 entries
12:50:00.263662 read-cache.c:2278  performance: 0.179344458 s: read cache .git/index

Two observations:

- the extension thread finishes up quickly (this is with the TREE
  extension alone). We could use that spare core to parse some more
  entries.

- the main "scanning and allocating" thread does hold up the two
  remaining threads. You can see that the first index-entry thread
  finishes even before the scanning thread, and this scanning thread
  takes a lot of CPU. If all index-entry threads started at the same
  time, then based on these numbers we would be finished around the
  12:50:00.185352 mark, cutting loading time roughly in half.

Could you go back to your original solution (a rough sketch of that
layout follows below)? If you don't want to spend more time on this,
I offer to rewrite this patch.
--
Duy
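
To make the suggestion concrete, here is a rough, generic pthreads
sketch of the "start all workers at once" layout. It is not the
read-cache.c code from the patch; the struct, function names and the
fixed-size block splitting are made up for illustration. It also
sidesteps the real difficulty, which is that V4 entries are variable
length with prefix-compressed names, so block boundaries cannot be
computed without scanning; the sketch simply assumes evenly splittable
entries.

/*
 * Hypothetical sketch, not Git code: pre-divide the entries into
 * roughly equal blocks and create every worker thread up front, so
 * no worker waits on a sequential scanning pass before it can start.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_THREADS 4
#define NR_ENTRIES 1000000

struct work {
	int *entries;   /* stand-in for the mmap'd index data */
	size_t start;   /* first entry this thread owns */
	size_t count;   /* how many entries it parses */
	long parsed;    /* per-thread result, to show the split */
};

static void *parse_block(void *arg)
{
	struct work *w = arg;
	size_t i;

	/*
	 * In a real loader this would convert on-disk entries to
	 * in-memory cache entries; here we just count them.
	 */
	for (i = w->start; i < w->start + w->count; i++)
		w->parsed += w->entries[i];
	return NULL;
}

int main(void)
{
	int *entries = calloc(NR_ENTRIES, sizeof(*entries));
	pthread_t threads[NR_THREADS];
	struct work work[NR_THREADS];
	size_t per_thread = NR_ENTRIES / NR_THREADS;
	int i;

	if (!entries)
		return 1;
	for (i = 0; i < NR_ENTRIES; i++)
		entries[i] = 1;

	/* all workers are created before any parsing begins */
	for (i = 0; i < NR_THREADS; i++) {
		work[i].entries = entries;
		work[i].start = i * per_thread;
		work[i].count = (i == NR_THREADS - 1) ?
			NR_ENTRIES - work[i].start : per_thread;
		work[i].parsed = 0;
		pthread_create(&threads[i], NULL, parse_block, &work[i]);
	}

	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	for (i = 0; i < NR_THREADS; i++)
		printf("thread %d parsed %ld entries\n", i, work[i].parsed);

	free(entries);
	return 0;
}

Build with something like "cc -pthread sketch.c". The only point being
illustrated is the timing shape from the numbers above: when the block
boundaries are known (or estimated) up front, every parsing thread runs
concurrently from the start instead of being released one by one by the
scanning thread.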