On Wed, Sep 12, 2018 at 6:18 PM Ben Peart <benpeart@xxxxxxxxxxxxx> wrote:
>
> This patch helps address the CPU cost of loading the index by creating
> multiple threads to divide the work of loading and converting the cache
> entries across all available CPU cores.
>
> It accomplishes this by having the primary thread loop across the index
> file tracking the offset and (for V4 indexes) expanding the name. It
> creates a thread to process each block of entries as it comes to them.

I added a couple of trace_printf() calls to see how the time is spent.
This is with a 1m-entry index (basically my webkit.git index repeated
4 times):

12:50:00.084237 read-cache.c:1721  start loading index
12:50:00.119941 read-cache.c:1943  performance: 0.034778758 s: loaded all extensions (1667075 bytes)
12:50:00.185352 read-cache.c:2029  performance: 0.100152079 s: loaded 367110 entries
12:50:00.189683 read-cache.c:2126  performance: 0.104566615 s: finished scanning all entries
12:50:00.217900 read-cache.c:2029  performance: 0.082309193 s: loaded 367110 entries
12:50:00.259969 read-cache.c:2029  performance: 0.070257130 s: loaded 367108 entries
12:50:00.263662 read-cache.c:2278  performance: 0.179344458 s: read cache .git/index

Two observations:

- the extension thread finishes up quickly (this is with the TREE
  extension alone). We could use that spare core to parse some more
  entries.

- the main "scanning and allocating" thread does hold up the two
  remaining threads. You can see that the first index-entry thread
  finishes even before the scanning thread, and this scanning thread
  takes a lot of CPU. If all index-entry threads started at the same
  time, then based on these numbers we would be finished around the
  12:50:00.185352 mark, cutting loading time roughly in half.

Could you go back to your original solution (a rough sketch of that
layout follows below)? If you don't want to spend more time on this,
I offer to rewrite this patch.
--
Duy
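
To make the suggestion concrete, here is a rough, generic pthreads
sketch of the "start all workers at once" layout. It is not the
read-cache.c code from the patch; the struct, function names and the
fixed-size block splitting are made up for illustration. It also
sidesteps the real difficulty, which is that V4 entries are variable
length with prefix-compressed names, so block boundaries cannot be
computed without scanning; the sketch simply assumes evenly splittable
entries.

/*
 * Hypothetical sketch, not Git code: pre-divide the entries into
 * roughly equal blocks and create every worker thread up front, so
 * no worker waits on a sequential scanning pass before it can start.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_THREADS 4
#define NR_ENTRIES 1000000

struct work {
	int *entries;   /* stand-in for the mmap'd index data */
	size_t start;   /* first entry this thread owns */
	size_t count;   /* how many entries it parses */
	long parsed;    /* per-thread result, to show the split */
};

static void *parse_block(void *arg)
{
	struct work *w = arg;
	size_t i;

	/*
	 * In a real loader this would convert on-disk entries to
	 * in-memory cache entries; here we just count them.
	 */
	for (i = w->start; i < w->start + w->count; i++)
		w->parsed += w->entries[i];
	return NULL;
}

int main(void)
{
	int *entries = calloc(NR_ENTRIES, sizeof(*entries));
	pthread_t threads[NR_THREADS];
	struct work work[NR_THREADS];
	size_t per_thread = NR_ENTRIES / NR_THREADS;
	int i;

	if (!entries)
		return 1;
	for (i = 0; i < NR_ENTRIES; i++)
		entries[i] = 1;

	/* all workers are created before any parsing begins */
	for (i = 0; i < NR_THREADS; i++) {
		work[i].entries = entries;
		work[i].start = i * per_thread;
		work[i].count = (i == NR_THREADS - 1) ?
			NR_ENTRIES - work[i].start : per_thread;
		work[i].parsed = 0;
		pthread_create(&threads[i], NULL, parse_block, &work[i]);
	}

	for (i = 0; i < NR_THREADS; i++)
		pthread_join(threads[i], NULL);

	for (i = 0; i < NR_THREADS; i++)
		printf("thread %d parsed %ld entries\n", i, work[i].parsed);

	free(entries);
	return 0;
}

Build with something like "cc -pthread sketch.c". The only point being
illustrated is the timing shape from the numbers above: when the block
boundaries are known (or estimated) up front, every parsing thread runs
concurrently from the start instead of being released one by one by the
scanning thread.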