Re: [PATCH v1] read-cache: speed up index load through parallelization

Stefan Beller <sbeller@xxxxxxxxxx> · Thu, 23 Aug 2018 10:31:35 -0700

On Thu, Aug 23, 2018 at 8:45 AM Ben Peart <Ben.Peart@xxxxxxxxxxxxx> wrote:
>
> This patch helps address the CPU cost of loading the index by creating
> multiple threads to divide the work of loading and converting the cache
> entries across all available CPU cores.
>
> It accomplishes this by having the primary thread loop across the index file
> tracking the offset and (for V4 indexes) expanding the name. It creates a
> thread to process each block of entries as it comes to them. Once the
> threads are complete and the cache entries are loaded, the rest of the
> extensions can be loaded and processed normally on the primary thread.
>
> Performance impact:
>
> read cache .git/index times on a synthetic repo with:
>
> 100,000 entries
> FALSE       TRUE        Savings     %Savings
> 0.014798767 0.009580433 0.005218333 35.26%
>
> 1,000,000 entries
> FALSE       TRUE        Savings     %Savings
> 0.240896533 0.1751243   0.065772233 27.30%
>
> read cache .git/index times on an actual repo with:
>
> ~3M entries
> FALSE       TRUE        Savings     %Savings
> 0.59898098  0.4513169   0.14766408  24.65%
>
> Signed-off-by: Ben Peart <Ben.Peart@xxxxxxxxxxxxx>
> ---
>
> Notes:
>     Base Ref: master
>     Web-Diff: https://github.com/benpeart/git/commit/67a700419b
>     Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v1 && git checkout 67a700419b
>
>  Documentation/config.txt |   8 ++
>  config.c                 |  13 +++
>  config.h                 |   1 +
>  read-cache.c             | 218 ++++++++++++++++++++++++++++++++++-----
>  4 files changed, 216 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c42364988..3344685cc4 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -899,6 +899,14 @@ relatively high IO latencies.  When enabled, Git will do the
>  index comparison to the filesystem data in parallel, allowing
>  overlapping IO's.  Defaults to true.
>
> +core.fastIndex::
> +       Enable parallel index loading
> ++
> +This can speed up operations like 'git diff' and 'git status' especially
> +when the index is very large.  When enabled, Git will do the index
> +loading from the on disk format to the in-memory format in parallel.
> +Defaults to true.

"fast" is a non-descriptive word as we try to be fast in any operation?
Maybe core.parallelIndexReading as that just describes what it
turns on/off, without second guessing its effects?
(Are there still computers with just a single CPU, where this would not
make it faster? ;-))

> +int git_config_get_fast_index(void)
> +{
> +       int val;
> +
> +       if (!git_config_get_maybe_bool("core.fastindex", &val))
> +               return val;
> +
> +       if (getenv("GIT_FASTINDEX_TEST"))
> +               return 1;

We look at this env value just before calling this function,
can be write it to only look at the evn variable once?

> +++ b/config.h
> @@ -250,6 +250,7 @@ extern int git_config_get_untracked_cache(void);
>  extern int git_config_get_split_index(void);
>  extern int git_config_get_max_percent_split_change(void);
>  extern int git_config_get_fsmonitor(void);
> +extern int git_config_get_fast_index(void);

Oh. nd/no-extern did not cover config.h

>
> +#ifndef min
> +#define min(a,b) (((a) < (b)) ? (a) : (b))
> +#endif

We do not have a minimum function in the tree,
except for xdiff/xmacros.h:29: XDL_MIN.
I wonder what the rationale is for not having a MIN()
definition, I think we discussed that on the list a couple
times but the rationale escaped me.

If we introduce a min/max macro, can we put it somewhere
more prominent? (I would find it useful elsewhere)

> +/*
> +* Mostly randomly chosen maximum thread counts: we
> +* cap the parallelism to online_cpus() threads, and we want
> +* to have at least 7500 cache entries per thread for it to
> +* be worth starting a thread.
> +*/
> +#define THREAD_COST            (7500)

This reads very similar to preload-index.c THREAD_COST

> +       /* loop through index entries starting a thread for every thread_nr entries */
> +       consumed = thread = 0;
> +       for (i = 0; ; i++) {
> +               struct ondisk_cache_entry *ondisk;
> +               const char *name;
> +               unsigned int flags;
> +
> +               /* we've reached the begining of a block of cache entries, kick off a thread to process them */

beginning

Thanks,
Stefan