Re: [PATCH v2 1/3] read-cache: speed up index load through parallelization

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 29 Aug 2018 10:14:24 -0700

Ben Peart <Ben.Peart@xxxxxxxxxxxxx> writes:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c42364988..79f8296d9c 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2391,6 +2391,12 @@ imap::
>  	The configuration variables in the 'imap' section are described
>  	in linkgit:git-imap-send[1].
>  
> +index.threads::
> +	Specifies the number of threads to spawn when loading the index.
> +	This is meant to reduce index load time on multiprocessor machines.
> +	Specifying 0 or 'true' will cause Git to auto-detect the number of
> +	CPU's and set the number of threads accordingly. Defaults to 'true'.

"0 or 'true' means 'auto'" made me go "Huh?"

The "Huh?"  I initially felt comes from the fact that usually 0 and
false are interchangeable, but for this particular application,
"disabling" the threading means setting the count to one (not zero),
leaving us zero as a usable "special value" to signal 'auto'.

So the end result does make sense, especially with this bit ...

> diff --git a/config.c b/config.c
> index 9a0b10d4bc..3bda124550 100644
> --- a/config.c
> +++ b/config.c
> @@ -2289,6 +2289,20 @@ int git_config_get_fsmonitor(void)
> ...
> +	if (!git_config_get_bool_or_int("index.threads", &is_bool, &val)) {
> +		if (is_bool)
> +			return val ? 0 : 1;
> +		else
> +			return val;

... which says "'0' and 'true' are the same and yields 0, '1' and
'false' yields 1, and '2' and above will give the int".  

Adding something like

	You can disable multi-threaded code by setting this variable
	to 'false' (or 1).

may reduce the risk of a similar "Huh?" reaction by other readers.

> +struct load_cache_entries_thread_data
> +{
> +	pthread_t pthread;
> +	struct index_state *istate;
> +	struct mem_pool *ce_mem_pool;
> +	int offset, nr;
> +	void *mmap;
> +	unsigned long start_offset;
> +	struct strbuf previous_name_buf;
> +	struct strbuf *previous_name;
> +	unsigned long consumed;	/* return # of bytes in index file processed */
> +};

We saw that Duy's "let's not use strbuf to remember the previous
name but instead use the previous ce" approach gave us a nice
performance boost; I wonder if we can build on that idea here?

One possible approach might be to create one ce per "block" in the
pre-scanning thread and use that ce as the "previous one" in the
per-thread data before spawning a worker.

> +static unsigned long load_cache_entries(struct index_state *istate,
> +			void *mmap, size_t mmap_size, unsigned long src_offset)
> +{
> +	struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> +	struct load_cache_entries_thread_data *data;
> +	int nr_threads, cpus, ce_per_thread;
> +	unsigned long consumed;
> +	int i, thread;
> +
> +	nr_threads = git_config_get_index_threads();
> +	if (!nr_threads) {
> +		cpus = online_cpus();
> +		nr_threads = istate->cache_nr / THREAD_COST;

Here, nr_threads could become 0 with a small index, but any value
below 2 makes us call load_all_cache_entries() by the main thread
(and the value of nr_thread is not used anyore), it is fine.  Of
course, forced test will set it to 2 so there is no problem, either.

OK.

> +	/* a little sanity checking */
> +	if (istate->name_hash_initialized)
> +		die("the name hash isn't thread safe");

If it is a programming error to call into this codepath without
initializing the name_hash, which I think is the case, this is
better done with BUG("").

The remainder of the patch looked good.  Thanks.