Re: [PATCH v2 7/7] index-pack: make quantum of work smaller

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019.10.17 13:17, Jonathan Tan wrote:
> Currently, when index-pack resolves deltas, it does not split up delta
> trees into threads: each delta base root (an object that is not a
> REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
> that root (direct or indirect) are processed in the same thread.
> 
> This is a problem when a repository contains a large text file (thus,
> delta-able) that is modified many times - delta resolution time during
> fetching is dominated by processing the deltas corresponding to that
> text file.
> 
> This patch contains a solution to that. When cloning using
> 
>   git -c core.deltabasecachelimit=1g clone \
>     https://fuchsia.googlesource.com/third_party/vulkan-cts
> 
> on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
> which is the default).
> 
> The solution is to have a global work stack. This stack contains delta
> bases (objects, whether appearing directly in the packfile or generated
> by delta resolution, that themselves have delta children) that need to
> be processed; whenever a thread needs work, it peeks at the top of the
> stack and processes its next unprocessed child. If a thread finds the
> stack empty, it will look for more delta base roots to push on the stack
> instead.
> 
> The main weakness of having a global work stack is that more time is
> spent in the mutex, but profiling has shown that most time is spent in
> the resolution of the deltas themselves, so this shouldn't be an issue
> in practice. In any case, experimentation (as described in the clone
> command above) shows that this patch is a net improvement.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> ---
>  builtin/index-pack.c | 336 ++++++++++++++++++++++++-------------------
>  1 file changed, 190 insertions(+), 146 deletions(-)
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 31607a77fc..072592a35d 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -38,15 +38,49 @@ struct base_data {
>  	struct object_entry *obj;
>  	int ref_first, ref_last;
>  	int ofs_first, ofs_last;
> +	/*
> +	 * Threads should increment retain_data if they are about to call
> +	 * patch_delta() using this struct's data as a base, and decrement this
> +	 * when they are done. While retain_data is nonzero, this struct's data
> +	 * will not be freed even if the delta base cache limit is exceeded.
> +	 */
> +	int retain_data;
> +	/*
> +	 * The number of direct children that have not been fully processed
> +	 * (entered work_head, entered done_head, left done_head). When this
> +	 * number reaches zero, this struct base_data can be freed.
> +	 */
> +	int children_remaining;
>  
>  	/* Not initialized by make_base(). */
> +	struct list_head list;
>  	void *data;
>  	unsigned long size;
>  };

As a newb, the addition of a field just called "list" struck me as
needing a more descriptive name. But this makes more sense now after
looking at list.h. And it seems that most uses in other parts of the
code are similarly just named "list".


> +/*
> + * Stack of struct base_data that have unprocessed children.
> + * threaded_second_pass() uses this as a source of work (the other being the
> + * objects array).
> + */
> +LIST_HEAD(work_head);
> +
> +/*
> + * Stack of struct base_data that have children, all of whom have been
> + * processed or are being processed, and at least one child is being processed.
> + * These struct base_data must be kept around until the last child is
> + * processed.
> + */
> +LIST_HEAD(done_head);
> +
> +/*
> + * All threads share one delta base cache.
> + */
> +size_t base_cache_used;
> +size_t base_cache_limit;
> +
>  struct thread_local {
>  	pthread_t thread;
> -	size_t base_cache_used;
>  	int pack_fd;
>  };

Seeing several new shared variables made me paranoid about whether or
not these were thread-safe. There are already five mutexes in this file,
but there are no clear descriptions (apart from hints in the lock names)
about which locks protect which variables. It would be nice to make this
a bit more explicit.

I wish we had something similar to Abseil's "GUARDED_BY" annotations [1]
so that we could ensure thread safety with static analysis, but that's
obviously out of scope for this series.

[1]: https://abseil.io/docs/cpp/guides/synchronization#thread-annotations


> @@ -920,6 +948,8 @@ static struct base_data *make_base(struct object_entry *obj,
>  				&base->ref_first, &base->ref_last);
>  	find_ofs_delta_children(obj->idx.offset,
>  				&base->ofs_first, &base->ofs_last);
> +	base->children_remaining = base->ref_last - base->ref_first +
> +		base->ofs_last - base->ofs_first + 2;
>  	return base;
>  }

The extra +2 here confused me for a bit, but it makes sense now. You're
subtracting indices to get the length, so you want +1 to make the math
work out, and you're doing it across two different lists. In the edge
case where a list is empty, "first" is 0 and "last" is -1, so the math
still works out.


> @@ -1065,33 +1011,128 @@ static int compare_ref_delta_entry(const void *a, const void *b)
[snip]
>  		work_unlock();
>  
> -		resolve_base(&objects[i]);
> +		if (parent) {
> +			child = resolve_delta(child_obj, parent);
> +			if (!child->children_remaining)
> +				FREE_AND_NULL(child->data);
> +		} else {
> +			child = make_base(child_obj, NULL);
> +			if (child->children_remaining) {
> +				/*
> +				 * Since this child has its own delta children,
> +				 * we will need this data in the future.
> +				 * Inflate now so that future iterations will
> +				 * have access to this object's data while
> +				 * outside the work mutex.
> +				 */
> +				child->data = get_data_from_pack(child_obj);
> +				child->size = child_obj->size;
> +			}
> +		}
> +
> +		work_lock();

This part made me uneasy at first, because it looks like we might be
doing work on shared data outside of the work lock. But just prior to
unlocking, we parent->retain_data, so we know it won't be freed. And the
only part we modify is child, which I don't believe is shared with any
other threads.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux