Re: [PATCH v3 3/3] index-pack: support multithreaded delta resolving

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 03 May 2012 23:21:54 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Nguyễn Thái Ngọc Duy  <pclouds@xxxxxxxxx> writes:
>
>> @@ -696,7 +796,31 @@ static void second_pass(struct object_entry *obj)
>>  	base_obj->obj = obj;
>>  	base_obj->data = NULL;
>>  	find_unresolved_deltas(base_obj);
>> -	display_progress(progress, nr_resolved_deltas);
>> +}
>> +
>> +static void *threaded_second_pass(void *arg)
>> +{
>> +#ifndef NO_PTHREADS
>> +	if (threads_active)
>> +		pthread_setspecific(key, arg);
>> +#endif
>> +	for (;;) {
>> +		int i;
>> +		work_lock();
>> +		display_progress(progress, nr_resolved_deltas);
>> +		while (nr_processed < nr_objects &&
>> +		       is_delta_type(objects[nr_processed].type))
>> +			nr_processed++;
>> +		if (nr_processed >= nr_objects) {
>> +			work_unlock();
>> +			break;
>> +		}
>> +		i = nr_processed++;
>> +		work_unlock();
>> +
>> +		second_pass(&objects[i]);
>> +	}
>> +	return NULL;
>>  }
>
> It may be just the matter of naming, but the above is taking the
> abstraction backwards, I think.  Shouldn't it be structured in such a way
> that the caller may call second_pass() and its implementation may turn out
> to be threaded (or not)?
>
> The naming of "arg" made things worse.  I wasted 5 minutes scratching my
> head thinking "arg" was a single specific object that was to be given to
> second_pass(), and wondered why it is made into thread-local data.  Name
> it "thread_data" or something.
>
> And I think the root cause of this confusion is the way "second_pass" was
> split out in the earlier patch.  It is not the entire second-pass, but is
> merely a single step of it (the whole "for (i = 0; i < nr_objects; i++)"
> is the second-pass, in other words), and its implementation detail might
> change to either thread (i.e. instead of a single line of control
> iterating from 0 to nr_objects, each thread grab the next available task
> and work on it until everything is exhausted) or not.
>
> By the way, if one object is very heavy and takes a lot of time until
> completion, could it be possible that objects[0] is still being processed
> for its base data but objects[1] has already completed and an available
> thread could work on objects[2]?  How does it learn to process objects[2]
> in such a case, or does it wait until the thread working on objects[0] is
> done?

Please disregard the "By the way" part, except that my confusion that led
to the "By the way" comment was caused by another misnaming, namely,
"nr_processed".  It is not counting "How many of them have we already
processed?"---it merely counts "How many of them have we dispatched?" and
completion of the task does not matter in this critical section, which I
missed.  If it were named "nr_dispatched", I wouldn't have wasted my time
wondering about the loop and writing the "By the way" review comment.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html