On 15 Nov 2016, at 19:03, Junio C Hamano <gitster@xxxxxxxxx> wrote:

> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
>
>>> The filter itself would need to be aware of parallelism
>>> if it lives for multiple objects, right?
>>
>> Correct. This way Git doesn't need to deal with threading...
>
> I think you need to be careful about three things (at least; there
> may be more):
>
>  * Codepaths that check out multiple cache entries do rely on the
>    order of checkout.  We checkout removals first to make room so
>    that creation of a path X can succeed if an existing path X/Y
>    that used to want to see X as a directory can succeed (see the
>    use of checkout_entry() by "git checkout", which does have two
>    separate loops to explicitly guarantee this), for example.  I
>    think "remove all and then create" you do not specifically have
>    to worry about with the proposed change, but you may need to
>    inspect and verify there aren't other kind of order dependency.

OK

>  * Done naively, it will lead to unmaintainable code, like this:
>
>        + struct list_of_cache_entries *list = ...;
>          for (i = 0; i < active_nr; i++)
>        -     checkout_entry(active_cache[i], state, NULL);
>        +     if (checkout_entry(active_cache[i], state, NULL) == DELAYED)
>        +         add_cache_to_queue(&list, active_cache[i]);
>        + while (list) {
>        +     wait_for_checkout_to_finish(*list);
>        +     list = list->next;
>        + }
>
>    I do not think we want to see such a rewrite all over the
>    codepaths.  It might be OK to add such a "these entries are known
>    to be delayed" list in struct checkout so that the above becomes
>    more like this:
>
>        for (i = 0; i < active_nr; i++)
>            checkout_entry(active_cache[i], state, NULL);
>        checkout_entry_finish(state);
>
>    That is, addition of a single "some of the checkout_entry() calls
>    done so far might have been lazy, and I'll give them a chance to
>    clean up" might be palatable.  Anything more than that on the
>    caller side is not.
I haven't thought hard about the implementation yet, but I'll try to
stick to your suggestion and change as little code as possible on the
caller sides.

>  * You'd need to rein in the maximum parallelism somehow, as you do
>    not want to see hundreds of competing filter processes starting
>    only to tell the main loop over an index with hundreds of entries
>    that they are delayed checkouts.

I intend to implement this feature only for the new long-running filter
process protocol. That way a single filter process handles all delayed
entries and Git never spawns one process per entry. OK with you?

Thanks,
Lars