Re: [PATCH v1] convert: add "status=delayed" to filter process protocol

Lars Schneider <larsxschneider@xxxxxxxxx> · Wed, 11 Jan 2017 10:43:55 +0100

> On 09 Jan 2017, at 00:42, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> larsxschneider@xxxxxxxxx writes:
> 
>> From: Lars Schneider <larsxschneider@xxxxxxxxx>
>> 
>> Some `clean` / `smudge` filters might require a significant amount of
>> time to process a single blob. During this process the Git checkout
>> operation is blocked and Git needs to wait until the filter is done to
>> continue with the checkout.
>> 
>> Teach the filter process protocol (introduced in edcc858) to accept the
>> status "delayed" as response to a filter request. Upon this response Git
>> continues with the checkout operation and asks the filter to process the
>> blob again after all other blobs have been processed.
> 
> Hmm, I would have expected that the basic flow would become
> 
> 	for each paths to be processed:
> 		convert-to-worktree to buf
> 		if not delayed:
> 			do the caller's thing to use buf
> 		else:
> 			remember path
> 
> 	for each delayed paths:
> 		ensure filter process finished processing for path
> 		fetch the thing to buf from the process
> 		do the caller's thing to use buf
> 
> and that would make quite a lot of sense.  However, what is actually
> implemented is a bit disappointing from that point of view.  While
> its first part is the same as above, the latter part instead does:
> 
> 	for each delayed paths:
> 		checkout the path
> 
> Presumably, checkout_entry() does the "ensure that the process is
> done converting" (otherwise the result is simply buggy), but what
> disappoints me is that this does not allow callers that call
> "convert-to-working-tree", whose interface is obtain the bytestream 
> in-core in the working tree representation, given an object in the
> object-db representation in an in-core buffer, to _use_ the result
> of the conversion.  The caller does not have a chance to even see
> the result as it is written straight to the filesystem, once it
> calls checkout_delayed_entries().

I am not sure I can follow you here. A caller of "convert_to_working_tree"
would indeed see filtered result. Consider the following example. The 
filter delays the conversion twice and responds with the filtered results
on the third call:

CALL:     int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0)
RESPONSE: return == 1; *delayed == 1, *dst==''

CALL:     int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0)
RESPONSE: return == 1; *delayed == 1, *dst==''

CALL:     int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0)
RESPONSE: return == 1; *delayed == 0, *dst=='FILTERED_CONTENT'

I implemented the "checkout_delayed_entries" function in v1 because
it solved the problem with minimal changes in the existing code. Our previous 
discussion made me think that this is the preferred way:

     I do not think we want to see such a rewrite all over the
     codepaths.  It might be OK to add such a "these entries are known
     to be delayed" list in struct checkout so that the above becomes
     more like this:

       for (i = 0; i < active_nr; i++)
          checkout_entry(active_cache[i], state, NULL);
     + checkout_entry_finish(state);

     That is, addition of a single "some of the checkout_entry() calls
     done so far might have been lazy, and I'll give them a chance to
     clean up" might be palatable.  Anything more than that on the
     caller side is not.

c.f. http://public-inbox.org/git/xmqqvavotych.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/

Thanks,
Lars