> On 09 Jan 2017, at 00:42, Junio C Hamano <gitster@xxxxxxxxx> wrote: > > larsxschneider@xxxxxxxxx writes: > >> From: Lars Schneider <larsxschneider@xxxxxxxxx> >> >> Some `clean` / `smudge` filters might require a significant amount of >> time to process a single blob. During this process the Git checkout >> operation is blocked and Git needs to wait until the filter is done to >> continue with the checkout. >> >> Teach the filter process protocol (introduced in edcc858) to accept the >> status "delayed" as response to a filter request. Upon this response Git >> continues with the checkout operation and asks the filter to process the >> blob again after all other blobs have been processed. > > Hmm, I would have expected that the basic flow would become > > for each paths to be processed: > convert-to-worktree to buf > if not delayed: > do the caller's thing to use buf > else: > remember path > > for each delayed paths: > ensure filter process finished processing for path > fetch the thing to buf from the process > do the caller's thing to use buf > > and that would make quite a lot of sense. However, what is actually > implemented is a bit disappointing from that point of view. While > its first part is the same as above, the latter part instead does: > > for each delayed paths: > checkout the path > > Presumably, checkout_entry() does the "ensure that the process is > done converting" (otherwise the result is simply buggy), but what > disappoints me is that this does not allow callers that call > "convert-to-working-tree", whose interface is obtain the bytestream > in-core in the working tree representation, given an object in the > object-db representation in an in-core buffer, to _use_ the result > of the conversion. The caller does not have a chance to even see > the result as it is written straight to the filesystem, once it > calls checkout_delayed_entries(). I am not sure I can follow you here. A caller of "convert_to_working_tree" would indeed see filtered result. Consider the following example. The filter delays the conversion twice and responds with the filtered results on the third call: CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0) RESPONSE: return == 1; *delayed == 1, *dst=='' CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0) RESPONSE: return == 1; *delayed == 1, *dst=='' CALL: int convert_to_working_tree(*src=='CONTENT', *dst, *delayed==0) RESPONSE: return == 1; *delayed == 0, *dst=='FILTERED_CONTENT' I implemented the "checkout_delayed_entries" function in v1 because it solved the problem with minimal changes in the existing code. Our previous discussion made me think that this is the preferred way: I do not think we want to see such a rewrite all over the codepaths. It might be OK to add such a "these entries are known to be delayed" list in struct checkout so that the above becomes more like this: for (i = 0; i < active_nr; i++) checkout_entry(active_cache[i], state, NULL); + checkout_entry_finish(state); That is, addition of a single "some of the checkout_entry() calls done so far might have been lazy, and I'll give them a chance to clean up" might be palatable. Anything more than that on the caller side is not. c.f. http://public-inbox.org/git/xmqqvavotych.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/ Thanks, Lars