Re: [PATCH v2] convert: add "status=delayed" to filter process protocol

Lars Schneider <larsxschneider@xxxxxxxxx> · Sun, 9 Apr 2017 20:41:02 +0200

> On 27 Feb 2017, at 23:11, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
> 
> On 27.02.2017 at 11:32, Lars Schneider wrote:
>> 
>>> On 27 Feb 2017, at 10:58, Jeff King <peff@xxxxxxxx> wrote:
>>> 
>>> On Sun, Feb 26, 2017 at 07:48:16PM +0100, Lars Schneider wrote:
>>> 
>>>> +If the request cannot be fulfilled within a reasonable amount of time
>>>> +then the filter can respond with a "delayed" status and a flush packet.
>>>> +Git will perform the same request at a later point in time, again. The
>>>> +filter can delay a response multiple times for a single request.
>>>> +------------------------
>>>> +packet:          git< status=delayed
>>>> +packet:          git< 0000
>>>> +------------------------
> 
> Is it something that happens instead of filter process sending the contents

Correct! I'll clarify this in v3!

>> 
>> I completely agree - I need to change that. However, the goal of the v2
>> iteration was to get the "convert" interface in an acceptable state.
>> That's what I intended to say in the patch comment section:
>> 
>>    "Please ignore all changes behind async_convert_to_working_tree() and 
>>     async_filter_finish() for now as I plan to change the implementation 
>>     as soon as the interface is in an acceptable state."
> 
> I think that it is more important to start with a good abstraction,
> and the proposal for protocol, rather than getting bogged down in
> implementation details that may change as the idea for protocol
> extension changes.

I'll send out v3 shortly as a proposal for a complete solution.

>>> I think it would be much more efficient to do something like:
>>> 
>>> [Git issues a request and gives it an opaque index id]
>>> git> command=smudge
>>> git> pathname=foo
>>> git> index=0
>>> git> 0000
>>> git> CONTENT
>>> git> 0000
>>> 
>>> [The data isn't ready yet, so the filter tells us so...]
>>> git< status=delayed
>>> git< 0000
> 
> So is it only as replacement for "status=success" + contents or
> "status=abort", that is upfront before sending any part of the file?

Yes.

> Or, as one can assume from the point of the paragraph with the
> "status=delayed", it is about replacing null list for success or
> "status=error" after sending some part (maybe empty) of a file,
> that is:

No. This would complicate things, so I don't want to support it
(and I clarified that in the docs in v3).
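
To make the "upfront only" rule concrete, here is a minimal sketch in C
of how a reader of the status line could decide whether content follows.
This is not the code from the patch; the names `parse_status` and
`content_follows` are hypothetical, and the status values are only the
ones mentioned in this thread.

```c
#include <string.h>

enum filter_status {
	FILTER_SUCCESS,
	FILTER_DELAYED,
	FILTER_ABORT,
	FILTER_ERROR,
	FILTER_UNKNOWN
};

/* Map a "status=<value>" line to an enum (hypothetical helper). */
enum filter_status parse_status(const char *line)
{
	static const char prefix[] = "status=";
	size_t len = sizeof(prefix) - 1;

	if (strncmp(line, prefix, len) != 0)
		return FILTER_UNKNOWN;
	line += len;
	if (!strcmp(line, "success"))
		return FILTER_SUCCESS;
	if (!strcmp(line, "delayed"))
		return FILTER_DELAYED;
	if (!strcmp(line, "abort"))
		return FILTER_ABORT;
	if (!strcmp(line, "error"))
		return FILTER_ERROR;
	return FILTER_UNKNOWN;
}

/*
 * Only an upfront "success" is followed by content. "delayed" takes the
 * place of "success" + content, so nothing follows it but the flush packet.
 */
int content_follows(enum filter_status status)
{
	return status == FILTER_SUCCESS;
}
```

With this shape, a "delayed" answer can never arrive after part of a
file has been sent, because the decision is made once, on the first
status line.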

> If it would not be undue burden on the filter driver process, we might
> require for it to say where to continue at (in bytes), e.g.
> 
>    git< from=16426
> 
> That should, of course, go below index/pathname line.

This would make the protocol even more complicated. That's why I don't
want to support splitting the response.

>>> git< index=0
> 
> Or a filter driver could have used pathname as an index, that is
> 
>    git< pathname=path/testfile.dat

In v3 I've used an index to help Git find the right cache entry
quickly.
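
A rough sketch in C of why an index is cheaper than a pathname here.
It assumes the delayed items sit in a plain array and that the index is
simply the array position; `delayed_entry`, `find_by_index` and
`find_by_path` are hypothetical names for illustration, not Git's actual
cache entry API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one delayed item. */
struct delayed_entry {
	const char *path;
	int pending;
};

/* Index lookup: one bounds check and a direct array access. */
struct delayed_entry *find_by_index(struct delayed_entry *entries,
				    size_t nr, size_t index)
{
	return index < nr ? &entries[index] : NULL;
}

/* Pathname lookup: a string comparison per entry in the worst case. */
struct delayed_entry *find_by_path(struct delayed_entry *entries,
				   size_t nr, const char *path)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(entries[i].path, path))
			return &entries[i];
	return NULL;
}
```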

> 
>>> git< 0000
>>> git< CONTENT
>>> git< 0000
>>> 
>>> From Git's side, the loop is something like:
>>> 
>>> while (delayed_items > 0) {
>>> 	/* issue a wait, and get back the status/index pair */
>>> 	status = send_wait(&index);
>>> 	delayed_items--;
> 
> This looks like my 'event loop' proposal[1][2], see below.

I implemented something similar in v3.

>> That could work! I had something like that in mind:
>> 
>> I teach Git a new command "list_completed" or similar. The filter
>> blocks this call until at least one item is ready for Git. 
>> Then the filter responds with a list of paths that identify the
>> "ready items". Then Git asks for these ready items just with the
>> path and not with any content. Could that work? Wouldn't the path
>> be "unique" to identify a blob per filter run?
> 
> Why in the "drain" phase it is still Git that needs to ask filter for
> contents, one file after another?  Wouldn't it be easier and simpler
> for filter to finish sending contents, and send signal that it has
> finished continue'ing?
> 
> To summarize my earlier emails, current proposal looks for me as if
> it were a "busy loop" solution, that is[2]:

In v3 the implementation still uses a kind of busy loop (I expect the
filter to block if there is nothing ready yet). An event loop would
complicate the protocol as the filter would need to initiate an action.
Right now only Git initiates actions.
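
To illustrate the Git-driven approach, here is a minimal sketch in C.
It is not the v3 code: the filter is replaced by a hypothetical mock
(`mock_item`, `mock_request` and `checkout_all` are made-up names), and
a real filter would be expected to block instead of answering "delayed"
right away.

```c
#include <assert.h>
#include <stddef.h>

enum mock_status { MOCK_SUCCESS, MOCK_DELAYED };

/* Hypothetical stand-in for a filter item: ready after N requests. */
struct mock_item {
	int requests_until_ready;
	int done;
};

/* Simulate one Git-initiated request for item i. */
static enum mock_status mock_request(struct mock_item *items, size_t i)
{
	if (items[i].requests_until_ready > 0) {
		items[i].requests_until_ready--;
		return MOCK_DELAYED;
	}
	return MOCK_SUCCESS;
}

/*
 * Git side: keep re-requesting delayed items until none are pending.
 * Every request originates from Git; the filter only ever answers.
 * Returns the total number of requests that were issued.
 */
static int checkout_all(struct mock_item *items, size_t nr)
{
	size_t pending = nr;
	int requests = 0;

	assert(items != NULL || nr == 0);
	while (pending > 0) {
		for (size_t i = 0; i < nr; i++) {
			if (items[i].done)
				continue;
			requests++;
			if (mock_request(items, i) == MOCK_SUCCESS) {
				items[i].done = 1;
				pending--;
			}
		}
	}
	return requests;
}
```

The cost of this design is the repeated requests for items that are not
ready; the benefit is that the protocol stays strictly request/response.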

> Footnotes:
> ----------
> a) We don't send the Git-side contents of blob again, isn't it?
>   So we need some protocol extension / new understanding anyway.
>   for example that we don't send contents if we request path again.

Correct - v3 doesn't send the content again.

> Also, one thing that we need to be solved, assuming that the proposed
> extension allows to send partial data from filter to be delayed and
> continued later, is that Git needs to keep this partial response in buf;
> this is because of precedence of gitattributes applying:

As mentioned above I don't want to support partial data as this
complicates things and is of no use for my Git LFS problem case.

Thanks,
Lars