Re: [PATCH v10 13/14] convert: add filter.<driver>.process option

Jakub Narębski <jnareb@xxxxxxxxx> · Sat, 15 Oct 2016 21:42:00 +0200

W dniu 15.10.2016 o 16:45, Lars Schneider pisze:
>> On 12 Oct 2016, at 03:54, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>>>> On 09 Oct 2016, at 01:06, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>>>>>
>>>
>>>>> After the filter started
>>>>> Git sends a welcome message ("git-filter-client"), a list of
>>>>> supported protocol version numbers, and a flush packet. Git expects
>>>>> +to read a welcome response message ("git-filter-server") and exactly
>>>>> +one protocol version number from the previously sent list. All further
>>>>> +communication will be based on the selected version. The remaining
>>>>> +protocol description below documents "version=2". Please note that
>>>>> +"version=42" in the example below does not exist and is only there
>>>>> +to illustrate how the protocol would look like with more than one
>>>>> +version.
>>>>> +
>>>>> +After the version negotiation Git sends a list of all capabilities that
>>>>> +it supports and a flush packet. Git expects to read a list of desired
>>>>> +capabilities, which must be a subset of the supported capabilities list,
>>>>> +and a flush packet as response:
>>>>> +------------------------
>>>>> +packet:          git> git-filter-client
>>>>> +packet:          git> version=2
>>>>> +packet:          git> version=42
>>>>> +packet:          git> 0000
>>>>> +packet:          git< git-filter-server
>>>>> +packet:          git< version=2
>>>>> +packet:          git> clean=true
>>>>> +packet:          git> smudge=true
>>>>> +packet:          git> not-yet-invented=true
>>>>> +packet:          git> 0000
>>>>> +packet:          git< clean=true
>>>>> +packet:          git< smudge=true
>>>>> +packet:          git< 0000
>>>>
>>>> WARNING: This example is different from description!!!
>>>
>>> Can you try to explain the difference more clearly? I read it multiple
>>> times and I think this is sound.
>>
>> I'm sorry it was *my mistake*.  I have read the example exchange wrong.
>>
>> On the other hand that means that I have other comment, which I though
>> was addressed already in v10, namely that not all exchanges ends with
>> flush packet (inconsistency, and I think a bit of lack of extendability).
> 
> Well, this part of the protocol is not supposed to be extensible because
> it is supposed to deal *only* with the version number. It needs to keep 
> the same structure to ensure forward and backward compatibility.
> 
> However, for consistency sake I will add a flush packet.

Thanks.  That is one thing I feel quite strongly about.

I can agree that extendability does not matter much here: we can always
change the version number.  But there might be some additional information
that filter process wants to send to Git in first exchange, and using
flush-terminated list here means that we don't need to change version
number, assuming that this additional information is advisory.

The consistency means in my opinion that it should be easier to implement
filter scripts.

>>>> In description above the example you have 4-part handshake, not 3-part;
>>>> the filter is described to send list of supported capabilities last
>>>> (a subset of what Git command supports).
>>>
>>> Part 1: Git sends a welcome message...
>>> Part 2: Git expects to read a welcome response message...
>>> Part 3: After the version negotiation Git sends a list of all capabilities...
>>> Part 4: Git expects to read a list of desired capabilities...
>>>
>>> I think example and text match, no?
>>
>> Yes, it does; as I have said already, I have misread the example. 
>>
>> Anyway, in some cases 4-way handshake, where Git sends list of
>> supported capabilities first, is better.  If the protocol has
>> to prepare something for each of capabilities, and perhaps check
>> those preparation status, it can do it after Git sends what it
>> could need, and before it sends what it does support.
>>
>> Though it looks a bit strange that client (as Git is client here)
>> sends its capabilities first...
> 
> Git tells the filter what it can do. Then the filter decides what
> features it supports. I would prefer to keep it that way as I don't
> see a strong advantage for the other way around.

I think the current order is good, no need to change it.
As I said it is better for Git to send capabilities first.

>>>> By the way, now I look at it, the argument for using the
>>>> "<capability>=true" format instead of "capability=<capability>"
>>>> (or "supported-command=<capability>") is weak.  The argument for
>>>> using "<variable>=<value>" to make it easier to implement parsing
>>>> is sound, but the argument for "<capability>=true" is weak.
>>>>
>>>> The argument was that with "<capability>=true" one can simply
>>>> parse metadata into hash / dictionary / hashmap, and choose
>>>> response based on that.  Hash / hashmap / associative array
>>>> needs different keys, so the reasoning went for "<capability>=true"
>>>> over "capability=<capability>"... but the filter process still
>>>> needs to handle lines with repeating keys, namely "version=<N>"
>>>> lines!
>>>>
>>>> So the argument doesn't hold water IMVHO, and we can choose
>>>> version which reads better / is more natural.
>>>
>>> I have to agree that "capability=<capability>" might read a
>>> little bit nicer. However, Peff suggested "<capability>=true" 
>>> as his preference and this is absolutely OK with me.
>>
>> From what I remember it was Peff stating that he thinks "<foo>=true"
>> is easier for parsing (it is, but we still need to support the harder
>> way parsing anyway), and offered that "<foo>" is good enough (if less
>> consistent).
>>
>>> I am happy to change that if a second reviewer shares your
>>> opinion.
>>
>> Also, with "capability=<foo>" we can be more self descriptive,
>> for example "supported-command=<foo>"; though "capability" is good
>> enough for me.
>>
>> For example
>>
>> packet:          git> wants=clean
>> packet:          git> wants=smudge
>> packet:          git> wants=size
>> packet:          git> 0000
>> packet:          git< supports=clean
>> packet:          git< supports=smudge
>> packet:          git< 0000
>>
>> Though coming up with good names is hard; and as I said "capability"
>> is good enough; OTOH with "smudge=true" etc. we don't need to come
>> up with good name at all... though I wonder if it is a good thing `\_o,_/
> 
> How about this (I borrowed these terms from contract negotiation)?
> 
> packet:          git> offers=clean
> packet:          git> offers=smudge
> packet:          git> offers=size
> packet:          git> 0000
> packet:          git< accepts=clean
> packet:          git< accepts=smudge
> packet:          git< 0000
> 
> @Peff: Would that be OK for you?

I don't feel strongly about it.  It could be "<capability>=true", it could
be "<capability>", it could be "capability=<capability>", it could be
something more descriptive.

I guess "<capability>=true" looks a bit strange (would it ever be there
"<capability>=false"?), but it is good enough for me.

One think we can all agree on is that each capability is to be send as
separate packets, and not as space or comma separated list in a single
packet (like for fetch / push).

>>>>> +------------------------
>>>>> +packet:          git< status=abort
>>>>> +packet:          git< 0000
>>>>> +------------------------
>>>>> +
>>>>> +After the filter has processed a blob it is expected to wait for
>>>>> +the next "key=value" list containing a command. Git will close
>>>>> +the command pipe on exit. The filter is expected to detect EOF
>>>>> +and exit gracefully on its own.
>>>>
>>>> Any "kill filter" solutions should probably be put here.
>>>
>>> Agreed. 
>>>
>>>> I guess
>>>> that filter exiting means EOF on its standard output when read
>>>> by Git command, isn't it?
>>>
>>> Yes, but at this point Git is not listening anymore.
>>
>> I think it might be good idea to have here the information about
>> what filter process should do if it needs maybe lengthy closing
>> process, to not hold/stop Git command or to not be killed.
> 
> I've added:
> 
> "Git will wait until the filter process has stopped."

Thanks.  Looks good for me.

I think any advices (like how to handle shutdown in filter without
blocking Git) could be added later, when we have some experience
making and using long-running multi-file filter drivers.

Thank you for your work on this series.
-- 
Jakub Narębski