Re: [PATCH v3 10/10] convert: add filter.<driver>.process option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Note that some of this might have been invalidated by v4]

W dniu 01.08.2016 o 15:32, Lars Schneider pisze:
>> On 31 Jul 2016, at 00:05, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> W dniu 30.07.2016 o 01:38, larsxschneider@xxxxxxxxx pisze:

>>> Git starts the filter on first usage and expects a welcome
>>> message, protocol version number, and filter capabilities
>>> separated by spaces:
>>> ------------------------
>>> packet:          git< git-filter-protocol\n
>>> packet:          git< version 2\n
>>> packet:          git< capabilities clean smudge\n
>>
>> Sorry for going back and forth, but now I think that 'capabilities' are
>> not really needed here, though they are in line with "version" in
>> the second packet / line, namely "version 2".  If it does not make
>> parsing more difficult...
> 
> I don't understand what you mean with "they are not really needed"?
> The field is necessary to understand the protocol, no?
> 
> In the last roll I added the "key=value" format to the protocol upon
> yours and Peff's suggestion. Would it be OK to change the startup
> sequence accordingly?
> 
> packet:          git< version=2\n
> packet:          git< capabilities=clean smudge\n
 
With current implementation, Git checks second packet of the handshake
for version, and third packet for capabilities.  The "capabilities" or
"capabilities=" is entirely redundant; it is the position of packet
(it is the packet number) that matters.  At least for now.

The only thing that "version" in "version 2" and "capabilities"
in "capabilities: clean smudge" helps is self-describing of the protocol.

To really make use of them you would have to end handshake with flush
packet, and do a parsing: loop over every packet, and match known
patterns.  Well, perhaps with exception of known header: it doesn't
makes sense to have "version N" anywhere else than second packet,
and it doesn't makes sense to repeat it.

We also don't want to proliferate packets unnecessarily.  Each packet
is a bit (a tiny bit) of a performance hit.
 
>>> ------------------------
>>> Supported filter capabilities are "clean", "smudge", "stream",
>>> and "shutdown".
>>
>> I'd rather put "stream" and "shutdown" capabilities into separate
>> patches, for easier review.
> 
> I agree with "shutdown". I think I would like to remove the "stream"
> option and make it the default for the following reasons:
> 
> (1) As you mentioned elsewhere, "stream" is not really streaming at this
> point because we don't read/write in parallel.

We could, following the example of original per-file filter drivers.
It is as simple as starting writer using start_async(), as if we did
writing from Git in a child process.

Though that might be left for later (assuming that protocol is flexible
enough), as synchronous protocol (write, then read) is a bit simpler to
implement.

> (2) Junio and you pointed out that if we transmit size and flush packet
> then we have redundancy in the protocol.

Providing size upfront can be a hint for filter or Git.  For example
HTTP provides Content-Length: header, though it is not strictly necessary.

> (3) With the newly introduced "success"/"reject"/"failure" packet at the 
> end of a filter operation, a filter process has a way to signal Git that
> something went wrong. Initially I had the idea that a filter process just
> stops writing and Git would detect the mismatch between expected bytes
> and received bytes. But the final status packet is a much clearer solution.

The solution with stopping writing wouldn't work, I don't think.

> (4) Maintaining two slightly different protocols is a waste of resources 
> and only increases the size of this (already large) patch.

Right, better to design and implement basic protocol, taking care that
it is extensible, and only then add to it.

> My only argument for the size packet was that this allows efficient buffer
> allocation. However, in non of my benchmarks this was actually a problem.
> Therefore this is probably a epsilon optimization and should be removed.
> 
> OK with everyone?

All right.

>>> After the filter has processed a blob it is expected to wait for
>>> the next command. A demo implementation can be found in
>>> `t/t0021/rot13-filter.pl` located in the Git core repository.
>>
>> If filter does not support "shutdown" capability (or if said
>> capability is postponed for later patch), it should behave sanely
>> when Git command reaps it (SIGTERM + wait + SIGKILL?, SIGCHLD?).
> 
> How would you do this? Don't you think the current solution is
> good enough for processes that don't need a proper shutdown?

Actually... couldn't filter driver register atexit() / signal handler
to do a clean exit, if it is needed?
 
 
>> I wonder if it would be worth it to explain the reasoning behind
>> this solution and show alternate ones.
>>
>> * Using a separate variable to signal that filters are invoked
>>   per-command rather than per-file, and use pkt-line interface,
>>   like boolean-valued `useProtocol`, or `protocolVersion` set
>>   to '2' or 'v2', or `persistence` set to 'per-command', there
>>   is high risk of user's trying to use exiting one-shot per-file
>>   filters... and Git hanging.
>>
>> * Using new variables for each capability, e.g. `processSmudge`
>>   and `processClean` would lead to explosion of variable names;
>>   I think.
>>
>> * Current solution of using `process` in addition to `clean`
>>   and `smudge` clearly says that you need to use different
>>   command for per-file (`clean` and `smudge`), and per-command
>>   filter, while allowing to use them together.
>>
>>   The possible disadvantage is Git command starting `process`
>>   filter, only to see that it doesn't offer required capability,
>>   for example offering only "clean" but not "smudge".  There
>>   is simple workaround - set `smudge` variable (same as not
>>   present capability) to empty string.
> 
> If you think it is necessary to have this discussion in the
> commit message, then I will add it.

I think it would be good idea (not necessary, but helpful), though
possibly not in such exacting detail.  Just why this one, one sentence
or more.
 
 >>> +single filter invocation for the entire life of a single Git
>>> +command. This is achieved by using the following packet
>>> +format (pkt-line, see protocol-common.txt) based protocol over
>>
>> Can we linkgit-it (to technical documentation)?
> 
> I don't think that is possible because it was never done. See:
> git grep "linkgit:tech"

A pity.  Well, not your problem, anyway.

>>> +Git starts the filter on first usage and expects a welcome
>>
>> Is "usage" here correct?  Perhaps it would be more readable
>> to say that Git starts filter when encountering first file
>> that needs cleaning or smudgeing.
> 
> OK. How about this:
> 
> Git starts the filter when it encounters the first file
> that needs to be cleaned or smudged. After the filter started
> Git expects a welcome message, protocol version number, and 
> filter capabilities separated by spaces:

Better. 

>>> +
>>> +After the filter has processed a blob it is expected to wait for
>>> +the next command. A demo implementation can be found in
>>> +`t/t0021/rot13-filter.pl` located in the Git core repository.
>>
>> It is actually in Git sources.  Is it the best way to refer to
>> such files?
> 
> Well, I could add a github.com link but I don't think everyone
> would like that. What would you suggest?

Sorry, I wasn't clear.  What I meant is if "<file> located in the
Git core repository" is the best way to refer to such files, and
if we could do better.

But I think it is all right as it is.

Later we might want to provide some example filter.<driver>.process
filters e.g. in contrib/.  But that's for the future.
 
>>> +
>>> +Please note that you cannot use an existing filter.<driver>.clean
>>> +or filter.<driver>.smudge command as filter.<driver>.process
>>> +command. As soon as Git would detect a file that needs to be
>>> +processed by this filter, it would stop responding.
>>
>> This isn't.
> 
> Would that be better?
> 
> 
> Please note that you cannot use an existing `filter.<driver>.clean`
> or `filter.<driver>.smudge` command as `filter.<driver>.process`
> command because the former two use a different inter process 
> communication protocol than the latter one. As soon as Git would detect 
> a file that needs to be processed by such an invalid "process" filter, 
> it would wait for a proper protocol handshake and appear "hanging".

This is better.

-- 
Jakub Narębski

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]