Re: [PATCH v2 0/5] Git filter protocol

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



W dniu 2016-07-28 o 09:16, Lars Schneider pisze:
>> On 27 Jul 2016, at 21:08, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> W dniu 2016-07-27 o 02:06, larsxschneider@xxxxxxxxx pisze:
>>>
>>> thanks a lot for the extensive reviews. I tried to address all mentioned
>>> concerns and summarized them below. The most prominent changes since v1 are
>>> the following:
>>> * Git offers a number of filter capabilities that a filter can request
>>>  (right now only "smudge" and "clean" - in the future maybe "cleanFromFile",
>>>  "smudgeToFile", and/or "stream")
>>> * pipe communication uses a packet format (pkt-line) based protocol
>>
>> I wonder if it would make sense to support both whole-file pipe communication,
>> and packet format (pkt-line) pipe communication.
>>
>> The problem with whole-file pipe communication (original proposal for
>> new filter protocol is that it needs file size upfront.  For some types
>> of filters it is not a problem:
>> - if a filtered file has the same size as original, like for rot13
>>   example in the test for the feature
>> - if you can calculate the resulting file size from original size,
>>   like for most if not all encryption formats (that includes GPG,
>>   uudecode, base64, quoted-printable, hex, etc.); same for decryption,
>>   and from converting between fixed-width encodings
>> - if resulting file size is saved somewhere that is easy to get, like
>>   for LFS solutions (I think).
>>
>> For other filters it is serious problem.  For example indent, keyword
>> expansion, rezipping with zero compression (well, maybe not this one,
>> but at least the reverse of it), converting between encodings where
>> at least one is variable width (like UTF-8),...
>>
>> IMHO writing whole-file persistent filters is easier than using pkt-line.
>> On the other hand using pkt-line allow for more detailed progress report.
> 
> I initially wanted to support only "while-file" pipe, too.
> 
> But Peff ($gmane/299902), Duy, and Eric, seemed to prefer the pkt-line
> solution (gmane is down - otherwise I would have given you the links).

As GMane is down (at least the web interface; NNTP seems to be running)
I cannot examine their arguments.  Could you summarize?

> 
> After I have looked at it I think the pkt-line solution is indeed nicer
> for the following reasons:
> 
> (1) A stream optimized version (read/write in separate threads) of the filter
>     protocol can be implemented in the future without changing the protocol

I think the more important thing is that with pkt-line the filter does
not need to know the size of the output upfront.  Separate threads are
independent of protocol used, I think; and anyway Git never writes to
filter and reads from filter in the same command, isn't it?  The lifetime
of filter driver command is one Git command for now.

Oh, you meant having separate threads for writing to filter, and
separate thread for receiving output, so you don't have to wait to
send whole file to filter before starting receiving?  Note that
I think original filter implementation does it; at least async_start()
used in it hints about that (I need to examine how it works to tell
more).

> (2) pkt-line is a simple and easy to implement format

But it is more complicated than whole-file based protocol.  You need
to loop over packets... well you need that tool with whole-file, but
it is covered by existing helper functions (read_in_full()).  It is
easy to redirect file descriptors (copy_fd()), while you need to
convert contents into packets on write side, and unpack and unsplit
on the receive side in Git.

You also need to take care documenting if trailing "\n\0", "\n", "\0"
is a part of packet.

> (3) Reuse of existing Git communication infrastructure
>     -> code and documentation are less surprising to people that know Git

Whole-file read is not that difficult...

>     -> you can use GIT_TRACE_PACKET to easily inspect the
>        communication between Git and the filter process

...but this is a nice advantage.

If deemed necessary, we could also reuse progress report meters from
fetch / push side (percent, bandwidth/throughput), I guess.

> (4) The overheads is neglect able (4 byte header vs 65516 byte content)

Right.

> 
> 
>>> * a long running filter application is defined with "filter.<driver>.process"
>>
>> I hope that won't confuse Git users into trying to use single-shot
>> filters with a new protocol...
> 
> Yes, that was my intention for this new config.

All right, but you need to document the precedence rules: that is 

(1) if, accordingly to the operation, `clean` or `smudge` per-file filter
    is present, it is used; 
(2) if `process` semi-persistent protocol based filter is present,
    and it offers, accordingly to the operation, `clean` or `smudge`
    capabilities, it is used;
(3) otherwise, no filtering is performed.  

`clean` or `smudge` set to empty string means identity filter; I don't know
about rule for the new `process` filter if it is set to empty string.

At least in the commit message you would need to describe why this particular
solution was chosen.  I guess it is to avoid starting `protocol` filter,
only to realize that it does not offer "smudge" capability, and that `smudge`
filter is to be used because it is set.

Best,
-- 
Jakub Narębski

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]