Re: [PATCH v3 10/10] convert: add filter.<driver>.process option

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[Excuse me replying to myself, but there are a few things I forgot,
 or realized only later]

W dniu 31.07.2016 o 00:05, Jakub Narębski pisze:
> W dniu 30.07.2016 o 01:38, larsxschneider@xxxxxxxxx pisze:
>> From: Lars Schneider <larsxschneider@xxxxxxxxx>
>>
>> Git's clean/smudge mechanism invokes an external filter process for every
>> single blob that is affected by a filter. If Git filters a lot of blobs
>> then the startup time of the external filter processes can become a
>> significant part of the overall Git execution time.
>>
>> This patch adds the filter.<driver>.process string option which, if used,
>> keeps the external filter process running and processes all blobs with
>> the following packet format (pkt-line) based protocol over standard input
>> and standard output.
> 
> I think it would be nice to have here at least summary of the benchmarks
> you did in https://github.com/github/git-lfs/pull/1382

Note that this feature is especially useful if startup time is long,
that is if you are using an operating system with costly fork / new process
startup time like MS Windows (which you have mentioned), or writing
filter in a programming language with large startup time like Java
or Python (the latter may have changed since).

  https://gnustavo.wordpress.com/2012/06/28/programming-languages-start-up-times/

[...]
> I was thinking about having possible responses to receiving file
> contents (or starting receiving in the streaming case) to be:
> 
>   packet:          git< ok size=7\n    (or "ok 7\n", if size is known)
> 
> or
> 
>   packet:          git< ok\n           (if filter does not know size upfront)
> 
> or
> 
>   packet:          git< fail <msg>\n   (or just "fail" + packet with msg)
> 
> The last would be when filter knows upfront that it cannot perform
> the operation.  Though sending an empty file with non-"success" final
> would work as well.

[...]

>> In case the filter cannot process the content, it is expected
>> to respond with the result content size 0 (only if "stream" is
>> not defined) and a "reject" packet.
>> ------------------------
>> packet:          git< size=0\n    (omitted with capability "stream")
>> packet:          git< reject\n
>> ------------------------
> 
> This is *wrong* idea!  Empty file, with size=0, can be a perfectly
> legitimate response.  

Actually, I think I have misunderstood your intent.  If you want to have
simpler protocol, with only one place to signal errors, that is after
sending a response, then proper way of signaling the error condition
would be to send empty file and then "reject" instead of "success":

   packet:          git< size=0\n    (omitted with capability "stream")
   packet:          git< 0000        (we need this flush packet)
   packet:          git< reject\n

Otherwise in the case without size upfront (capability "stream")
file with contents "reject" would be mistaken for the "reject" packet.

See below for proposal with two places to signal errors: before sending
first byte, and after.


NOTE: there is a bit of mixed and possibly confusing notation, that
is 0000 is flush packet, not packet with 0000 as content.  Perhaps
write pkt-line in full?


[...]
>> ---
>>  Documentation/gitattributes.txt |  84 ++++++++-
>>  convert.c                       | 400 +++++++++++++++++++++++++++++++++++++--
>>  t/t0021-conversion.sh           | 405 ++++++++++++++++++++++++++++++++++++++++
>>  t/t0021/rot13-filter.pl         | 177 ++++++++++++++++++
>>  4 files changed, 1053 insertions(+), 13 deletions(-)
>>  create mode 100755 t/t0021/rot13-filter.pl

Wouldn't it be better for easier review to split it into separate patches?
Perhaps at least the new test...

[...]
> I would assume that we have two error conditions.  
> 
> First situation is when the filter knows upfront (after receiving name
> and size of file, and after receiving contents for not-streaming filters)
> that it cannot process the file (like e.g. LFS filter with artifactory
> replica/shard being a bit behind master, and not including contents of
> the file being filtered).
> 
> My proposal is to reply with "fail" _in place of_ size of reply:
> 
>    packet:         git< fail\n       (any case: size known or not, stream or not)
> 
> It could be "reject", or "error" instead of "fail".
> 
> 
> Another situation is if filter encounters error during output,
> either with streaming filter (or non-stream, but not storing whole
> input upfront) realizing in the middle of output that there is something
> wrong with input (e.g. converting between encoding, and encountering
> character that cannot be represented in output encoding), or e.g. filter
> process being killed, or network connection dropping with LFS filter, etc.
> The filter has send some packets with output already.  In this case
> filter should flush, and send "reject" or "error" packet.
> 
>    <error condition>
>    packet:         git< "0000"       (flush packet)
>    packet:         git< reject\n
> 
> Should there be a place for an error message, or would standard error
> (stderr) be used for this?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]