Re: [PATCH v3 10/10] convert: add filter.<driver>.process option

Jakub Narębski <jnareb@xxxxxxxxx> · Mon, 1 Aug 2016 00:59:31 +0200

W dniu 31.07.2016 o 21:49, Lars Schneider pisze: 
> On 31 Jul 2016, at 11:42, Jakub Narębski <jnareb@xxxxxxxxx> wrote:
>> W dniu 31.07.2016 o 00:05, Jakub Narębski pisze:
>>> W dniu 30.07.2016 o 01:38, larsxschneider@xxxxxxxxx pisze:
[...]
>>> I think it would be nice to have here at least summary of the benchmarks
>>> you did in https://github.com/github/git-lfs/pull/1382

This would be nice to have in the commit message: real benchmarks.

>>
>> Note that this feature is especially useful if startup time is long,
>> that is if you are using an operating system with costly fork / new process
>> startup time like MS Windows (which you have mentioned), or writing
>> filter in a programming language with large startup time like Java
>> or Python (the latter may have changed since).
>>
>>  https://gnustavo.wordpress.com/2012/06/28/programming-languages-start-up-times/
> 
> OK, I will add this. Is it OK to add the link to the commit message?
> (since I don't know how long the link will be available).

I don't think it is needed.  Perhaps only a sentence or half to notify
where you could get most from this feature, but even then it is not
necessary.

I'm sorry for the confusion.

>> See below for proposal with two places to signal errors: before sending
>> first byte, and after.
> 
> Right now the protocol is implemented covering the following cases:
> 
> ## CASE 1 - no stream success

It is less "stream", more "size unknown".  Real streaming is interleaving
reading and writing, which is currently not supported due to lack of
start_async() - I think.

> 
> packet:          git< size=57\n
> packet:          git< SMUDGED_CONTENT
> packet:          git< 0000
> packet:          git< success\n

Right.  What happens if either length(SMUDGED_CONTENT) < size,
or length(SMUDGED_CONTENT) > size?  It could conceivably happen,
e.g. due to an error in size calculation.

NOTE that without using flush packet to signal end of contents,
we would be not able to signal a situation when filter encounters
an error (per-file, or long temporary) when it have written some
content already.  For example this may happen for git-LFS filter,
if the server hosting artifactory (or even whole network) gets
down during cleanup / smudging.

Well, unless we would use other special packets:
 - empty packet, that is "0004" pkt-line
 - invalid packet, that is "0001", "0002", "0003" pkt-line
to signal premature end of SMUDGED_CONTENT.

> 
> 
> ## CASE 2 - no stream success but 0 byte response
> 
> packet:          git< size=0\n
> packet:          git< success\n

Why there is need to special case 0 byte (empty file) response?

  packet:          git< size=0\n
  packet:          git< 0000
  packet:          git< success\n

is perfectly fine.

> ## CASE 3 - no stream filter; filter doesn't want to process the file
> 
> packet:          git< size=0\n
> packet:          git< reject\n

Why not simply

  packet:          git< reject\n

Or, if we are going success/reject/whatever route

  packet:          git< size=0\n
  packet:          git< 0000
  packet:          git< reject\n

> ## CASE 4 - no stream filter; filter error
> 
> packet:          git< size=57\n
> packet:          git< SMUDGED_CONTENT
> packet:          git< 0000
> packet:          git< error\n
> 
> CASE 4 is not explicitly checked. If a final message is neither
> "success" nor "reject" then it is interpreted as error. If that
> happens then Git will shutdown and restart the filter process
> if there is another file to filter. 

This should be documented.

> 
> Alternatively a filter process can shutdown itself, too, to signal
> an error.
> 
> The corresponding stream filter look like this:
> 
> ## CASE 1 - stream success
> 
> packet:          git< SMUDGED_CONTENT
> packet:          git< 0000
> packet:          git< success\n
> 
> 
> ## CASE 2 - stream success but 0 byte response
> 
> packet:          git< 0000
> packet:          git< success\n
> 
> 
> ## CASE 3 - stream filter; filter doesn't want to process the file
> 
> packet:          git< 0000
> packet:          git< reject\n
> 
> 
> ## CASE 4 - stream filter; filter error
> 
> packet:          git< SMUDGED_CONTENT
> packet:          git< 0000
> packet:          git< error\n
> 
> --
> 
> I just realized that the size 0 case is a bit inconsistent
> in the no stream case as it has no flush packet. Maybe I 
> should indeed remove the flush packet in the no stream case
> completely?!

That's what I wrote about SPOT (single point of truth), of using
either size or flush packet, but not both.  But...

As I wrote, you need some mechanism to signal premature end
of contents, and start of an error description.

> 
> Do the cases above make sense to you?

Except for the inconsistency of the size 0 case.  This what
I meant to say.

> 
> Regarding error handling. I would prefer it if the filter prints
> all errors to STDERR by itself. I think that is the safest
> option to communicate errors to the users because if the communication
> got into a bad state then Git might not be able to read the errors
> properly.
> 
> See Peff's response on the topic, too:
> http://public-inbox.org/git/20160729165018.GA6553%40sigill.intra.peff.net/

Actually it looks like Peff is slightly against using stderr.

JK> Git-LFS sends to stderr because there's no other option. I wonder if it
JK> would be nicer to make it Git's responsibility to talk to the user,
JK> because then it could respect things like "--quiet". I guess error
JK> messages are generally printed regardless of verbosity, though, so
JK> printing them unconditionally is OK.

I think it should be O.K., and it makes writing filter drivers
simpler if we don't have multiplex channels.

>> NOTE: there is a bit of mixed and possibly confusing notation, that
>> is 0000 is flush packet, not packet with 0000 as content.  Perhaps
>> write pkt-line in full?
> 
> I am not sure I understand what you mean (maybe it's too late for me...).
> Can you try to rephrase or give an example?

Compare

  packet:          git< 0000

with

  packet:          git< success\n

The former as pkt-line is

  git< 0000

the latter is

  git< 000csuccess\n
       ^^^^
           \-- packet header

-- 
Jakub Narębski

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html