On 2017-04-11 21:50, Lars Schneider wrote:
[]
>> packet: git> command=smudge
>> packet: git> pathname=path/testfile.dat
>> packet: git> delay-id=1
>> packet: git> 0000
>> packet: git> CONTENT
>> packet: git> 0000
>> packet: git< status=delayed # this means: Git, please feed more
>> packet: git> 0000
>
> Actually, this is how I implemented it first.
>
> However, I didn't like that because we associate a
> pathname with a delay-id. If the filter does not
> delay the item then we associate a different
> pathname with the same delay-id in the next request.
> Therefore I think it is better to present the delay-id
> *only* to the filter if the item is actually delayed.
>
> I would be surprised if the extra round trip does impact
> the performance in any meaningful way.

Two spontaneous remarks:

- Git can simply use a counter which is incremented for each
  blob that is sent to the filter. Regardless of what the filter
  answers (delayed or not), simply increment the counter.
  (Or is this too simple and am I missing something?)

- I was thinking that the filter code is written as either
  "never delay" or "always delay". "Never delay" is the existing
  code. What is your idea: when should a filter respond with
  "delayed"? My thinking was "always", silently assuming that
  more than one core can be used, so that blobs can be filtered
  in parallel.

> We could do this but I think this would only complicate
> the protocol. I expect the filter to spool results to the
> disk or something.

Spooling things to disk was not part of my picture, to be honest.
It means additional execution time, and when an SSD is used, the
chips wear out faster... There may be situations where this is OK
for some users (but not for others).

How can we prevent Git from (over-)flooding the filter?
The protocol response from the filter would be just "delayed",
and the filter would block Git, right?

But, in any case, it would still be nice if Git would collect
converted blobs from the filter, to free resources here.
This is more like the "streaming model", but on a higher level:
send 4 blobs to the filter, collect the ready one,
send the 5th blob to the filter, collect the ready one,
send the 6th blob to the filter, collect the ready one...
(A rough sketch of this loop follows at the end of this mail.)

(Back to the roots.)
Which criteria do you have in mind?
When should a filter process the blob and return it immediately,
and when should it respond "delayed"?
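
To make the "streaming model" above a bit more concrete, here is a
rough sketch in Python. This is not Git's code and not the pkt-line
protocol; filter_blob, convert_all and MAX_IN_FLIGHT are made-up
names. It only shows the scheduling idea: keep at most 4 blobs in
flight, and collect a ready result before handing over the next blob.

# Sketch only, assuming a hypothetical in-process filter function.
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

MAX_IN_FLIGHT = 4  # window size from the example above

def filter_blob(path, content):
    # Stand-in for a (possibly slow) smudge filter; a real filter
    # would convert the content here.
    return content

def convert_all(blobs):
    # blobs: iterable of (path, content) pairs.
    # Yields (path, converted) in completion order, never keeping
    # more than MAX_IN_FLIGHT blobs delayed at once.
    with ThreadPoolExecutor(max_workers=MAX_IN_FLIGHT) as pool:
        in_flight = {}  # future -> path
        for path, content in blobs:
            # Window full? Collect at least one ready blob before
            # sending more, so the filter is never flooded and
            # nothing has to be spooled to disk.
            while len(in_flight) >= MAX_IN_FLIGHT:
                done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
                for fut in done:
                    yield in_flight.pop(fut), fut.result()
            in_flight[pool.submit(filter_blob, path, content)] = path
        # End of checkout: drain everything that is still "delayed".
        for fut in list(in_flight):
            yield in_flight.pop(fut), fut.result()

if __name__ == "__main__":
    demo = [("path/testfile.dat", b"CONTENT"), ("other.dat", b"MORE")]
    for path, data in convert_all(demo):
        print(path, len(data))

The only point of the sketch is the back-pressure: the sender never
has more than a fixed number of delayed blobs outstanding, the
outstanding ones can still be converted in parallel, and results are
collected as they become ready instead of being spooled to disk.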