On 2017-04-09 21:11, Lars Schneider wrote:
[]
> +------------------------
> +packet: git> command=smudge
> +packet: git> pathname=path/testfile.dat
> +packet: git> delay-able=1
> +packet: git> 0000
> +packet: git> CONTENT
> +packet: git> 0000
> +packet: git< status=delayed
> +packet: git< 0000
> +packet: git> delay-id=1
> +packet: git> 0000
> +packet: git< status=success
> +packet: git< 0000

(not sure if this was mentioned before)
If a filter uses the delayed feature, I would read it as a response from
the filter in the style:
"Hello Git, I need some time to process it, but as I have CPU capacity
available, please send another blob, so I can chew on them in parallel."

Can we save one round trip?

packet: git> command=smudge
packet: git> pathname=path/testfile.dat
packet: git> delay-id=1
packet: git> 0000
packet: git> CONTENT
packet: git> 0000
packet: git< status=delayed   # this means: Git, please feed more
packet: git> 0000

# Git feeds the next blob.
# This may be repeated for some rounds.
# (We may want to restrict the number of rounds for Git, see below.)
# After some rounds, the filter needs to signal:
# no more fresh blobs please; let me hand back some data so I can free
# memory, and after that I am able to take a fresh blob again.

packet: git> command=smudge
packet: git> pathname=path/testfile.dat
packet: git> delay-id=2
packet: git> 0000
packet: git> CONTENT
packet: git> 0000
packet: git< status=pleaseWait
packet: git> 0000

# Now Git needs to ask for ready blobs.

> +------------------------
> +
> +If the filter supports the "delay" capability then it must support the
> +"list_available_blobs" command. If Git sends this command, then the
> +filter is expected to return a list of "delay_ids" of blobs that are
> +available. The list must be terminated with a flush packet followed
> +by a "success" status that is also terminated with a flush packet. If
> +no blobs for the delayed paths are available, yet, then the filter is
> +expected to block the response until at least one blob becomes
> +available. The filter can tell Git that it has no more delayed blobs
> +by sending an empty list.
> +------------------------
> +packet: git> command=list_available_blobs
> +packet: git> 0000
> +packet: git< 7

Is the "7" the same as the "delay-id=1" from above?
It may be easier to understand, even if it costs some bytes, to answer
instead

packet: git< delay-id=1

(And at this point, may I suggest to change "delay-id" into
"request-id=1"?)

> +packet: git< 13

Same question here: is this the delay-id?

> +packet: git< 0000
> +packet: git< status=success
> +packet: git< 0000
> +------------------------
> +
> +After Git received the "delay_ids", it will request the corresponding
> +blobs again. These requests contain a "delay-id" and an empty content
> +section. The filter is expected to respond with the smudged content
> +in the usual way as explained above.
> +------------------------
> +packet: git> command=smudge
> +packet: git> pathname=test-delay10.a
> +packet: git> delay-id=0

Minor question: where does the "id=0" come from?

> +packet: git> 0000
> +packet: git> 0000 # empty content!
> +packet: git< status=success
> +packet: git< 0000
> +packet: git< SMUDGED_CONTENT
> +packet: git< 0000
> +packet: git< 0000

OK, good.
The question is: what happens next?
Two things, kind of in parallel, but we need to prioritize and serialize:
- Send the next blob
- Fetch ready blobs
- And of course: ask for more ready blobs.

(it looks as if Peff and Jakub had useful comments already, so I can stop
here?)
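
To make sure I read the framing right, here is a rough, untested sketch of
the filter side in Python. The pkt-line helpers follow the usual "4 hex
digits of length, then payload" format; the "delay-able" key is taken from
the draft above, and start_background_conversion()/convert() are made-up
placeholders, not anything from the patch:

MAX_PAYLOAD = 65516  # pkt-line limit: 65520 bytes total minus the 4 length bytes

def write_pkt(out, payload):
    # A pkt-line is 4 hex digits (total length including these 4 bytes),
    # followed by the payload.
    assert len(payload) <= MAX_PAYLOAD
    out.write(b"%04x" % (len(payload) + 4))
    out.write(payload)

def write_flush(out):
    out.write(b"0000")
    out.flush()

def read_pkt(inp):
    # Returns None for a flush packet ("0000"), otherwise the payload.
    size = int(inp.read(4), 16)
    return None if size == 0 else inp.read(size - 4)

def handle_smudge(inp, out):
    # Header section: key=value lines up to the first flush packet.
    headers = {}
    while (pkt := read_pkt(inp)) is not None:
        key, _, value = pkt.decode().rstrip("\n").partition("=")
        headers[key] = value
    # Content section: everything up to the next flush packet.
    content = b""
    while (pkt := read_pkt(inp)) is not None:
        content += pkt
    if headers.get("delay-able") == "1":               # key name as in the draft above
        start_background_conversion(headers, content)  # made-up helper
        write_pkt(out, b"status=delayed\n")
        write_flush(out)
    else:
        write_pkt(out, b"status=success\n")
        write_flush(out)
        smudged = convert(content)                     # made-up helper
        for i in range(0, len(smudged), MAX_PAYLOAD):
            write_pkt(out, smudged[i:i + MAX_PAYLOAD])
        write_flush(out)
        write_flush(out)   # keep the "success" status, as in the trace above

After the usual version/capability handshake, the filter would call
handle_smudge(sys.stdin.buffer, sys.stdout.buffer) once per blob Git sends.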
In general, Git should not have an unlimited number of blobs outstanding,
as memory constraints may apply.
There may be a config variable for the number of outstanding blobs
(similar to the window size in other protocols), and a variable for the
number of bytes in outstanding blobs (similar to the window size (again!)
in e.g. TCP).
The number of outstanding blobs may be less important; it is more
important to monitor the number of bytes we keep in memory in some way.
Something like "we set a limit of 500K of outstanding data": once we are
above the limit, don't send any new blobs.

(No, I didn't look at the code at all, this protocol discussion is much
more interesting.)
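
To sketch the kind of bookkeeping I have in mind on the Git side
(everything here is invented for illustration, including the two limits;
there are no such config variables today):

MAX_OUTSTANDING_BLOBS = 32            # imagined per-filter limit
MAX_OUTSTANDING_BYTES = 500 * 1024    # the "500K of outstanding data" above

class DelayWindow:
    """Tracks blobs handed to the filter but not yet returned."""

    def __init__(self):
        self.outstanding = {}          # delay_id -> blob size in bytes

    def can_send(self, size):
        # Hold back a new blob once either limit would be exceeded.
        return (len(self.outstanding) < MAX_OUTSTANDING_BLOBS and
                sum(self.outstanding.values()) + size <= MAX_OUTSTANDING_BYTES)

    def sent(self, delay_id, size):
        self.outstanding[delay_id] = size

    def received(self, delay_id):
        # The smudged content came back; the window opens up again.
        self.outstanding.pop(delay_id, None)

Whenever can_send() says no, Git would switch to asking for ready blobs
(list_available_blobs) instead of feeding new ones.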