> On 10 Apr 2017, at 22:54, Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On 2017-04-09 21:11, Lars Schneider wrote:
> []
>> +------------------------
>> +packet: git> command=smudge
>> +packet: git> pathname=path/testfile.dat
>> +packet: git> delay-able=1
>> +packet: git> 0000
>> +packet: git> CONTENT
>> +packet: git> 0000
>> +packet: git< status=delayed
>> +packet: git< 0000
>> +packet: git> delay-id=1
>> +packet: git> 0000
>> +packet: git< status=success
>> +packet: git< 0000
>
> (not sure if this was mentioned before)
> If a filter uses the delayed feature, I would read it as
> a response from the filter in the style:
> "Hallo Git, I need some time to process it, but as I have
> CPU capacity available, please send another blob,
> so I can chew them in parallel."
>
> Can we save one round trip?
>
> packet: git> command=smudge
> packet: git> pathname=path/testfile.dat
> packet: git> delay-id=1
> packet: git> 0000
> packet: git> CONTENT
> packet: git> 0000
> packet: git< status=delayed   # this means: Git, please feed more
> packet: git> 0000

Actually, this is how I implemented it first. However, I didn't like
that, because we associate a pathname with a delay-id. If the filter
does not delay the item, then we would associate a different pathname
with the same delay-id in the next request. Therefore, I think it is
better to present the delay-id *only* to the filter if the item is
actually delayed.

I would be surprised if the extra round trip impacted performance in
any meaningful way.

> # Git feeds the next blob.
> # This may be repeated some rounds.
> # (We may want to restrict the number of rounds for Git, see below)
> # After these rounds, the filter needs to signal:
> # no more fresh blobs please, collect some data so I can free memory,
> # and after that I am able to get a fresh blob.
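As an aside, every `packet:` line in these transcripts is a pkt-line: a
4-hex-digit length prefix (which counts the 4 prefix bytes themselves)
followed by the payload, with `0000` serving as the flush packet. A
minimal sketch in Python (the helper names are mine, not part of Git):

```python
FLUSH = b"0000"  # a flush packet terminates a list or a content section

def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits of the total length
    (including the 4-byte header itself), then the payload."""
    return b"%04x" % (len(payload) + 4) + payload

def read_pkt_line(stream):
    """Decode one pkt-line from a binary stream; None signals a flush."""
    size = int(stream.read(4), 16)
    return None if size == 0 else stream.read(size - 4)
```

So `command=smudge` plus a trailing newline goes over the wire as
`0013command=smudge` + LF (15 payload bytes + 4 header bytes = 0x13).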
> packet: git> command=smudge
> packet: git> pathname=path/testfile.dat
> packet: git> delay-id=2
> packet: git> 0000
> packet: git> CONTENT
> packet: git> 0000
> packet: git< status=pleaseWait
> packet: git> 0000
>
> # Now Git needs to ask for ready blobs.

We could do this, but I think it would only complicate the protocol.
I expect the filter to spool results to disk or something similar.

>> +------------------------
>> +
>> +If the filter supports the "delay" capability then it must support the
>> +"list_available_blobs" command. If Git sends this command, then the
>> +filter is expected to return a list of "delay_ids" of blobs that are
>> +available. The list must be terminated with a flush packet followed
>> +by a "success" status that is also terminated with a flush packet. If
>> +no blobs for the delayed paths are available yet, then the filter is
>> +expected to block the response until at least one blob becomes
>> +available. The filter can tell Git that it has no more delayed blobs
>> +by sending an empty list.
>> +------------------------
>> +packet: git> command=list_available_blobs
>> +packet: git> 0000
>> +packet: git< 7
>
> Is the "7" the same as the "delay-id=1" from above?

Yes! Sorry, I will make this clearer in the next round.

> It may be easier to understand, even if it costs some bytes, to
> answer instead:
> packet: git< delay-id=1

Agreed!

> (And at this point, may I suggest to change "delay-id" into
> "request-id=1"?)

If there is no objection by another reviewer, then I am happy to
change it.

>> +packet: git< 13
>
> Same question here: is this the delay-id?

Yes.

>> +packet: git< 0000
>> +packet: git< status=success
>> +packet: git< 0000
>> +------------------------
>> +
>> +After Git received the "delay_ids", it will request the corresponding
>> +blobs again. These requests contain a "delay-id" and an empty content
>> +section. The filter is expected to respond with the smudged content
>> +in the usual way as explained above.
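To make the `list_available_blobs` exchange concrete, here is a hedged
sketch of how a filter might serialize its response: one pkt-line per
ready delay-id (bare ids, as in the quoted transcript), a flush packet,
then `status=success` and a final flush. An empty id list is the "no
more delayed blobs" signal. Function names are illustrative only:

```python
def pkt_line(payload: bytes) -> bytes:
    """4-hex-digit length prefix (incl. the 4 header bytes) + payload."""
    return b"%04x" % (len(payload) + 4) + payload

def available_blobs_response(ready_ids):
    """Filter-side reply to `list_available_blobs` (sketch, not Git code).

    A real filter would block here until at least one delayed blob is
    ready; this helper just serializes whatever ids are passed in.
    """
    body = b"".join(pkt_line(b"%d" % i) for i in ready_ids)
    return body + b"0000" + pkt_line(b"status=success") + b"0000"
```

For the delay-ids 7 and 13 from the transcript this yields the two id
packets, a flush, the success status, and the closing flush.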
>> +------------------------
>> +packet: git> command=smudge
>> +packet: git> pathname=test-delay10.a
>> +packet: git> delay-id=0
>
> Minor question: Where does the "id=0" come from?

That's the delay-id (aka request-id) that Git gave to the filter in
the first request (which was delayed).

>> +packet: git> 0000
>> +packet: git> 0000   # empty content!
>> +packet: git< status=success
>> +packet: git< 0000
>> +packet: git< SMUDGED_CONTENT
>> +packet: git< 0000
>> +packet: git< 0000
>
> OK, good.
>
> The question is: what happens next?
>
> Two things, kind of in parallel, but we need to prioritize and
> serialize:
> - Send the next blob
> - Fetch ready blobs
> - And of course: ask for more ready blobs.
> (it looks as if Peff and Jakub had useful comments already,
> so I can stop here?)

I would like to keep the mechanism as follows:

1. Send all blobs to the filter.
2. Fetch blobs until we are done.

@Taylor: Do you think that would be OK for LFS?

> In general, Git should not have an unlimited number of blobs
> outstanding, as memory constraints may apply.
> There may be a config variable for the number of outstanding blobs
> (similar to the window size in other protocols) and a variable
> for the number of "sent bytes in outstanding blobs"
> (similar to the window size (again!) in e.g. TCP).
>
> The number of outstanding blobs may be less important; it is more
> important to monitor the number of bytes we keep in memory in some
> way.
>
> Something like "we set a limit of 500K of outstanding data"; once we
> are above the limit, don't send any new blobs.

I don't expect the filter to keep everything in memory. If no memory
is available anymore, then I expect the filter to spool to disk. This
keeps the protocol simple. If this turns out to be insufficient, we
can still improve it later.

Thanks,
Lars
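The two-phase mechanism above (1. send all blobs, 2. fetch delayed
results until done) can be sketched end-to-end with a toy in-process
filter. Everything here is a hypothetical illustration: a real filter
speaks pkt-lines over stdin/stdout, and the "smudging" below is just
an uppercase stand-in for real work:

```python
class ToyFilter:
    """Stand-in for a long-running filter that delays every blob."""

    def __init__(self):
        self.pending = {}   # delay-id -> finished (smudged) content
        self.next_id = 0

    def smudge(self, pathname, content, delay_id=None):
        if delay_id is not None:          # phase 2: Git asks again by id
            return "success", self.pending.pop(delay_id)
        assigned = self.next_id           # phase 1: delay and assign an id
        self.next_id += 1
        self.pending[assigned] = content.upper()  # stand-in for slow work
        return "delayed", assigned

    def list_available_blobs(self):
        # A real filter blocks here until at least one blob is ready;
        # this toy finishes everything instantly.
        return list(self.pending)

def checkout(fltr, items):
    """The proposed two phases: send every blob, then fetch delayed ones."""
    results, delayed = {}, {}             # delayed: delay-id -> pathname
    for pathname, content in items:       # phase 1: send all blobs
        status, payload = fltr.smudge(pathname, content)
        if status == "delayed":
            delayed[payload] = pathname
        else:
            results[pathname] = payload
    while delayed:                        # phase 2: fetch until done
        for did in fltr.list_available_blobs():
            _, smudged = fltr.smudge(delayed[did], "", delay_id=did)
            results[delayed.pop(did)] = smudged
    return results
```

Note the design point discussed in the thread: Git never waits on one
particular delayed blob; it drains the whole checkout first and only
then polls for whatever the filter has finished, in any order.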