> On 12. Apr 2017, at 06:37, Torsten Bögershausen <tboegi@xxxxxx> wrote:
>
> On 2017-04-11 21:50, Lars Schneider wrote:
>
> []
>>> packet: git> command=smudge
>>> packet: git> pathname=path/testfile.dat
>>> packet: git> delay-id=1
>>> packet: git> 0000
>>> packet: git> CONTENT
>>> packet: git> 0000
>>> packet: git< status=delayed # this means: Git, please feed more
>>> packet: git> 0000
>> Actually, this is how I implemented it first.
>>
>> However, I didn't like that because we associate a
>> pathname with a delay-id. If the filter does not
>> delay the item then we associate a different
>> pathname with the same delay-id in the next request.
>> Therefore I think it is better to present the delay-id
>> *only* to the filter if the item is actually delayed.
>>
>> I would be surprised if the extra round trip does impact
>> the performance in any meaningful way.
>>
>
> 2 spontanous remarks:
>
> - Git can simply use a counter which is incremented by each blob
>   that is send to the filter.
>   Regardless what the filter answers (delayed or not), simply increment a
>   counter. (or is this too simple and I miss something?)

I am not sure I understand what you mean. Do you want to say that the
filter just starts to delay items by itself, keyed on an increasing
counter (first item = 0, second item = 1, ...)? That could work, but I
would prefer a more explicitly defined solution. The implicit,
self-counting approach could easily get confused or mixed up, I think.

> - I was thinking that the filter code is written as either "never delay" or
>   "always delay".
>   "Never delay" is the existing code.
>   What is your idea, when should a filter respond with delayed ?
>   My thinking was "always", silently assuming the more than one core can be
>   used, so that blobs can be filtered in parallel.

In the case of Git LFS, I expect the filter to answer with "delay" if a
network call is required to fulfill the (smudge) filter request.

>> We could do this but I think this would only complicate
>> the protocol. I expect the filter to spool results to the
>> disk or something.

> Spooling things to disk was not part of my picture, to be honest.
> This means additional execution time when a SSD is used, the chips
> are more worn out...
> There may be situations, where this is OK for some users (but not for others)
> How can we prevent Git from (over-) flooding the filter?

I don't think this is Git's responsibility. If the filter can't handle
the data, then I expect the filter to *not* ask Git for a delay.

> The protocol response from the filter would be just "delayed", and the filter
> would block Git, right ?
> But, in any case, it would still be nice if Git would collect converted blobs
> from the filter, to free resource here.
> This is more like the "streaming model", but on a higher level:
> Send 4 blobs to the filter, collect the ready one, send the 5th blob to
> the filter, collect the ready one, send the 6th blob to the filter, collect
> ready one....

That sounds nice, but I don't think it is necessary in this (first)
iteration. I think it is acceptable for the filter to spool data to disk.

> (Back to the roots)
> Which criteria do you have in mind: When should a filter process the blob
> and return it immediately, and when would it respond "delayed" ?

See above: it's up to the filter. In the case of Git LFS: delay if a
network call is required.

Thanks,
Lars
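
P.S.: To illustrate the extra round trip mentioned above, a rough sketch
of the exchange I have in mind (key names and exact packets are not
final): Git sends the request without a delay-id, and only hands one out
once the filter has actually answered "delayed":

packet: git> command=smudge
packet: git> pathname=path/testfile.dat
packet: git> 0000
packet: git> CONTENT
packet: git> 0000
packet: git< status=delayed
packet: git< 0000
packet: git> delay-id=1
packet: git> 0000

If the filter does not delay the item, the delay-id packets are simply
never sent for it.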