Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

Lars Schneider <larsxschneider@xxxxxxxxx> · Tue, 18 Apr 2017 18:14:36 +0200

> On 12. Apr 2017, at 19:46, Taylor Blau <ttaylorr@xxxxxxxxxx> wrote:
> 
> I think this is a great approach and one that I'd be happy to implement in LFS.
> The additional capability isn't too complex, so I think other similar filters to
> LFS shouldn't have a hard time implementing it either.
> 
> I left a few comments, mostly expressing approval to the documentation changes.
> I'll leave the C review to someone more expert than me.
> 
> +1 from me on the protocol changes.

Thanks!

>> +Delay
>> +^^^^^
>> +
>> +If the filter supports the "delay" capability, then Git can send the
>> +flag "delay-able" after the filter command and pathname.
> 
> Nit: I think either way is fine, but `can_delay` will save us 1 byte per each
> new checkout entry.

1 byte is no convincing argument to me but since you are a native speaker I trust your "can-delay" suggestion. I prefer dashes over underscores, though, for consistency with the rest of the protocol.

>> +"delay-id", a number that identifies the blob, and a flush packet. The
>> +filter acknowledges this number with a "success" status and a flush
>> +packet.
> 
> I mentioned this in another thread, but I'd prefer, if possible, that we use the
> pathname as a unique identifier for referring back to a particular checkout
> entry. I think adding an additional identifier adds unnecessary complication to
> the protocol and introduces a forced mapping on the filter side from id to
> path.

I agree! I answered in the other thread. Let's keep the discussion there.

> Both Git and the filter are going to have to keep these paths in memory
> somewhere, be that in-process, or on disk. That being said, I can see potential
> troubles with a large number of long paths that exceed the memory available to
> Git or the filter when stored in a hashmap/set.
> 
> On Git's side, I think trading that for some CPU time might make sense. If Git
> were to SHA1 each path and store that in a hashmap, it would consume more CPU
> time, but less memory to store each path. Git and the filter could then exchange
> path names, and Git would simply SHA1 the pathname each time it needed to refer
> back to memory associated with that entry in a hashmap.

I would be surprised if this would be necessary. If we filter delay 50,000 files (= a lot!) with a path length of 1000 characters (= very long!) then we would use 50MB plus some hashmap data structures. Modern machines should have enough RAM I would think...

Thanks,
Lars