> On 12. Apr 2017, at 19:46, Taylor Blau <ttaylorr@xxxxxxxxxx> wrote: > > I think this is a great approach and one that I'd be happy to implement in LFS. > The additional capability isn't too complex, so I think other similar filters to > LFS shouldn't have a hard time implementing it either. > > I left a few comments, mostly expressing approval to the documentation changes. > I'll leave the C review to someone more expert than me. > > +1 from me on the protocol changes. Thanks! >> +Delay >> +^^^^^ >> + >> +If the filter supports the "delay" capability, then Git can send the >> +flag "delay-able" after the filter command and pathname. > > Nit: I think either way is fine, but `can_delay` will save us 1 byte per each > new checkout entry. 1 byte is no convincing argument to me but since you are a native speaker I trust your "can-delay" suggestion. I prefer dashes over underscores, though, for consistency with the rest of the protocol. >> +"delay-id", a number that identifies the blob, and a flush packet. The >> +filter acknowledges this number with a "success" status and a flush >> +packet. > > I mentioned this in another thread, but I'd prefer, if possible, that we use the > pathname as a unique identifier for referring back to a particular checkout > entry. I think adding an additional identifier adds unnecessary complication to > the protocol and introduces a forced mapping on the filter side from id to > path. I agree! I answered in the other thread. Let's keep the discussion there. > Both Git and the filter are going to have to keep these paths in memory > somewhere, be that in-process, or on disk. That being said, I can see potential > troubles with a large number of long paths that exceed the memory available to > Git or the filter when stored in a hashmap/set. > > On Git's side, I think trading that for some CPU time might make sense. If Git > were to SHA1 each path and store that in a hashmap, it would consume more CPU > time, but less memory to store each path. Git and the filter could then exchange > path names, and Git would simply SHA1 the pathname each time it needed to refer > back to memory associated with that entry in a hashmap. I would be surprised if this would be necessary. If we filter delay 50,000 files (= a lot!) with a path length of 1000 characters (= very long!) then we would use 50MB plus some hashmap data structures. Modern machines should have enough RAM I would think... Thanks, Lars