On Sun, Sep 16, 2018 at 03:05:48PM -0700, Jonathan Nieder wrote:
> Hi,
>
> On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:
> > Taylor Blau wrote:
> > >> Right, though this still subjects the remote copy to all of the
> > >> difficulty of packing large objects (though Christian's work to support
> > >> other object database implementations would go a long way to help this).
> >
> > Ah, interesting -- I didn't realize this step was part of the
> > bottleneck. I presumed git didn't do much more than perhaps gzip'ing
> > binary files when it packed them up. Or do you mean the growing cost
> > of storing the objects locally as you work? Perhaps that could be
> > solved by allowing the client more control (ie. delete the oldest
> > blobs that exist on the server).
>
> John, I believe you are correct. Taylor, can you elaborate about what
> packing overhead you are referring to?

Jonathan, you are right. I was also referring to the increased time
that Git would spend trying to find good delta chains when packing
larger, non-textual objects. I haven't done any hard benchmarking work
on this, so it may be a moot point.

> In other words, using a rolling hash to decide where to split a blob
> and use a tree-like structure so that (1) common portions between
> files can be deduplicated and (2) portions can be hashed in parallel.

I think that this is worth discussing further. Certainly, it would go a
good bit of the way to addressing the point that I responded to earlier
in this message.
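
To make that concrete, here is a rough, self-contained sketch of the
kind of rolling-hash splitting I understand you to mean. It is purely
illustrative: nothing below corresponds to code in Git today, and the
window size, split mask, and minimum chunk size are arbitrary choices.

/*
 * Toy content-defined chunker using a buzhash-style rolling hash over
 * the last WINDOW bytes.  A chunk boundary is declared whenever the
 * low bits of the hash hit a fixed pattern, so boundaries depend only
 * on nearby content: inserting bytes near the start of a blob shifts
 * boundaries locally instead of changing every chunk that follows.
 */
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

#define WINDOW     64       /* bytes covered by the rolling hash */
#define SPLIT_MASK 0x1fff   /* 13 bits -> ~8 KiB expected chunk size */
#define MIN_CHUNK  2048     /* avoid pathologically small chunks */

static inline uint32_t rotl32(uint32_t v, unsigned n)
{
	n &= 31;
	return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Cheap stand-in for a fixed table of 256 random 32-bit values. */
static inline uint32_t byte_hash(unsigned char b)
{
	uint32_t x = b + 0x9e3779b9u;
	x *= 2654435761u;
	x ^= x >> 15;
	x *= 2246822519u;
	x ^= x >> 13;
	return x;
}

static void split_points(const unsigned char *buf, size_t len)
{
	unsigned char window[WINDOW] = { 0 };
	uint32_t h = 0;
	size_t i, start = 0;

	for (i = 0; i < len; i++) {
		/* roll: mix in the new byte, cancel the byte leaving the window */
		h = rotl32(h, 1) ^ byte_hash(buf[i])
		    ^ rotl32(byte_hash(window[i % WINDOW]), WINDOW);
		window[i % WINDOW] = buf[i];

		if (i + 1 - start >= MIN_CHUNK &&
		    (h & SPLIT_MASK) == SPLIT_MASK) {
			printf("chunk at %zu, length %zu\n", start, i + 1 - start);
			start = i + 1;
		}
	}
	if (start < len)
		printf("chunk at %zu, length %zu\n", start, len - start);
}

int main(void)
{
	static unsigned char buf[1 << 20];
	size_t len = fread(buf, 1, sizeof(buf), stdin);

	split_points(buf, len);
	return 0;
}

Each resulting chunk could then be hashed independently (and in
parallel), and a blob would be represented by a tree-like list of chunk
IDs, so two blobs that share a region also share the corresponding
chunk objects -- which is exactly the deduplication you describe above.

Thanks,
Taylor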