Hi,

On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:

> Taylor Blau wrote:
>> Right, though this still subjects the remote copy to all of the
>> difficulty of packing large objects (though Christian's work to
>> support other object database implementations would go a long way
>> to help this).
>
> Ah, interesting -- I didn't realize this step was part of the
> bottleneck. I presumed git didn't do much more than perhaps gzip'ing
> binary files when it packed them up. Or do you mean the growing cost
> of storing the objects locally as you work? Perhaps that could be
> solved by allowing the client more control (ie. delete the oldest
> blobs that exist on the server).

John, I believe you are correct. Taylor, can you elaborate on what
packing overhead you are referring to?

One thing I would like to see in the long run to help Git cope with
very large files is adding something similar to bup's "bupsplit" to
the packfile format (or even better, to the actual object format, so
that it affects object names). In other words, using a rolling hash
to decide where to split a blob and using a tree-like structure, so
that (1) common portions between files can be deduplicated and
(2) portions can be hashed in parallel.

I haven't heard of these things being the bottleneck for anyone in
practice today, though.

Thanks,
Jonathan
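
(For illustration only, a minimal sketch of the content-defined
splitting idea described above. The rolling checksum, window size,
and split mask below are made-up simplifications for the sake of the
example -- they are not bup's actual bupsplit parameters and not
anything Git does today.)

#!/usr/bin/env python3
# Sketch: split a byte string at content-defined boundaries chosen by
# a simple rolling checksum. Unchanged regions shared by two versions
# of a file then produce identical chunks (so they deduplicate), and
# each chunk can be hashed independently, e.g. in parallel.
import hashlib
import sys

WINDOW = 64          # bytes covered by the rolling checksum (assumption)
SPLIT_MASK = 0x1FFF  # cut when the low 13 bits are all 1s (~8 KiB average)

def chunks(data):
    """Yield (offset, length) for content-defined chunks of `data`."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = (rolling + byte) & 0xFFFFFFFF
        if i >= WINDOW:
            # Drop the byte that just left the window.
            rolling = (rolling - data[i - WINDOW]) & 0xFFFFFFFF
        if (rolling & SPLIT_MASK) == SPLIT_MASK:
            yield (start, i + 1 - start)
            start = i + 1
    if start < len(data):
        yield (start, len(data) - start)

if __name__ == "__main__":
    # Hypothetical usage: print one line per chunk of the given file.
    # A real design would store each chunk as its own object and record
    # the ordered chunk hashes in a tree-like structure that names the
    # whole blob.
    data = open(sys.argv[1], "rb").read()
    for off, length in chunks(data):
        print(off, length, hashlib.sha1(data[off:off + length]).hexdigest())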