On Mon, Apr 28, 2014 at 09:43:10AM -0700, Junio C Hamano wrote:

> Yes, I'd love to see something along that line in the longer term,
> showing all the objects as just regular objects under the hood, with
> implementation details hidden in the object layer (just like there
> is no distinction between packed and loose objects from the point of
> view of read_sha1_file() users), as a real solution to address
> issues in larger trees.
>
> Also see http://thread.gmane.org/gmane.comp.version-control.git/241940
> where Shawn had an interesting experiment.

Yeah, I think it's pretty clear that a naive high-latency object store
is unusably slow. You mentioned in that thread trying to do pre-fetching
based on commits/trees, and I recall that Shawn's Cassandra experiments
did that (and maybe the BigTable-backed Google Code does, too?).

There's also a question of deltas. You don't want to fetch trees or text
blobs individually without deltas, because your total size ends up way
bigger. But I think for large object support, we can sidestep the issue.
The objects will all be blobs (so they cannot refer to anything else),
they will typically not delta well, and the connection setup and latency
will be dwarfed by actual transfer time.

My plan was to have all clones fetch all commits and trees (and small
blobs, too), and then download and cache the large blobs as-needed.

That doesn't help with repositories where the actual commit history or
tree size is a problem. But we already have shallow clones to help with
the former. And for the latter, I think we would want a narrow clone
that behaves differently than what I described above. You'd probably
want a specific "widen" operation that would fetch all of the objects
for the newly-widened part of the tree in one go (including deltas),
and you wouldn't want it to happen on an as-needed basis.
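For illustration, the eager-commits/trees, lazy-large-blobs plan could be
sketched roughly like this (all names here are hypothetical; this is not
git's actual object API, just the shape of the read path):

```python
# Hypothetical sketch of a read path where commits, trees, and small
# blobs are already local, and large blobs are fetched from a remote
# store the first time they are read, then cached locally.

class LazyObjectStore:
    def __init__(self, local, fetch_remote):
        self.local = local                # dict: sha -> object bytes
        self.fetch_remote = fetch_remote  # callable: sha -> object bytes

    def read(self, sha):
        obj = self.local.get(sha)
        if obj is None:
            # Large blob not present locally: fetch it once, then cache
            # it so later reads behave like any other local object.
            obj = self.fetch_remote(sha)
            self.local[sha] = obj
        return obj
```

The point of the design is that only the first read of a large blob pays
the network round trip; everything a commit walk touches (commits, trees,
small blobs) is already local, so traversal latency stays unchanged.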
-Peff