Jan Holesovsky <kendy@xxxxxxx> writes:

> This is my attempt to implement the 'lazy clone' I've read about a
> bit in the git mailing list archive, but did not see implemented
> anywhere - the clone that fetches a minimal amount of data with the
> possibility to download the rest later (transparently!) when
> necessary.

It was not implemented because it was thought to be hard: git assumes
in many places that if it has an object, it has all objects referenced
by it.  But it is very nice of you to [try to] implement 'lazy
clone'/'remote alternates'.

Could you provide some benchmarks (time, network throughput, latency)
for your implementation?

> Currently we are evaluating the usage of git for OpenOffice.org as
> one of the candidates (SVN is the other one), see
>
>   http://wiki.services.openoffice.org/wiki/SCM_Migration
>
> I've provided a git import of OOo with the entire history; the
> problem is that the pack has 2.5G, so it's not too convenient to
> download for casual developers that just want to try it.

One of the reasons why 'lazy clone' was not implemented is that by
using a large enough window and a larger-than-default delta depth you
can repack the "archive pack" (and keep it from being repacked again
using .keep files, see git-config(1)) much tighter than with the
default (time- and CPU-conserving) options, and much, much tighter
than the pack that a fast-import driven import produces.  Both the
Mozilla import and the GCC import were packed below 0.5 GB.  An
example repack invocation is sketched at the end of this message.
Warning: you would need a machine with a large amount of memory to
repack this tightly in sensible time!

> Shallow clone is not a possibility - we don't get patches through
> mailing lists, so we need the pull/push, and also thanks to the OOo
> development cycle, we have too many living heads which causes the
> shallow clone to download about 1.5G even with --depth 1.

Wouldn't it be easier to fix the shallow clone implementation to allow
pushing from a shallow clone to a full one (fetching from a full clone
into a shallow one is already implemented), and perhaps also push/pull
between two shallow clones?

As to the many living heads: first, you don't need to fetch all of
them.  Currently git-clone has no option to select a subset of heads
to clone, but you can always use git-init plus hand configuration (or
git-remote), and then git-fetch for the actual fetching; this too is
sketched at the end of this message.

By the way, did you try to split the OpenOffice.org repository at
component boundaries into submodules (subprojects)?  This would also
limit the amount of data to download, as you don't need to download
and check out all subprojects.  The problem of course is _how_ to
split the repository into submodules: they should be self-contained
enough that a whole-tree commit always (or almost always) touches only
a single submodule.  A possible layout is sketched below as well.

> Lazy clone sounded like the right idea to me. With this
> proof-of-concept implementation, just about 550M from the 2.5G is
> downloaded, which is still about twice as much in comparison with
> downloading a tarball, but bearable.

Do you have any numbers for the OOo repository, like the number of
revisions, the depth of the commit DAG (maximum number of revisions in
one line of commits), the number of files, the size of a checkout, the
average file size, etc.?

--
Jakub Narebski
Poland
ShadeHawk on #git
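
To make the repacking suggestion concrete, here is a minimal sketch;
the window/depth values are only illustrative (the defaults are 10 and
50), and <sha1> stands for the actual pack file name:

  # Repack everything from scratch, recomputing deltas with a large
  # search window and a deep delta chain:
  git repack -a -d -f --window=250 --depth=250

  # Mark the resulting archive pack as "kept" so that later repacks
  # and git-gc leave it alone:
  touch .git/objects/pack/pack-<sha1>.keep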
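
Similarly, a sketch of the git-init + hand configuration approach for
fetching only a subset of heads; the URL and the branch name are made
up for the example:

  mkdir ooo && cd ooo
  git init

  # Track only the 'master' head instead of all of refs/heads/*:
  git remote add -t master origin git://git.example.org/ooo.git

  # Fetches only the configured head and the objects reachable from it:
  git fetch origin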
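
Finally, a sketch of what a submodule split could look like, with
hypothetical per-component repositories ('writer' and 'calc' are made
up names):

  # In the superproject:
  git submodule add git://git.example.org/ooo-writer.git writer
  git submodule add git://git.example.org/ooo-calc.git calc
  git commit -m "Split components into submodules"

  # A developer interested only in Writer then fetches just that one:
  git clone git://git.example.org/ooo.git
  cd ooo
  git submodule init writer
  git submodule update writer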