Hi Jakub,

On Friday 08 February 2008 20:00, Jakub Narebski wrote:

> It was not implemented because it was thought to be hard; git assumes
> in many places that if it has an object, it has all objects referenced
> by it.
>
> But it is very nice of you to [try to] implement 'lazy clone'/'remote
> alternates'.
>
> Could you provide some benchmarks (time, network throughput, latency)
> for your implementation?

Unfortunately not yet :-( The only data I have is that a clone done on
git://localhost/ooo.git took 10 minutes without the lazy clone, and 7.5
minutes with it - and then I sent the patch for review here ;-) The
deadline for our SVN vs. git comparison for OOo is next Friday, so I'll
definitely have some better data by then.

> Both the Mozilla import and the GCC import were packed below 0.5 GB.
> Warning: you would need a machine with a large amount of memory to
> repack it tightly in sensible time!

As I answered elsewhere, it unfortunately goes out of memory even on an
8G machine (x86-64), so... But I am still trying.

> > Shallow clone is not a possibility - we don't get patches through
> > mailing lists, so we need the pull/push, and also, thanks to the OOo
> > development cycle, we have too many living heads, which causes the
> > shallow clone to download about 1.5G even with --depth 1.
>
> Wouldn't it be easier to try to fix the shallow clone implementation
> to allow pushing from a shallow to a full clone (fetching from full to
> shallow is implemented), and perhaps also push/pull between two
> shallow clones?

I tried to look into it a bit, but unfortunately did not see a clear
way to do it transparently for the user - say you pull a branch that is
based off a commit you do not have. But of course, I could have missed
something ;-)

> As to many living heads: first, you don't need to fetch all heads.
> Currently git-clone has no option to select a subset of heads to
> clone, but you can always use git-init + hand configuration +
> git-remote and git-fetch for the actual fetching.
Right, that might be interesting as well. But the missing push/pull is
still problematic for us [or at least I see it as a problem ;-)].

> By the way, did you try to split the OpenOffice.org repository at the
> component boundaries into submodules (subprojects)? This would also
> limit the amount of needed download, as you don't need to download and
> check out all subprojects.

Yes, and got much nicer repositories by that ;-) - by only moving some
binary stuff out of the CVS to a separate tree. The problem is that the
deal is to compare the same stuff in SVN and git - so no choice for me
in fact.

> The problem of course is _how_ to split the repository into
> submodules. Submodules should be self-contained enough that a
> whole-tree commit is always (or almost always) only about one
> submodule.

I hope it will be doable _if_ git wins & is chosen for OOo.

> > Lazy clone sounded like the right idea to me. With this
> > proof-of-concept implementation, just about 550M of the 2.5G is
> > downloaded, which is still about twice as much as downloading a
> > tarball, but bearable.
>
> Do you have any numbers for the OOo repository, like the number of
> revisions, depth of the DAG of commits (maximum number of revisions in
> one line of commits), number of files, size of a checkout, average
> file size, etc.?

I'll try to provide the data ASAP.

Regards,
Jan
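P.S. For anyone following along, here is a minimal sketch of the
git-init + git-remote + git-fetch approach Jakub suggested for fetching
only a subset of heads. All repository paths and branch names below are
placeholders (a throwaway local repository stands in for the real OOo
one), and `git init -b` assumes a reasonably recent git:

```shell
# Set up a tiny local "server" repository with two heads, so the
# example is self-contained; this stands in for the real remote.
cd "$(mktemp -d)"
git init -q -b master server
git -C server -c user.name=Example -c user.email=ex@example.com \
    commit -q --allow-empty -m 'initial commit'
git -C server branch topic        # a second head we do NOT want to fetch

# The technique itself: instead of git-clone, init an empty repository,
# configure the remote by hand, and fetch only the one head we care about.
git init -q -b master partial
cd partial
git remote add origin ../server
git fetch -q origin master:refs/remotes/origin/master
git checkout -q -b master origin/master
```

After this, `partial` has the full history of `master` but no ref (and
no objects beyond those reachable from `master`) for the unfetched
`topic` head.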