Re: [PATCH] RFC: git lazy clone proof-of-concept

Hi Jakub,

On Friday 08 February 2008 20:00, Jakub Narebski wrote:

> It was not implemented because it was thought to be hard; git assumes
> in many places that if it has an object, it has all objects referenced
> by it.
>
> But it is very nice of you to [try to] implement 'lazy clone'/'remote
> alternates'.
>
> Could you provide some benchmarks (time, network throughput, latency)
> for your implementation?

Unfortunately not yet :-(  The only data I have is that a clone done on
git://localhost/ooo.git took 10 minutes without the lazy clone, and 7.5
minutes with it - and then I sent the patch for review here ;-)  The deadline
for our SVN vs. git comparison for OOo is next Friday, so I'll definitely
have some better data by then.

> Both the Mozilla import and the GCC import were packed below 0.5 GB.
> Warning: you would need a machine with a large amount of memory to
> repack it tightly in sensible time!

As I answered elsewhere, unfortunately it runs out of memory even on an 8G
machine (x86-64), so...  But I'm still trying.

> > Shallow clone is not a possibility - we don't get patches through
> > mailing lists, so we need the pull/push, and also thanks to the OOo
> > development cycle, we have too many living heads which causes the
> > shallow clone to download about 1.5G even with --depth 1.
>
> Wouldn't it be easier to try to fix the shallow clone implementation to
> allow for pushing from shallow to full clone (fetching from full to
> shallow is implemented), and perhaps also push/pull between two shallow
> clones?

I tried to look into it a bit, but unfortunately did not see a clear way
to do it transparently for the user - say, when you pull a branch that is
based off a commit you do not have.  But of course, I could have missed
something ;-)

> As to many living heads: first, you don't need to fetch all
> heads. Currently git-clone has no option to select a subset of heads to
> clone, but you can always use git-init + hand configuration +
> git-remote and git-fetch for actual fetching.

Right, might be interesting as well.  But still the missing push/pull is 
problematic for us [or at least I see it as a problem ;-)].
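
For reference, the git-init + git-remote + git-fetch route mentioned above
could look roughly like the sketch below. It is only an illustration under
stated assumptions: a throwaway local repository stands in for the real
upstream (something like git://localhost/ooo.git), and the branch name
cws/feature is hypothetical. The one load-bearing piece is the -t option of
git remote add, which restricts the remote's fetch refspec to the named
branch.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in "upstream" with two heads; in practice this would be the real
# repository URL. The branch names here are hypothetical.
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/master
git config user.email "you@example.com"
git config user.name "Example"
echo base > file.txt
git add file.txt
git commit -q -m "base"
git checkout -q -b cws/feature
echo feature > feature.txt
git add feature.txt
git commit -q -m "feature"
git checkout -q master

# Fresh repository that tracks only 'master' instead of every head.
git init -q "$work/partial"
cd "$work/partial"
git remote add -t master origin "$work/upstream"
git fetch -q origin

# List what was fetched; cws/feature should not be among the remote heads.
git branch -r
```

Further heads can be added later with another -t at remote-add time, or by
adding fetch lines to the remote's section in .git/config by hand.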

> By the way, did you try to split the OpenOffice.org repository at the
> component boundaries into submodules (subprojects)? This would also
> limit the amount of needed download, as you don't need to download and
> checkout all subprojects.

Yes, and got much nicer repositories that way ;-) - just by moving some
binary stuff out of the CVS into a separate tree.  The problem is that the
deal is to compare the same stuff in SVN and git - so I have no choice, in
fact.

> The problem of course is _how_ to split the repository into
> submodules. Submodules should be self-contained enough so that the
> whole-tree commit is always (or almost always) only about one submodule.

I hope it will be doable _if_ git wins and is chosen for OOo.

> > Lazy clone sounded like the right idea to me.  With this
> > proof-of-concept implementation, just about 550M from the 2.5G is
> > downloaded, which is still about twice as much in comparison with
> > downloading a tarball, but bearable.
>
> Do you have any numbers for OOo repository like number of revisions,
> depth of DAG of commits (maximum number of revisions in one line of
> commits), number of files, size of checkout, average size of file,
> etc.?

I'll try to provide the data ASAP.

Regards,
Jan