Re: [PATCH 00/16] Subtree clone proof of concept

Hi,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>:
> Something to play with so we can evaluate which is the best strategy
> for non-full clone (or whatever you call it).

Very nice, it's awesome you're working on this.  I'm of the same
opinion that Shawn stated earlier, namely that I don't like the route
of rewriting commits on the fly like this (more on that later), but
it's really cool to see some ideas being tried and pushed to their
limits.

> The idea is the same: pack only enough to access a subtree, rewrite
> commits at client side, rewrite again when pushing. However I put
> git-replace into the mix, so at least commit SHA-1 looks as same as from
> upstream. git-subtree is not needed (although it's still an option)
>
> With this, I can clone Documentation/ from git.git, update and push. I

I tried it out, but something is going wrong.  I applied your patches
to current master and ran the following -- am I omitting any
important steps?

$ git --version
git version 1.7.2.1.22.g236df

$ git clone file://$(pwd)/git fullclone
Cloning into fullclone...
warning: templates not found /home/newren/share/git-core/templates
remote: Counting objects: 96220, done.
remote: Compressing objects: 100% (24925/24925), done.
remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
Resolving deltas: 100% (70575/70575), done.
fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc

$ git clone --subtree=Documentation/ file://$(pwd)/git docclone
Cloning into docclone...
warning: templates not found /home/newren/share/git-core/templates
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed


> haven't tested it further. Space consumption is 24MB (58MB for full
> repo).  Not really impressive, but if one truly cares about disk
> space, he/she should also use shallow clone.

58 MB for full repo?  What are you counting?  For me, I get 25M:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ ls -lh git/.git/objects/pack/*.pack
-r--r--r--. 1 newren newren 25M 2010-08-01 18:05 git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack

Are you counting the full checkout too or something?  If so, that
varies widely between systems, making it hard to compare numbers.
(For me, 'du -hs git/' returns 44 MB.)  I'd like to be able to
duplicate your numbers and investigate further.  It seems to me that
we ought to be able to get that lower.
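
To make the comparison concrete, here is a minimal sketch of how one
could count only packfile bytes and leave the checkout out of it.  The
helper name pack_bytes is made up for illustration, and it assumes a
non-bare repository with its packs under .git/objects/pack (loose
objects are not counted):

```shell
#!/bin/sh
# pack_bytes REPO: print the total size in bytes of all *.pack files
# under REPO/.git/objects/pack (hypothetical helper, illustration only).
pack_bytes() {
    total=0
    for f in "$1"/.git/objects/pack/*.pack; do
        # Skip the unexpanded glob when the directory has no packs.
        [ -f "$f" ] || continue
        size=$(wc -c < "$f")
        total=$((total + size))
    done
    echo "$total"
}
```

Comparing 'pack_bytes git' with 'du -hs git/' should show how much of
the total is checkout and index files rather than history.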

> Performance is impacted, due to bulk commit replacement. There is a
> split second delay for every command. It's the price of replacing 24k
> commits every time. I think the delay could be improved a little bit
> (caching or mmap..)
>
> Rewriting commits at clone takes time too. Doing individual object
> writing takes lots of space and time. I put all new objects directly
> to a pack now. Rewriting time now becomes quite acceptable (a few
> seconds). Although deep subtree/repo may take longer. Rewriting on
> demand can be considered in such cases.
>
> Repo-care commands like fsck, repack, gc are left out for now.
>
> Finally, it's more of a hack just to see how far I can go. It will
> break things.

I think it's a pretty nifty hack.  It's fun to see.  :-)  However, I
do have a number of reservations about the general strategy.

First, I'm not sure I like the on-the-fly commit rewriting, as Shawn
also said about your previous subtree-for-upload-pack patch series.
You did take care of the "referring to commit-sha1" issue he brought
up by using the replace mechanism, but I'm still not sure I'm
comfortable with it.

Second, the performance implications worry me; a lot of the reason
for sparse clones was to improve performance, at least from my view.

Third, it only works on exactly one subtree (at least in your current
implementation), whereas most of my usecases involve multiple sibling
subdirectories that I'd like to get.

Fourth, it (currently) only handles trees and does not handle files,
which rules out the translator usecase I'd like to see covered, e.g.
cloning just po/de.po and its history without all sibling files.

Also, I couldn't tell if your implementation downloaded full commit
information for commits that didn't touch any of the files under the
relevant subtree.  I think it does, but couldn't tell for sure (I
wanted to use a clone and dig into it to find out, but ran into the
problems I mentioned above).  If so, that also worries me a bit -- see
http://article.gmane.org/gmane.comp.version-control.git/152343.
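
One rough way to see how many commits fall into that category,
assuming a full clone is available to compare against, is to count
commits with and without a path limit.  The helper name count_commits
is hypothetical; 'git rev-list --count' and path limiting are standard
git.  Note that path limiting applies history simplification, so the
second number is only an approximation of "commits that touch the
subtree":

```shell
#!/bin/sh
# count_commits REPO [PATH...]: print the number of commits reachable
# from HEAD in REPO, optionally limited to commits touching PATH.
# (Hypothetical helper, illustration only.)
count_commits() {
    repo=$1
    shift
    git -C "$repo" rev-list --count HEAD -- "$@"
}
```

For example, the difference between 'count_commits git' and
'count_commits git Documentation/' gives a feel for how many commits
carry no changes under Documentation/.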

Your implementation also suffers from the same limitations as current
shallow clones.  For example, you can't clone or fetch from a subtree
clone.  That limits collaboration between people needing to work on
the same subset of history, and was a limitation I was hoping to see
fixed, rather than propagated to more features.

I hope I'm not coming across as too critical.  I'm really excited to
see work in this area.  Hopefully I can get more time to pursue my
route a bit further; currently I don't have too much more than a
detailed idea write-up (heavily revised since the previous thread --
thanks for the feedback, btw).  Or maybe you just know how to address
all my concerns and you beat me to the punch.  That'd be awesome.


Elijah