Hi,

2010/7/31 Nguyễn Thái Ngọc Duy <pclouds@xxxxxxxxx>:
> Something to play with so we can evaluate which is the best strategy
> for non-full clone (or whatever you call it).

Very nice, it's awesome you're working on this. I'm of the same opinion
that Shawn stated earlier, namely that I don't like the route of
rewriting commits on the fly like this (more on that later), but it's
really cool to see some ideas being tried and pushed to their limits.

> The idea is the same: pack only enough to access a subtree, rewrite
> commits at client side, rewrite again when pushing. However I put
> git-replace into the mix, so at least commit SHA-1 looks as same as from
> upstream. git-subtree is not needed (although it's still an option)
>
> With this, I can clone Documentaion/ from git.git, update and push. I

I tried it out, but I seem to be doing something wrong. I applied your
patches to current master, and tried the following -- am I doing
something wrong or omitting any important steps?

$ git --version
git version 1.7.2.1.22.g236df
$ git clone file://$(pwd)/git fullclone
Cloning into fullclone...
warning: templates not found /home/newren/share/git-core/templates
remote: Counting objects: 96220, done.
remote: Compressing objects: 100% (24925/24925), done.
remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
Resolving deltas: 100% (70575/70575), done.
fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc
$ git clone --subtree=Documentation/ file://$(pwd)/git docclone
Cloning into docclone...
warning: templates not found /home/newren/share/git-core/templates
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

> haven't tested it further. Space consumption is 24MB (58MB for full
> repo). Not really impressive, but if one truely cares about disk
> space, he/she should also use shallow clone.

58 MB for full repo? What are you counting?
For me, I get 25M:

$ git clone git://git.kernel.org/pub/scm/git/git.git
$ ls -lh git/.git/objects/pack/*.pack
-r--r--r--. 1 newren newren 25M 2010-08-01 18:05 git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack

Are you counting the full checkout too or something? If so, that varies
wildly between systems, making it hard to compare numbers. (For me,
'du -hs git/' returns 44 MB.) I'd like to be able to duplicate your
numbers and investigate further. It seems to me that we ought to be
able to get that lower.

> Performance is impacted, due to bulk commit replacement. There is a
> split second delay for every command. It's the price of replacing 24k
> commits every time. I think the delay could be improved a little bit
> (caching or mmap..)
>
> Rewriting commits at clone takes time too. Doing individual object
> writing takes lots of space and time. I put all new objects directly
> to a pack now. Rewriting time now becomes quite acceptable (a few
> seconds). Although deep subtree/repo may take longer. Rewriting on
> demand can be considered in such cases.
>
> Repo-care commands like fsck, repack, gc are left out for now.
>
> Finally, it's more of a hack just to see how far I can go. It will
> break things.

I think it's a pretty nifty hack. It's fun to see. :-)

However, I do have a number of reservations about the general strategy:

As mentioned earlier, I'm not sure I like the on-the-fly commit
rewriting, a concern Shawn raised about your previous
subtree-for-upload-pack patch series. You did take care of the
"referring to commit-sha1" issue he brought up by using the replace
mechanism, but I'm still not sure I'm comfortable with it.

The performance implications also worry me; a lot of the reason for
sparse clones was to improve performance, at least from my view. So
does the fact that it only works on exactly one subtree (at least in
your current implementation); most of my usecases involve multiple
sibling subdirectories that I'd like to get. And it (currently) only
handles trees, not files, which rules out the translator usecase I'd
like to see covered, e.g. cloning just po/de.po and its history without
all sibling files.

Also, I couldn't tell if your implementation downloaded full commit
information for commits that didn't touch any of the files under the
relevant subtree. I think it does, but couldn't tell for sure (I wanted
to use a clone and dig into it to find out, but ran into the problems I
mentioned above). If so, that also worries me a bit -- see
http://article.gmane.org/gmane.comp.version-control.git/152343.

Your implementation also suffers from the same limitations as current
shallow clones. For example, you can't clone or fetch from a subtree
clone. That limits collaboration between people needing to work on the
same subset of history, and was a limitation I was hoping to see fixed,
rather than propagated to more features.

I hope I'm not coming across as too critical. I'm really excited to see
work in this area. Hopefully I can get more time to pursue my route a
bit further; currently I don't have much more than a detailed idea
write-up (heavily revised since the previous thread -- thanks for the
feedback, btw). Or maybe you just know how to address all my concerns
and you beat me to the punch. That'd be awesome.

Elijah