2010/8/2 Elijah Newren <newren@xxxxxxxxx>:
>> The idea is the same: pack only enough to access a subtree, rewrite
>> commits at client side, rewrite again when pushing. However I put
>> git-replace into the mix, so at least commit SHA-1 looks the same as
>> upstream. git-subtree is not needed (although it's still an option)
>>
>> With this, I can clone Documentation/ from git.git, update and push. I
>
> I tried it out, but I seem to be doing something wrong. I applied
> your patches to current master, and tried the following -- am I doing
> something wrong or omitting any important steps?
>
> $ git --version
> git version 1.7.2.1.22.g236df
>
> $ git clone file://$(pwd)/git fullclone
> Cloning into fullclone...
> warning: templates not found /home/newren/share/git-core/templates
> remote: Counting objects: 96220, done.
> remote: Compressing objects: 100% (24925/24925), done.
> remote: Total 96220 (delta 70575), reused 95687 (delta 70236)
> Receiving objects: 100% (96220/96220), 18.45 MiB | 11.43 MiB/s, done.
> Resolving deltas: 100% (70575/70575), done.
> fatal: unable to read tree 49374ea4780c0db6db7c604697194bc9b148f3dc

This one looks like the uninitialized case you pointed out in
process_tree(). No, I did not try a full clone on my patched git :-P

> $ git clone --subtree=Documentation/ file://$(pwd)/git docclone
> Cloning into docclone...
> warning: templates not found /home/newren/share/git-core/templates
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed

Not sure. Does file:// use receive-pack/upload-pack? I tested it over
local ssh. Will try again soon.

>> haven't tested it further. Space consumption is 24MB (58MB for full
>> repo). Not really impressive, but if one truly cares about disk
>> space, he/she should also use shallow clone.
>
> 58 MB for full repo? What are you counting? For me, I get 25M:
>
> $ git clone git://git.kernel.org/pub/scm/git/git.git
> $ ls -lh git/.git/objects/pack/*.pack
> -r--r--r--. 1 newren newren 25M 2010-08-01 18:05 git/.git/objects/pack/pack-d41d36a8f0f34d5bc647b3c83c5d6b64fbc059c8.pack
>
> Are you counting the full checkout too or something? If so, that
> varies very wildly between systems, making it hard to compare numbers.
> (For me, 'du -hs git/' returns 44 MB.) I'd like to be able to
> duplicate your numbers and investigate further. It seems to me that
> we ought to be able to get that lower.

It's my own git.git, which probably has more topic branches plus junk.
If you are only interested in numbers, playing with git pack-objects is
enough. You need the changes in list-objects.c and
builtin/pack-objects.c, then you can run

  git pack-objects --stdout --subtree=foo/ > temp.pack

and examine it with verify-pack.

>> Finally, it's more of a hack just to see how far I can go. It will
>> break things.
>
> I think it's a pretty nifty hack. It's fun to see. :-) However, I
> do have a number of reservations about the general strategy: As
> mentioned earlier, I'm not sure I like the on-the-fly commit
> rewriting, as mentioned by Shawn in your previous
> subtree-for-upload-pack patch series. You did take care of the
> "referring to commit-sha1" issue he brought up by using the replace
> mechanism, but I'm still not sure I'm comfortable with it. The
> performance implications also worry me (a lot of the reason for sparse
> clones was to improve performance, at least from my view), as does the
> fact that it only works on exactly one subtree (at least your current
> implementation; most of my use cases involve multiple sibling
> subdirectories that I'd like to get), as does the fact that it
> (currently) only handles trees and does not handle files (ruling out
> the translator use case I'd like to see covered, e.g. cloning just
> po/de.po and its history without all sibling files).

And it's also fun to try. I'd like to try it on larger repos, but I
have quite limited network access until October.

> Also, I couldn't tell if your implementation downloaded full commit
> information for commits that didn't touch any of the files under the
> relevant subtree. I think it does, but couldn't tell for sure (I
> wanted to use a clone and dig into it to find out, but ran into the
> problems I mentioned above). If so, that also worries me a bit -- see
> http://article.gmane.org/gmane.comp.version-control.git/152343.

It does. Yes, that's also something to think about.

> Your implementation also suffers from the same limitations as current
> shallow clones. For example, you can't clone or fetch from a subtree
> clone. That limits collaboration between people needing to work on
> the same subset of history, and was a limitation I was hoping to see
> fixed, rather than propagated to more features.

I agree. Being able to fetch from an incomplete repo would be very
nice, though I admit I don't know how to do it. I think sparse clone
would suffer from the same limitation, wouldn't it?

> I hope I'm not coming across as too critical. I'm really excited to
> see work in this area. Hopefully I can get more time to pursue my
> route a bit further; currently I don't have too much more than a
> detailed idea write-up (heavily revised since the previous thread --
> thanks for the feedback, btw). Or maybe you just know how to address
> all my concerns and you beat me to the punch. That'd be awesome.

I look forward to seeing sparse clone realized, although I think that
would be painful :-)
-- 
Duy