On Wed, 13 Aug 2008, Shawn O. Pearce wrote:

> Nicolas Pitre <nico@xxxxxxx> wrote:
> > Well, we are talking about 50MB which is not that bad.
>
> I think we're closer to 100MB here due to the extra overheads
> I just alluded to above, and which weren't in your 104 byte
> per object figure.

Sure. That should still be workable on a machine with 256MB of RAM.

> > However there is a point where we should be realistic and just admit
> > that you need a sufficiently big machine if you have huge repositories
> > to deal with. Git should be fine serving pull requests with relatively
> > little memory usage, but anything else such as the initial repack simply
> > requires enough RAM to be effective.
>
> Yea. But it would also be nice to be able to just concat packs
> together. Especially if the repository in question is an open source
> one and everything published is already known to be in the wild,
> as say it is also available over dumb HTTP. Yea, I know people
> like the 'security feature' of the packer not including objects
> which aren't reachable.

It is not only that, even if that is a point I consider important. If
you end up with 10 packs, it is likely that a base object in each of
those packs could simply be a delta against a single common base
object, and therefore the amount of data to transfer might be up to 10
times higher than necessary (see the P.S. below for a way to measure
this).

> But how many times has Linus published something to his linux-2.6
> tree that he didn't mean to publish and had to rewind? I think
> that may be "never". Yet how many times per day does his tree get
> cloned from scratch?

That's not a good argument. Linus is a very disciplined git user,
probably more so than average. We should not use that example to paper
over technical issues.

> This is also true for many internal corporate repositories.
> Users probably have full read access to the object database anyway,
> and maybe even have direct write access to it. Doing the object
> enumeration there is pointless as a security measure.

It is still good for network bandwidth efficiency, as I mentioned.

> I'm too busy to write a pack concat implementation proposal, so
> I'll just shutup now. But it wouldn't be hard if someone wanted
> to improve at least the initial clone serving case.

A much better solution would consist of finding out just _why_ object
enumeration is so slow. This is indeed my biggest gripe with git
performance at the moment.

|nico@xanadu:linux-2.6> time git rev-list --objects --all > /dev/null
|
|real    0m21.742s
|user    0m21.379s
|sys     0m0.360s

That's way too long for 1030198 objects (roughly 48k objects/sec).

And it gets even worse with the gcc repository:

|nico@xanadu:gcc> time git rev-list --objects --all > /dev/null
|
|real    1m51.591s
|user    1m50.757s
|sys     0m0.810s

That's for 1267993 objects, or about 11400 objects/sec. Clearly
something is not scaling here (a first profiling step is sketched in
the P.P.S. below).


Nicolas
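
P.S. To make the duplication point concrete: every pack is
self-contained, so each pack must store its delta bases in full, and
ten independently built packs will each carry their own full copy of
popular base objects that a single repack would deltify against one
shared base. A rough way to see this on a repository that already has
several packs, from memory, so adjust as needed:

  # each pack's "non delta" summary line from verify-pack is the
  # number of objects that pack stores in full; across many packs
  # the same bases tend to show up again and again
  for idx in .git/objects/pack/pack-*.idx; do
          echo "== $idx"
          git verify-pack -v "$idx" | grep '^non delta:'
  done

  # one full repack recomputes deltas so common bases are stored once
  git repack -a -d -f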
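P.P.S. One hint about where to start looking: in both timings above the
user time accounts for almost all of the wall clock time while sys time
is negligible, so the enumeration is CPU bound in user space rather
than stalled on I/O. A profiling sketch, assuming you build an
instrumented binary for gprof (any profiler would do just as well, and
the paths below are placeholders):

  # build git with profiling instrumentation
  make clean
  make CFLAGS="-O2 -pg" LDFLAGS="-pg" git

  # run the enumeration from inside the repository being measured;
  # on exit this drops a gmon.out file in the current directory
  cd /path/to/linux-2.6
  /path/to/git/git rev-list --objects --all > /dev/null

  # list the hottest functions first
  gprof /path/to/git/git gmon.out | head -30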