Nicolas Pitre wrote:
> On Wed, 13 Aug 2008, Shawn O. Pearce wrote:
>> Nicolas Pitre <nico@xxxxxxx> wrote:
>>> Well, we are talking about 50MB which is not that bad.
>> I think we're closer to 100MB here due to the extra overheads
>> I just alluded to above, which weren't in your 104-byte-per-object
>> figure.
> Sure. That should still be workable on a machine with 256MB of RAM.
> However, there is a point where we should be realistic and just admit
> that you need a sufficiently big machine if you have huge repositories
> to deal with. Git should be fine serving pull requests with relatively
> little memory usage, but anything else, such as the initial repack,
> simply requires enough RAM to be effective.
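
As an aside, the memory footprint of a full repack can at least be
bounded with git's existing pack.* limits; a rough sketch, with purely
illustrative sizes rather than anything recommended in this thread:

  # Cap the memory used by the delta search window and the delta
  # result cache during a full repack; the sizes are examples only.
  git config pack.windowMemory 256m
  git config pack.deltaCacheSize 128m
  git repack -a -d -f
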
>> Yea. But it would also be nice to be able to just concat packs
>> together, especially if the repository in question is an open source
>> one and everything published is already known to be in the wild,
>> as, say, when it is also available over dumb HTTP. Yea, I know people
>> like the 'security feature' of the packer not including objects
>> which aren't reachable.
> It is not only that, even though it is a point I consider important.
> If you end up with 10 packs, it is likely that a base object in each
> of those packs could simply be a delta against a single common base
> object, and therefore the amount of data to transfer might be up to
> 10 times higher than necessary.
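
A quick way to check whether a repository has ended up in that state is
to look at how many packs it carries; a minimal check:

  # Object and pack statistics for the current repository; a large
  # "packs:" count means many separate packs, each carrying its own
  # full base objects.
  git count-objects -v
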
[cut]
>> This is also true for many internal corporate repositories.
>> Users probably have full read access to the object database anyway,
>> and maybe even have direct write access to it. Doing the object
>> enumeration there is pointless as a security measure.
> It is good for network bandwidth efficiency as I mentioned.

As a corporate git user, I can say that I'm very rarely worried
about how much data gets sent over our in-office gigabit network.
My primary concern wrt server-side git is cpu- and IO-heavy
operations, as we run the entire machine in a VMware guest OS,
which just plain sucks at such things.
With that in mind, a config variable in /etc/gitconfig would
work wonderfully for that situation, as our central watering
hole only ever serves locally.
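
If such a knob existed, it would presumably be set like any other entry
in /etc/gitconfig. The sketch below is purely hypothetical (the key does
not exist in git today) and only illustrates the shape of the idea:

  # Hypothetical key, NOT an actual git config variable: a trusted,
  # local-only server would skip object enumeration and hand out its
  # existing packs more or less verbatim.
  git config --system transfer.concatPacks true
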
>> I'm too busy to write a pack concat implementation proposal, so
>> I'll just shut up now. But it wouldn't be hard if someone wanted
>> to improve at least the initial clone serving case.
> A much better solution would consist of finding just _why_ object
> enumeration is so slow. This is indeed my biggest gripe with git
> performance at the moment.
>
> |nico@xanadu:linux-2.6> time git rev-list --objects --all > /dev/null
> |
> |real 0m21.742s
> |user 0m21.379s
> |sys 0m0.360s
>
> That's way too long for 1030198 objects (roughly 48k objects/sec). And
> it gets even worse with the gcc repository:
>
> |nico@xanadu:gcc> time git rev-list --objects --all > /dev/null
> |
> |real 1m51.591s
> |user 1m50.757s
> |sys 0m0.810s
>
> That's for 1267993 objects, or about 11400 objects/sec.
> Clearly something is not scaling here.
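
For anyone wanting to reproduce those figures: the throughput is simply
the number of enumerated objects divided by the wall-clock time, and
both can be obtained in one go:

  # Enumerate every reachable object and count the output lines;
  # objects/sec is the line count divided by the elapsed time.
  time git rev-list --objects --all | wc -l
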
What are the different packing options for the two repositories?
A longer delta chain and a larger pack window would increase the
enumeration time, wouldn't they?
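
For reference, the relevant settings can be read back per repository,
and the delta chains actually present in a pack can be inspected as
well; a quick sketch (empty output just means the built-in defaults are
in effect):

  # Explicitly configured delta depth and window, if any.
  git config --get pack.depth
  git config --get pack.window
  # verify-pack -v ends its report with a histogram of delta
  # chain lengths.
  git verify-pack -v .git/objects/pack/pack-*.idx | tail -n 20
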
--
Andreas Ericsson andreas.ericsson@xxxxxx
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231