On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:

> The point is not about space.  Disk is cheap, and it is not making
> it any worse than what happens to your target audience, that is a
> fetch-only repository with only "gc --auto" in it, where nobody
> passes "-f" to "repack" to cause recomputation of delta.
>
> What I was trying to seek was a way to reduce the runtime penalty we
> pay every time we run git in such a repository.
>
>  - Object look-up cost will become log2(50*n) from 50*log2(n), which
>    is about 50/log2(50) improvement;

Yes and no. Our heuristic is to look at the last-used pack for an
object. So assuming we have locality of requests, we should quite often
get "lucky" and find the object in the first log2 search. Even if we
don't assume locality, a situation with one large pack and a few small
packs will have the large one as "last used" more often than the others,
and it will also have the looked-for object more often than the others.

So I can see how it is something we could potentially optimize, but I
could also see it being surprisingly not a big deal. I'd be very
interested to see real measurements, even of something as simple as a
"master index" which can reference multiple packfiles.

>  - System resource cost we incur by having to keep 50 file
>    descriptors open and maintaining 50 mmap windows will reduce by
>    50 fold.

I wonder how measurable that is (and if it matters on Linux versus less
efficient platforms).

> > I would be interested to see the timing on how quick it is compared to a
> > real repack,...
>
> Yes, that is what I meant by "wonder if we would be helped by" ;-)

There is only one way to find out... :)

Maybe I am blessed with nice machines, but I have mostly found the
repack process not to be that big a deal these days (especially with
threaded delta compression).

> > But how do these somewhat mediocre concatenated packs get turned into
> > real packs?
>
> How do they get processed in fetch-only repositories that
> sometimes run "gc --auto" today?  By running "repack -a -d -f"
> occasionally, perhaps?

Do we run "repack -adf" regularly? The usual "git gc" procedure will not
use "-f", and without that, we will not even consider making deltas
between objects that were formerly in different packs (but now are in
the same pack).

So you are avoiding doing medium-effort packs ("repack -ad") in favor of
doing potentially quick packs, but occasionally doing a big-effort pack
("repack -adf"). It may be reasonable advice to "repack -adf"
occasionally, but I suspect most people are not doing it regularly (if
only because "git gc" does not do it by default).

-Peff
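
To put rough numbers on the look-up discussion above, here is a
back-of-the-envelope sketch. It is not git's actual code. It assumes 50
packs of n objects each, counts only binary-search probes in the pack
indexes, and treats the last-used-pack hit rate as a made-up parameter
rather than a measurement. All function names are hypothetical.

```python
import math


def probes_many_packs(num_packs, objects_per_pack):
    """Worst case: binary-search every pack index in turn."""
    return num_packs * math.log2(objects_per_pack)


def probes_single_pack(num_packs, objects_per_pack):
    """One binary search over a single index covering all objects."""
    return math.log2(num_packs * objects_per_pack)


def probes_last_used(num_packs, objects_per_pack, hit_rate):
    """Expected probes when the last-used pack is tried first.

    hit_rate is an ASSUMED fraction of look-ups satisfied by the
    last-used pack. A miss is charged the full worst case, which is
    pessimistic.
    """
    per_pack = math.log2(objects_per_pack)
    hit_cost = per_pack
    miss_cost = num_packs * per_pack
    return hit_rate * hit_cost + (1.0 - hit_rate) * miss_cost


if __name__ == "__main__":
    packs, n = 50, 2 ** 16
    print("50 separate packs, worst case:", probes_many_packs(packs, n))
    print("one combined pack:            ", probes_single_pack(packs, n))
    for rate in (0.5, 0.9, 0.99):
        print("last-used heuristic, hit rate", rate, ":",
              probes_last_used(packs, n, rate))
```

Under these assumptions, the gap between many packs and one pack
shrinks as the assumed hit rate goes up, which is why real measurements
would be more convincing than the arithmetic alone.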