On Tue, Jan 29, 2013 at 1:19 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:
>
>> The point is not about space. Disk is cheap, and it is not making
>> it any worse than what happens to your target audience, that is a
>> fetch-only repository with only "gc --auto" in it, where nobody
>> passes "-f" to "repack" to cause recomputation of delta.
>>
>> What I was trying to seek was a way to reduce the runtime penalty we
>> pay every time we run git in such a repository.
>>
>>  - Object look-up cost will become log2(50*n) from 50*log2(n), which
>>    is about 50/log2(50) improvement;
>
> Yes and no. Our heuristic is to look at the last-used pack for an
> object. So assuming we have locality of requests, we should quite often
> get "lucky" and find the object in the first log2 search. Even if we
> don't assume locality, a situation with one large pack and a few small
> packs will have the large one as "last used" more often than the others,
> and it will also have the looked-for object more often than the others.

Opening all of those files does impact performance. It depends on how
slow your open(2) syscall is. I know that on Mac OS X it's not the
fastest function we get from the C library. Performing ~40 opens to
look through the most recent pack files and finally find the "real"
pack that contains the tag you asked `git show` for isn't that quick.

Some of us also use Git on filesystems that are network based, and
slow compared to a local Linux ext2/3/4 disk with gobs of free RAM.

> So I can see how it is something we could potentially optimize, but I
> could also see it being surprisingly not a big deal. I'd be very
> interested to see real measurements, even of something as simple as a
> "master index" which can reference multiple packfiles.

I actually tried this many, many years ago. There are threads in the
archive about it. It's slower. We ruled it out.
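For what it's worth, the look-up arithmetic above is easy to check. A minimal sketch (the pack count of 50 and n = 2**20 objects per pack are illustrative assumptions, and this is the worst case, ignoring the last-used-pack heuristic):

```python
import math

def probes_scattered(num_packs, n):
    # Worst case with many packs: a binary search (log2 of pack size)
    # in each pack in turn until the object is found.
    return num_packs * math.log2(n)

def probes_consolidated(num_packs, n):
    # A single pack holding all the objects: one binary search.
    return math.log2(num_packs * n)

n = 1 << 20  # assumed objects per pack, for illustration
print(probes_scattered(50, n))     # 1000.0
print(probes_consolidated(50, n))  # ~25.6
```

The exact ratio between the two depends on n, which is part of why measurements on a real repository would be more convincing than the formula alone.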
>>  - System resource cost we incur by having to keep 50 file
>>    descriptors open and maintaining 50 mmap windows will reduce by
>>    50 fold.
>
> I wonder how measurable that is (and if it matters on Linux versus less
> efficient platforms).

It does matter. We know it has a negative impact on JGit, even on
Linux, for example. You don't want 300 packs in a repository. 50 might
be tolerable; 300 is not.
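To make the resource cost concrete: each pack kept open costs one file descriptor plus one mmap window, so consolidating 50 packs into one cuts both by 50x. A toy sketch (the file contents and the count of 50 are placeholders, not real pack data):

```python
import mmap, os, tempfile

def open_packs(paths):
    # One fd and one mmap window per pack -- the per-pack resource
    # cost the thread is discussing.
    handles = []
    for p in paths:
        fd = os.open(p, os.O_RDONLY)
        win = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
        handles.append((fd, win))
    return handles

# Create 50 stand-in "pack" files (real packs have a header, index, etc.).
paths = []
for _ in range(50):
    f = tempfile.NamedTemporaryFile(delete=False)
    f.write(b"PACK")
    f.close()
    paths.append(f.name)

handles = open_packs(paths)
print(f"{len(handles)} descriptors and mmap windows held open")
# prints "50 descriptors and mmap windows held open"

for fd, win in handles:
    win.close()
    os.close(fd)
for p in paths:
    os.unlink(p)
```

At 300 packs the same bookkeeping triples again in every process that touches the repository, which is consistent with the JGit experience above.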