Re: [PATCH 0/2] optimizing pack access on "read only" fetch repos

On Tue, Jan 29, 2013 at 1:19 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Jan 29, 2013 at 07:58:01AM -0800, Junio C Hamano wrote:
>
>> The point is not about space.  Disk is cheap, and it is not making
>> it any worse than what happens to your target audience, that is, a
>> fetch-only repository with only "gc --auto" in it, where nobody
>> passes "-f" to "repack" to cause recomputation of deltas.
>>
>> What I was trying to seek was a way to reduce the runtime penalty we
>> pay every time we run git in such a repository.
>>
>>  - Object look-up cost will become log2(50*n) from 50*log2(n), which
>>    is about 50/log2(50) improvement;
>
> Yes and no. Our heuristic is to look at the last-used pack for an
> object. So assuming we have locality of requests, we should quite often
> get "lucky" and find the object in the first log2 search. Even if we
> don't assume locality, a situation with one large pack and a few small
> packs will have the large one as "last used" more often than the others,
> and it will also have the looked-for object more often than the others.
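
For illustration, that last-used-pack heuristic has roughly the shape
below. The names (packed_git_list, find_in_pack, and so on) are made
up for the sketch and are not Git's actual internals:

#include <sys/types.h>

struct packed_git;

struct pack_entry {
    off_t offset;
    struct packed_git *p;
};

struct packed_git {
    struct packed_git *next;
    /* .idx data, mmap windows, ... */
};

extern struct packed_git *packed_git_list;  /* all known packs */
extern int find_in_pack(struct packed_git *p,
                        const unsigned char *sha1,
                        struct pack_entry *e);  /* one log2(n) search */

static struct packed_git *last_found;

int find_object(const unsigned char *sha1, struct pack_entry *e)
{
    struct packed_git *p;

    /* Locality: try whichever pack answered the last lookup. */
    if (last_found && find_in_pack(last_found, sha1, e))
        return 1;

    for (p = packed_git_list; p; p = p->next) {
        if (p == last_found)
            continue;               /* already tried above */
        if (find_in_pack(p, sha1, e)) {
            last_found = p;         /* remember for next time */
            return 1;
        }
    }
    return 0;
}

With one large pack and a few small ones, last_found points at the
large pack most of the time, which is why the common case stays at a
single binary search.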

Opening all of those files does impact performance, and how much
depends on how slow your open(2) syscall is. I know that on Mac OS X
it's not the fastest function we get from the C library. Performing
~40 opens to look through the most recent pack files before finally
finding the "real" pack that contains the tag you asked `git show`
for isn't that quick.

Some of us also use Git on network-based filesystems, which are slow
compared to a local Linux ext2/3/4 disk with gobs of free RAM.
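
This is easy to see for yourself. A trivial loop like the one below
(arbitrary path and iteration count, nothing Git-specific) makes the
gap between a local and a network filesystem visible:

#include <fcntl.h>
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/etc/hosts";
    struct timeval t0, t1;
    long usec;
    int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < 1000; i++) {
        int fd = open(path, O_RDONLY);  /* the syscall under test */
        if (fd >= 0)
            close(fd);
    }
    gettimeofday(&t1, NULL);

    usec = (t1.tv_sec - t0.tv_sec) * 1000000L
         + (t1.tv_usec - t0.tv_usec);
    printf("%ld usec for 1000 open+close calls\n", usec);
    return 0;
}

Point it at a pack file on NFS and at one on a local disk and compare.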

> So I can see how it is something we could potentially optimize, but I
> could also see it being surprisingly not a big deal. I'd be very
> interested to see real measurements, even of something as simple as a
> "master index" which can reference multiple packfiles.

I actually tried this many years ago; there are threads in the
archive about it. It's slower. We ruled it out.
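
For anyone who goes looking for those threads: the general shape of
the idea was a single table over all packs, sorted by object name and
probed with one binary search. A rough sketch, with made-up names
rather than what was actually tried back then:

#include <string.h>

struct master_entry {
    unsigned char sha1[20];     /* object name */
    unsigned int pack_id;       /* which pack holds it */
    unsigned long offset;       /* where in that pack */
};

struct master_index {
    struct master_entry *entries;   /* sorted by sha1 */
    unsigned long nr;
};

/* One log2(total objects) search across every pack at once. */
int master_lookup(const struct master_index *idx,
                  const unsigned char *sha1,
                  struct master_entry *out)
{
    unsigned long lo = 0, hi = idx->nr;

    while (lo < hi) {
        unsigned long mid = lo + (hi - lo) / 2;
        int cmp = memcmp(sha1, idx->entries[mid].sha1, 20);

        if (!cmp) {
            *out = idx->entries[mid];
            return 1;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

The comparison count goes down on paper, but you pay for the extra
indirection on every hit and for rebuilding the table whenever a pack
appears or disappears, which is one plausible reason it measured
slower in practice.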

>>  - System resource cost we incur by having to keep 50 file
>>    descriptors open and maintaining 50 mmap windows will reduce by
>>    50 fold.
>
> I wonder how measurable that is (and if it matters on Linux versus less
> efficient platforms).

It does matter. We know it has a negative impact on JGit, even on
Linux. You don't want 300 packs in a repository; 50 might be
tolerable, but 300 is not.