Re: [PATCH] packfile.c: speed up loading lots of packfiles.

On Tue, Dec 03, 2019 at 08:04:15AM -0800, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
> 
> > Good catch. The issue is that we only add entries to the hashmap in
> > prepare_packed_git(), but they may be added to the pack list by other
> > callers of install_packed_git(). It probably makes sense to just push
> > the hashmap maintenance down into that function, like below.
> 
> Makes sense to me.
> 
> Let me locally squash your fix in and credit you with helped-by
> footer in the amended log message.  Strictly speaking, this may
> invalidate the perf numbers, but I do not think the scenario p5303
> sets up alone is all that interesting anyway---if you have 10,000
> packs, not just registering them (which is improved with this patch)
> but using objects from them would be slower than necessary X-<.

Thanks, that sounds good.

I actually re-checked the perf numbers (mostly to make sure I didn't
screw anything up) and got similar results.
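
For anyone curious what the squashed-in fix looks like in practice, here
is a stripped-down illustration in plain C (none of this is git's actual
code; the struct layout, the fixed-size table, and the names are made up
for the example). The point is just that registration goes through a
single function which keeps both the pack list and the name-keyed map in
sync, so no caller of install_packed_git() can leave the map stale:

  /*
   * Illustration only -- not git code. One install function owns both
   * the pack list and the name->pack map, so they cannot drift apart.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  #define MAP_BUCKETS 4096

  struct packed_git {
      struct packed_git *next;     /* pack list */
      struct packed_git *map_next; /* hash-bucket chain */
      const char *pack_name;
  };

  struct object_store {
      struct packed_git *packed_git;            /* list head */
      struct packed_git *pack_map[MAP_BUCKETS]; /* name -> pack */
  };

  static unsigned int hash_name(const char *s)
  {
      unsigned int h = 5381;
      while (*s)
          h = (h << 5) + h + (unsigned char)*s++;
      return h % MAP_BUCKETS;
  }

  /* O(1)-ish duplicate check instead of walking the whole pack list */
  static struct packed_git *find_pack(struct object_store *o,
                                      const char *name)
  {
      struct packed_git *p;
      for (p = o->pack_map[hash_name(name)]; p; p = p->map_next)
          if (!strcmp(p->pack_name, name))
              return p;
      return NULL;
  }

  /* the fix: map maintenance lives in the one function all callers use */
  static void install_packed_git(struct object_store *o,
                                 struct packed_git *pack)
  {
      unsigned int b = hash_name(pack->pack_name);

      pack->next = o->packed_git;
      o->packed_git = pack;

      pack->map_next = o->pack_map[b];
      o->pack_map[b] = pack;
  }

  int main(void)
  {
      static struct object_store store;
      struct packed_git *p = calloc(1, sizeof(*p));

      p->pack_name = "pack-1234abcd.pack";
      install_packed_git(&store, p);

      printf("registered: %s\n",
             find_pack(&store, "pack-1234abcd.pack") ? "yes" : "no");
      return 0;
  }

In the real patch the lookup structure is of course git's hashmap API
rather than this toy table; the structural point is the same.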

I agree that 10,000 packs is ridiculous, but we do see it (and worse)
occasionally from people pushing in a loop before our scheduled
maintenance kicks in.  And it's quadratic, so if you hit 30k packs, it's
another factor of 9 worse. It makes even diagnosing the repository
pretty painful. :)

Also a fun fact: Linux actually has a limit on the number of
simultaneous mmaps that a process can have, which defaults to ~64k. But
if you have 32k packs, then we'll map both the packs and the idx files,
which already puts you right at the limit. Plus whatever you need for
mapping the binary, libraries, etc., plus any maps opened by malloc()
for large requests.
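
If you want to see where a process actually stands, the kernel exposes
both numbers under /proc, so a trivial Linux-only check like this does
the trick (not git code, just a diagnostic sketch):

  /* Linux-only: compare this process's mapping count to the limit */
  #include <stdio.h>

  static long count_lines(const char *path)
  {
      FILE *f = fopen(path, "r");
      long n = 0;
      int c;

      if (!f)
          return -1;
      while ((c = fgetc(f)) != EOF)
          if (c == '\n')
              n++;
      fclose(f);
      return n;
  }

  int main(void)
  {
      long limit = -1;
      FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

      if (f) {
          if (fscanf(f, "%ld", &limit) != 1)
              limit = -1;
          fclose(f);
      }

      /* each line of /proc/self/maps is one mapping */
      printf("mappings: %ld of %ld\n",
             count_lines("/proc/self/maps"), limit);
      return 0;
  }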

I have occasionally run into this trying to repack some very
out-of-control cases (the magic fix is to double the limit with `sysctl
-w vm.max_map_count=131060`, if you are curious). I also wondered if
this might be made worse by the recent change to drop
release_pack_memory(). But I ran into it even before that change,
because zlib calls malloc() directly, bypassing the wrappers that would
have triggered release_pack_memory() anyway. We're also pretty aggressive about
dying when mmap() returns an error (rather than closing packs and trying
again).

I think Git _could_ handle this more gracefully by just trying to
keep fewer packs open at one time (the way we similarly try not to use
up all of the file descriptors). But I have a hard time caring too much,
since it's such a ridiculous situation in the first place. Bumping the
limits is an easy operational fix.
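
Something like the sketch below is roughly what I mean by "closing packs
and trying again" -- close_one_least_used_pack() is a made-up placeholder
here, not an actual git function, and the whole thing is only meant to
show the shape of the retry loop:

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <sys/types.h>

  /*
   * Placeholder: a real version would pick the least-recently-used
   * open pack and munmap() its windows, returning nonzero if it
   * actually freed one.
   */
  static int close_one_least_used_pack(void)
  {
      return 0;
  }

  static void *map_with_retry(int fd, size_t len, off_t off)
  {
      for (;;) {
          void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
          if (p != MAP_FAILED)
              return p;
          /*
           * ENOMEM from mmap() can mean "hit vm.max_map_count", not
           * just out of memory; drop a pack and retry before dying.
           */
          if (!close_one_least_used_pack()) {
              perror("mmap");
              exit(1);
          }
      }
  }

  int main(void)
  {
      int fd = open("/dev/zero", O_RDONLY);
      void *p = map_with_retry(fd, 4096, 0);

      printf("mapped one page at %p\n", p);
      return 0;
  }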

-Peff


