Re: Partitioned packs

Junio C Hamano <junkio@xxxxxxx> · Tue, 03 Apr 2007 18:58:11 -0700

"Chris Lee" <clee@xxxxxxx> writes:

> I've been running some experiments, as hinted earlier by the
> discussion about just how much git-index-pack sucks (which, really,
> isn't much since the gaping memleak is gone now).
>
> These experiments include trying to see if there's a noticeable
> performance improvement by splitting out objects of different types
> into different packs. So far, it definitely seems to make a
> difference, though not the one I was initially expecting. For all of
> these tests, I did 'sysctl -w vm.drop_caches=3' before running, to
> effectively simulate a cold-cache run.

Are you running on a 64-bit machine or 32-bit?

I wonder what the numbers would be if you partitioned into the
same number of packs, of similar sizes as in your experiment,
but split not by type but by age or some other factor.

What I am getting at is that you may not be seeing the effect of
access pattern based on the type at all.  For example, the
performance can be affected by other factors, such as needing
fewer pack_windows per pack.  use_pack()
iterates through the currently active windows on a linked list
per pack, and a window is 32MB on 32-bit machines, so you would
need about a hundred of them to cover that 3GB pack (the
total is limited to 256MB, so 8 windows are recycled).  It is
possible that simply using more packs and knowing which pack you
need to access upfront may be cutting down the cost of finding
the pack window to use.  A single pack would have a linked list
of 8 active windows, while two packs would each have their own
shorter list, so the average linear search cost would be
roughly half.
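
To put rough numbers on that guess, here is a back-of-the-envelope
sketch.  It is a toy model, not git's code: the function names are
made up for illustration, and it assumes that every lookup finds its
window already on the list, that each list position is equally
likely, and that the live windows are spread evenly across packs.
Only the 32MB window, the 256MB limit and the 3GB pack size come
from the discussion above.

```python
# Toy model of the window lookup cost discussed above; NOT git's code.
MB = 1024 * 1024
WINDOW_SIZE = 32 * MB      # per-window size on 32-bit, from the message
MAPPED_LIMIT = 256 * MB    # total mapped limit, from the message
PACK_SIZE = 3 * 1024 * MB  # the 3GB pack under discussion


def windows_to_cover(pack_size, window_size=WINDOW_SIZE):
    """Distinct windows needed to touch every byte of a pack."""
    return -(-pack_size // window_size)  # ceiling division


def live_windows(limit=MAPPED_LIMIT, window_size=WINDOW_SIZE):
    """Windows that can be mapped at once under the total limit."""
    return limit // window_size


def avg_probes(list_length):
    """Average nodes visited by a linear search that hits, assuming
    every position is equally likely (a simplifying assumption)."""
    return (list_length + 1) / 2


def avg_probes_split(num_packs):
    """Same, with the live windows spread evenly over num_packs lists
    and the right pack known before the search starts."""
    return avg_probes(live_windows() // num_packs)


if __name__ == "__main__":
    print(windows_to_cover(PACK_SIZE))  # 96
    print(live_windows())               # 8
    print(avg_probes_split(1))          # 4.5
    print(avg_probes_split(2))          # 2.5
```

Under these assumptions the walk drops from 4.5 to 2.5 nodes per
lookup when one pack becomes two, which is the "about half" above.
The model ignores the cost of recycling windows, which may well
matter more in a cold-cache run.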
