The code that splits the object list amongst work threads tries to do so on "path" boundaries not to prevent good delta matches. However, in some cases, a few paths may largely dominate the hash distribution and it is not possible to have good load balancing without ignoring those boundaries. Signed-off-by: Nicolas Pitre <nico@xxxxxxx> --- On Mon, 10 Dec 2007, Nicolas Pitre wrote: > On Mon, 10 Dec 2007, Jon Smirl wrote: > > > On 12/10/07, Nicolas Pitre <nico@xxxxxxx> wrote: > > > > > The hash in the code above has to do with the file names the > > > corresponding objects are coming from. > > > > So can we change this loop to exit after a max of window_size * 10 or > > something like that iterations? Without capping it the threads become > > way unbalanced in the end. In the gcc case one thread is continuing > > 30+ minutes past the others exiting. > > Indeed, some more tweaking are needed. > > The object path distribution goes like this for the gcc repo: > > 105557 gcc > 42537 gcc/ChangeLog > 25210 gcc/config > 20690 gcc/testsuite > 13434 gcc/testsuite/ChangeLog > 12363 libstdc++-v3 > 9346 gcc/cp > 8757 libstdc++-v3/include > 8186 gcc/version.c > 7867 gcc/cp/ChangeLog > 7737 libstdc++-v3/include/bits > 7653 libjava > 6577 gcc/testsuite/gcc.dg > 5942 libjava/ChangeLog > 5351 gcc/config/i386 > 5260 gcc/testsuite/g++.dg > 4451 gcc/f > 4330 libstdc++-v3/ChangeLog > 4321 libstdc++-v3/include/bits/c++config > 4316 gcc/doc > [...] > > So... the top entries are most certainly going to create load balancing > issues if their path hash clustering isn't broken. Here's the fix. diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c index 250dc56..7dd0d7f 100644 --- a/builtin-pack-objects.c +++ b/builtin-pack-objects.c @@ -1709,6 +1709,16 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size, list++; sub_size--; } + if (!sub_size) { + /* + * It is possible for some "paths" to have + * so many objects that no hash boundary + * might be found. Let's just steal the + * exact half in that case. + */ + sub_size = victim->remaining / 2; + list -= sub_size; + } target->list = list; victim->list_size -= sub_size; victim->remaining -= sub_size; - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html