[PATCH] pack-objects: more threaded load balancing fix with often changed paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The code that splits the object list amongst work threads tries to do so
on "path" boundaries not to prevent good delta matches.  However, in 
some cases, a few paths may largely dominate the hash distribution and
it is not possible to have good load balancing without ignoring those 
boundaries.

Signed-off-by: Nicolas Pitre <nico@xxxxxxx>
---

On Mon, 10 Dec 2007, Nicolas Pitre wrote:

> On Mon, 10 Dec 2007, Jon Smirl wrote:
> 
> > On 12/10/07, Nicolas Pitre <nico@xxxxxxx> wrote:
> > 
> > > The hash in the code above has to do with the file names the
> > > corresponding objects are coming from.
> > 
> > So can we change this loop to exit after a max of window_size * 10 or
> > something like that iterations? Without capping it the threads become
> > way unbalanced in the end. In the gcc case one thread is continuing
> > 30+ minutes past the others exiting.
> 
> Indeed, some more tweaking are needed.
> 
> The object path distribution goes like this for the gcc repo:
> 
> 105557  gcc
> 42537   gcc/ChangeLog
> 25210   gcc/config
> 20690   gcc/testsuite
> 13434   gcc/testsuite/ChangeLog
> 12363   libstdc++-v3
> 9346    gcc/cp
> 8757    libstdc++-v3/include
> 8186    gcc/version.c
> 7867    gcc/cp/ChangeLog
> 7737    libstdc++-v3/include/bits
> 7653    libjava
> 6577    gcc/testsuite/gcc.dg
> 5942    libjava/ChangeLog
> 5351    gcc/config/i386
> 5260    gcc/testsuite/g++.dg
> 4451    gcc/f
> 4330    libstdc++-v3/ChangeLog
> 4321    libstdc++-v3/include/bits/c++config
> 4316    gcc/doc
> [...]
> 
> So... the top entries are most certainly going to create load balancing 
> issues if their path hash clustering isn't broken.

Here's the fix.

diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 250dc56..7dd0d7f 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1709,6 +1709,16 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 				list++;
 				sub_size--;
 			}
+			if (!sub_size) {
+				/*
+				 * It is possible for some "paths" to have
+				 * so many objects that no hash boundary
+				 * might be found.  Let's just steal the
+				 * exact half in that case.
+				 */
+				sub_size = victim->remaining / 2;
+				list -= sub_size;
+			}
 			target->list = list;
 			victim->list_size -= sub_size;
 			victim->remaining -= sub_size;
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux