Re: [PATCH] lookup_object(): Speed up 'git gc' by 12%, by reducing hash chain length

Ingo Molnar <mingo@xxxxxxx> · Fri, 29 Apr 2011 08:22:34 +0200

Got no feedback for this patch - might have been drowned in the hashcmp 
discussion :)

I tried various levels of object hash spreading factor and 16x seemed like a 
good one, but i can try a different patch as well if there's other ideas to fix 
this source of CPU utilization inefficiency.

Thanks,

	Ingo

* Ingo Molnar <mingo@xxxxxxx> wrote:

> I was looking at the 'git gc' stalled-cycles profile and noticed that 
> lookup_object() was the top entry:
> 
>  aldebaran:~/git> perf record -e stalled-cycles -F 10000 ./git gc 
>  Counting objects: 32459, done .
>  Delta compression using up to 16 threads.
>  Compressing objects: 100% (8161/8161), done.
>  Writing objects: 100% (32459/32459), done.
>  Total 32459 (delta 24077), reused 32459 (delta 24077)
>  [ perf record: Woken up 5 times to write data ]
>  [ perf record: Captured and wrote 1.199 MB perf.data (~52366 samples) ]
> 
>  aldebaran:~/git> perf report | head
>  # Events: 30K stalled-cycles
>  #
>  # Overhead     Command          Shared Object                                     Symbol
>  # ........  ..........  .....................  .........................................
>  #
>     36.36%         git  git                    [.] lookup_object
>      6.55%         git  git                    [.] find_pack_entry_one
>      6.53%         git  libz.so.1.2.5          [.] 0xc416          
>      5.94%         git  libz.so.1.2.5          [.] inflate
>      4.12%         git  [kernel.kallsyms]      [k] do_raw_spin_lock
> 
> Annotated output showed the culprit:
> 
>          :              if (!obj_hash)
>          :                      return NULL;
>          :
>          :              i = hashtable_index(sha1);
>          :              while ((obj = obj_hash[i]) != NULL) {
>     4.13 :        498316:       eb 1f                   jmp    498337 <lookup_object+0x47>
>     0.00 :        498318:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
>     0.00 :        49831f:       00 
>          :                      if (!hashcmp(sha1, obj->sha1))
>     1.48 :        498320:       48 8d 78 04             lea    0x4(%rax),%rdi
>     0.02 :        498324:       4c 89 d6                mov    %r10,%rsi
>     0.00 :        498327:       4c 89 d9                mov    %r11,%rcx
>    26.12 :        49832a:       f3 a6                   repz cmpsb %es:(%rdi),%ds:(%rsi)
>    17.12 :        49832c:       74 14                   je     498342 <lookup_object+0x52>
>          :                              break;
>          :                      i++;
>     6.88 :        49832e:       83 c2 01                add    $0x1,%edx
>          :                      if (i == obj_hash_size)
>          :                              i = 0;
>     2.28 :        498331:       44 39 ca                cmp    %r9d,%edx
>     0.24 :        498334:       0f 44 d3                cmove  %ebx,%edx
> 
> "perf stat --detailed" shows us the following picture:
> 
>  Performance counter stats for './git gc':
> 
>        3145.596314 task-clock               #    0.877 CPUs utilized          
>              1,760 context-switches         #    0.001 M/sec                  
>                174 CPU-migrations           #    0.000 M/sec                  
>             41,509 page-faults              #    0.013 M/sec                  
>      9,753,859,587 cycles                   #    3.101 GHz                      (22.91%)
>      2,555,944,921 stalled-cycles           #   26.20% of all cycles are idle   (33.89%)
>      8,976,468,086 instructions             #    0.92  insns per cycle        
>                                             #    0.28  stalled cycles per insn  (44.83%)
>      1,782,743,476 branches                 #  566.743 M/sec                    (55.70%)
>         85,045,367 branch-misses            #    4.77% of all branches          (66.54%)
>      1,982,452,996 L1-dcache-loads          #  630.231 M/sec                    (66.18%)
>        152,320,833 L1-dcache-load-misses    #    7.68% of all L1-dcache hits    (55.50%)
>         43,358,073 LLC-loads                #   13.784 M/sec                    (45.33%)
>          2,636,774 LLC-load-misses          #    0.838 M/sec                    (11.50%)
> 
>         3.586922714  seconds time elapsed
> 
> ... so git gc is still fitting into the L1 cache mostly, and it rarely falls 
> out of the L2 cache. So CPU execution is stalling processing longish hash 
> chains and comparing sha1's.
> 
> So i tried the quick patch below, which just increases the object hash size 
> more aggressively, to 16x of the object count, not the previous 2x sizing.
> 
> The results are (run against the Git repo itself, on ec014ea ("Git 1.7.5")):
> 
>  #
>  # Before:
>  #
> 
>  Performance counter stats for './git gc' (10 runs):
> 
>        3147.437358 task-clock               #    0.793 CPUs utilized            ( +-  0.18% )
>              1,753 context-switches         #    0.001 M/sec                    ( +-  3.09% )
>                165 CPU-migrations           #    0.000 M/sec                    ( +-  2.86% )
>             42,587 page-faults              #    0.014 M/sec                    ( +-  0.04% )
>     10,041,078,653 cycles                   #    3.190 GHz                      ( +-  0.18% )
>      2,613,923,719 stalled-cycles           #   26.03% of all cycles are idle   ( +-  0.45% )
>      9,110,524,009 instructions             #    0.91  insns per cycle        
>                                             #    0.29  stalled cycles per insn  ( +-  0.03% )
>      1,796,732,369 branches                 #  570.856 M/sec                    ( +-  0.04% )
>         84,828,313 branch-misses            #    4.72% of all branches          ( +-  0.06% )
> 
>         3.971525714  seconds time elapsed  ( +-  8.56% )
> 
>  #
>  # After:
>  #
> 
>  Performance counter stats for './git gc' (10 runs):
> 
>        2805.034899 task-clock               #    0.757 CPUs utilized            ( +-  0.16% )
>              1,709 context-switches         #    0.001 M/sec                    ( +-  2.51% )
>                169 CPU-migrations           #    0.000 M/sec                    ( +-  1.73% )
>             42,963 page-faults              #    0.015 M/sec                    ( +-  0.23% )
>      8,944,314,899 cycles                   #    3.189 GHz                      ( +-  0.17% )
>      2,118,720,399 stalled-cycles           #   23.69% of all cycles are idle   ( +-  0.52% )
>      9,017,027,059 instructions             #    1.01  insns per cycle        
>                                             #    0.23  stalled cycles per insn  ( +-  0.03% )
>      1,780,388,097 branches                 #  634.712 M/sec                    ( +-  0.04% )
>         76,104,907 branch-misses            #    4.27% of all branches          ( +-  0.07% )
> 
>         3.707549437  seconds time elapsed  ( +-  7.13% )
> 
> The takeaway is that stalled cycles dropped by 23%:
> 
>      2,613,923,719 stalled-cycles           #   26.03% of all cycles are idle   ( +-  0.45% )
>      2,118,720,399 stalled-cycles           #   23.69% of all cycles are idle   ( +-  0.52% )
> 
> And total runtime (measured in cycles) decreased by 12.2%:
> 
>     10,041,078,653 cycles                   #    3.190 GHz                      ( +-  0.18% )
>      8,944,314,899 cycles                   #    3.189 GHz                      ( +-  0.17% )
> 
> Elapsed time dropped as well as expected, but measurement noise [last column] 
> is high there, due to IO effects.
> 
> Cache misses are down as well:
> 
>  # Before:
> 
>      1,982,452,996 L1-dcache-loads          #  630.231 M/sec                    (66.18%)
>        152,320,833 L1-dcache-load-misses    #    7.68% of all L1-dcache hits    (55.50%)
>         43,358,073 LLC-loads                #   13.784 M/sec                    (45.33%)
>          2,636,774 LLC-load-misses          #    0.838 M/sec                    (11.50%)
> 
>  # After:
> 
>      1,946,597,133 L1-dcache-loads          #  687.078 M/sec                    (67.61%)
>        125,634,149 L1-dcache-load-misses    #    6.45% of all L1-dcache hits    (56.38%)
>         30,778,323 LLC-loads                #   10.864 M/sec                    (44.40%)
>          2,697,827 LLC-load-misses          #    0.952 M/sec                    (10.94%)
> 
> This is somewhat surprising, the hash table got larger after all. Cachemisses 
> went down probably due to less chain-walking reducing the effective working set 
> size of git gc.
> 
> So oversizing the hash seems to works well for git gc. I tried a 4x and 8x 
> oversizing as well, it was an improvement but 16x clearly beats it. Larger
> than 16x seems like overkill.
> 
> I guess the hash function itself is as good as it gets:
> 
>  static unsigned int hashtable_index(const unsigned char *sha1)
>  {
>          unsigned int i;
>          memcpy(&i, sha1, sizeof(unsigned int));
>          return i % obj_hash_size;
>  }
> 
> as sha1's ought to be fairly well distributed. The problem i suspect is that 
> even with perfectly random distribution of the hash, there will always be 
> second, third and higher order chains, which get more and more expensive to 
> walk. So a 2x sized hash table becomes overcrowded.
> 
> Thanks,
> 
> 	Ingo
> 
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> 
> diff --git a/object.c b/object.c
> index 7e1f2bb..b3fe485 100644
> --- a/object.c
> +++ b/object.c
> @@ -91,7 +91,7 @@ struct object *lookup_object(const unsigned char *sha1)
>  static void grow_object_hash(void)
>  {
>  	int i;
> -	int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
> +	int new_hash_size = obj_hash_size < 32 ? 32 : 16 * obj_hash_size;
>  	struct object **new_hash;
>  
>  	new_hash = xcalloc(new_hash_size, sizeof(struct object *));
> @@ -116,7 +116,7 @@ void *create_object(const unsigned char *sha1, int type, void *o)
>  	obj->flags = 0;
>  	hashcpy(obj->sha1, sha1);
>  
> -	if (obj_hash_size - 1 <= nr_objs * 2)
> +	if (obj_hash_size - 1 <= nr_objs * 16)
>  		grow_object_hash();
>  
>  	insert_obj_hash(obj, obj_hash, obj_hash_size);

-- 
Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html