Re: [PATCH 2/2] pack-objects: optimize "recency order"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I <gitster@xxxxxxxxx> wrote:

>  - In the original recency order, trees and blobs are intermixed. Write
>    trees together before blobs, in the hope that this will improve
>    locality when running pathspec-limited revision traversal, i.e.
>    "git log paths...";
>
>  - When writing blob objects out, write the whole family of blobs that use
>    the same delta base object together, by starting from the root of the
>    delta chain, and writing its immediate children in a width-first
>    manner, in the hope that this will again improve locality when reading
>    blobs that belong to the same path, which are likely to be deltified
>    against each other.

An interesting tangent.

With a single-liner change to apply the same "write the entire family as a
group" logic also to tree objects, we get a superb improvement to the
"index-pack" workload, while "log pathspec" becomes measurably worse.

>  * Pathspec-limited log.
>    $ git log drivers/net >/dev/null
>                                   v1.7.6  with patch
> 	 60.0% percentile :        688,671     443,578
> 	 70.0% percentile :    319,574,732 110,370,100
> 	 99.0% percentile :    412,942,470 133,078,115
> 	 99.5% percentile :    413,172,266 133,163,349
> 	 99.9% percentile :    413,354,356 133,240,445
>        Less than 2MiB seek:       61.71%      62.87%

     60.00% percentile :    2,240,841
     70.00% percentile :  115,729,538
     99.00% percentile :  132,912,877
     99.90% percentile :  133,129,574
   Less than 2MiB seek :       59.77%

This of course is because trees tend to delta well against each other and
writing out everything in the same family tends to coalesce ancient and
recent trees close together, while the history traversal wants to see the
trees at the same path in adjacent commits, and the process to jump over
to the "next tree" in the same commit need to seek a lot more.

>  * Blame.
>    $ git blame drivers/net/ne.c >/dev/null
>                                   v1.7.6  with patch
> 	 50.0% percentile :         11,248       7,334
> 	 95.0% percentile :    408,230,673 132,174,308
>        Less than 2MiB seek:       56.47%      56.83%

     50.00% percentile :       79,182
     95.00% percentile :  132,547,959
   Less than 2MiB seek :       56.16%

This also is harmed by trees in adjacent commits being further apart, but
blame needs to read a lot of blobs as well, so the effect is not as grave
as tree-only "log pathspec" case. 

>  * Index pack.
>    $ git index-pack -v .git/objects/pack/pack*.pack
>                                   v1.7.6  with patch
> 	 80.0% percentile :      1,317,862       4,971
> 	 90.0% percentile :     11,926,385     119,398
> 	 99.9% percentile :    382,919,785  33,155,191
>        Less than 2MiB seek:       81.73%      96.92%

     80.00% percentile :      745
     90.00% percentile :    1,891
     99.90% percentile :   80,001
   Less than 2MiB seek :  100.00%

It is so obvious that the in pack object ordering with "the whole family
at once" tweak is extremely tuned for this workload that the result is not
even funny.

The moral of the story is that index-pack is not a workload to optimize
for ;-) and the following is a wrong patch to apply. In general, we have
to be extremely careful not to tune for a particular workload while
unknowingly sacrificing other workloads.

I wonder if we would get a better locality for "blame" by not writing the
whole family at once for blobs, but only a smaller subset.

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 27132bb..adcb509 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -553,7 +553,7 @@ static struct object_entry **compute_write_order(void)
 	for (i = 0; i < nr_objects; i++) {
 		if (objects[i].type != OBJ_TREE)
 			continue;
-		add_to_write_order(wo, &wo_end, &objects[i]);
+		add_family_to_write_order(wo, &wo_end, &objects[i]);
 	}
 
 	/*
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]