Re: [PATCH v3] repack: enable bitmaps by default on bare repos

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 14 2019, Eric Wong wrote:

> Jeff King <peff@xxxxxxxx> wrote:
>> On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote:
>>
>> > But I did find Ævar's forgotten gitperformance doc and thread
>> > where the topic was brought up:
>> >
>> >   https://public-inbox.org/git/20170403211644.26814-1-avarab@xxxxxxxxx/
>>
>> One thing that thread reminded me of: we probably also want to default
>> pack.writebitmaphashcache on. Otherwise the time saved during the object
>> enumeration can backfire when we spend much more time trying delta
>> compression (because we don't know the pathnames of any objects).
>
> Interesting...  I found a big improvement with public-inbox
> just using bitmaps; but have never tried the hash cache.
>
>> The reason it defaults to off is for on-disk compatibility with JGit.
>
> Right.  Our documentation seems to indicate JGit just warns (but
> doesn't fall over), so maybe that can be considered separately.
>
> I've never used JGit myself; and was satisfied enough with
> bitmaps alone that I never tried the hash-cache.
>
>> But I have very little experience running without the hash-cache on. We
>> added it very early on because we found performance was not great
>> without it (I don't know if people running JGit have run into the same
>> problem and if not, why not).
>
> As far as serving clones and fetches, public-inbox-init has
> always created bare repos with bitmaps enabled, but without
> the hash-cache for compatibility concerns.
>
> That's a lot of fetches and clones over the years.
>
>> > +test_expect_success 'incremental repack does not complain' '
>> > +	git -C bare.git repack -q 2>repack.err &&
>> > +	! test -s repack.err
>> > +'
>>
>> This last line could use "test_must_be_empty".
>
> Thanks for the review!
>
> ---------8<-----------
> Subject: [PATCH] repack: enable bitmaps by default on bare repos
>
> A typical use case for bare repos is for serving clones and
> fetches to clients.  Enable bitmaps by default on bare repos to
> make it easier for admins to host git repos in a performant way.
>
> Signed-off-by: Eric Wong <e@xxxxxxxxx>
> Helped-by: Jeff King <peff@xxxxxxxx>
> ---
>  Documentation/config/repack.txt |  2 +-
>  builtin/repack.c                |  5 ++++-
>  t/t7700-repack.sh               | 19 ++++++++++++++++++-
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt
> index a5c37813fd..9c413e177e 100644
> --- a/Documentation/config/repack.txt
> +++ b/Documentation/config/repack.txt
> @@ -24,4 +24,4 @@ repack.writeBitmaps::
>  	packs created for clones and fetches, at the cost of some disk
>  	space and extra time spent on the initial repack.  This has
>  	no effect if multiple packfiles are created.
> -	Defaults to false.
> +	Defaults to true on bare repos, false otherwise.
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 67f8978043..caca113927 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -14,7 +14,7 @@
>
>  static int delta_base_offset = 1;
>  static int pack_kept_objects = -1;
> -static int write_bitmaps;
> +static int write_bitmaps = -1;
>  static int use_delta_islands;
>  static char *packdir, *packtmp;
>
> @@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	    (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE)))
>  		die(_("--keep-unreachable and -A are incompatible"));
>
> +	if (write_bitmaps < 0)
> +		write_bitmaps = (pack_everything & ALL_INTO_ONE) &&
> +				 is_bare_repository();
>  	if (pack_kept_objects < 0)
>  		pack_kept_objects = write_bitmaps;
>
> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 6162e2a8e6..86d05160a3 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' '
>  	)
>  '
>
> -test_done
> +test_expect_success 'bitmaps are created by default in bare repos' '
> +	git clone --bare .git bare.git &&
> +	git -C bare.git repack -ad &&
> +	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
> +	test_path_is_file "$bitmap"
> +'
> +
> +test_expect_success 'incremental repack does not complain' '
> +	git -C bare.git repack -q 2>repack.err &&
> +	test_must_be_empty repack.err
> +'
>
> +test_expect_success 'bitmaps can be disabled on bare repos' '
> +	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
> +	bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) &&
> +	test -z "$bitmap"
> +'
> +
> +test_done

I've found a case where turning bitmaps on does horrible things for
bitmap "push" performance.

As it turns out it's not related to this patch per-se, because I had a
*.bitmap for other reasons, but replying to this because we'd presumably
get the same thing in the bare repo case once this merges down.

I can't share the repo, but I had a report where just a "git push" of a
topic branch that was 2/58 ahead/behind took ~2 minutes just in
"Enumerating objects", but ~500ms without bitmaps.

Using a horrible "print to stderr"[1] monkeypatch I'd get without
bitmaps and reported by trace2 / ts:

    Apr 09 16:45:15 16:45:15.365817 git.c:433                         | d1 | main                     | cmd_name     |     |           |           |            | pack-objects (push/pack-objects)
    Apr 09 16:45:15 16:45:15.366220 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000928 |           | pack-objects | label:enumerate-objects
    Apr 09 16:45:15 16:45:15.366241 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000950 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:45:15 16:45:15.366384 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001091 |  0.000141 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:45:15 16:45:15.366394 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001102 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:45:15 get obj list 1
    Apr 09 16:45:15 get obj list 2, did 29391 lines
    Apr 09 16:45:15 get obj list 3
    Apr 09 16:45:15 get obj list 4
    Apr 09 16:45:15 get obj list 5
    Apr 09 16:45:15 get obj list 6
    Apr 09 16:45:15 get obj list 7
    Apr 09 16:45:15 get obj list 8
    Apr 09 16:45:15 get obj list 9
    Apr 09 16:45:15 16:45:15.776559 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  |  0.411263 |  0.410161 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:45:15 16:45:15.776577 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  |  0.411285 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:45:15 16:45:15.776584 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  |  0.411292 |  0.000007 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:45:15 16:45:15.776605 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  |  0.411313 |  0.410385 | pack-objects | label:enumerate-objects
    Apr 09 16:45:15 16:45:15.776609 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  |  0.411318 |           | pack-objects | label:write-pack-file
    Apr 09 16:45:15 16:45:15.794235 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  |  0.428942 |  0.017624 | pack-objects | label:write-pack-file

But with pack.useBitmaps=true:

    Apr 09 16:39:59 16:39:59.139022 git.c:433                         | d1 | main                     | cmd_name     |     |           |           |            | pack-objects (push/pack-objects)
    Apr 09 16:39:59 16:39:59.139398 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000869 |           | pack-objects | label:enumerate-objects
    Apr 09 16:39:59 16:39:59.139419 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000892 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:39:59 16:39:59.139551 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001023 |  0.000131 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 16:39:59 16:39:59.139559 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001032 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:39:59 get obj list 1
    Apr 09 16:39:59 get obj list 2, did 29392 lines
    Apr 09 16:39:59 get obj list 3
    Apr 09 16:39:59 prepping walk
    Apr 09 16:39:59 opening packed bitmap...
    Apr 09 16:39:59 opening packed bitmap done
    Apr 09 16:39:59 walking 29392 pending
    Apr 09 16:39:59 done walking 29392 pending
    Apr 09 16:39:59 prepare_bitmap_walk 3
    Apr 09 16:39:59 prepare_bitmap_walk 4
    Apr 09 16:39:59 prepare_bitmap_walk 5
    Apr 09 16:40:00 prepare_bitmap_walk 6
    Apr 09 16:40:00 prepare_bitmap_walk 6.1
    Apr 09 16:41:35 prepare_bitmap_walk 6.2
    Apr 09 16:41:35 prepare_bitmap_walk 7
    Apr 09 16:41:52 prepare_bitmap_walk 8
    Apr 09 16:41:52 walking?
    Apr 09 16:41:52 traversing
    Apr 09 16:41:52 traversing done
    Apr 09 16:41:52 16:41:52.091634 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  | 112.953099 | 112.952067 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 16:41:52 16:41:52.091655 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  | 112.953128 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:41:52 16:41:52.091668 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  | 112.953141 |  0.000013 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 16:41:52 16:41:52.091700 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  | 112.953172 | 112.952303 | pack-objects | label:enumerate-objects
    Apr 09 16:41:52 16:41:52.091706 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  | 112.953179 |           | pack-objects | label:write-pack-file
    Apr 09 16:41:52 16:41:52.111966 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  | 112.973438 |  0.020259 | pack-objects | label:write-pack-file

I.e. almost all the time is in get_object_list_from_bitmap() and around
1m30s in just this in pack-bitmap.c:

    haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);

And then another ~20s in:

    wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);

This is with 10 packs and where only the largest (initial clone pack)
had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b',
i.e. with only one pack with a *.bitmap, although that makes it a bit
better for the first bit, and almost completely cuts down on the time
spent in the second phase:

    Apr 09 17:08:37 17:08:37.261507 builtin/pack-objects.c:3493       | d1 | main                     | region_enter | r1  |  0.000922 |           | pack-objects | label:enumerate-objects
    Apr 09 17:08:37 17:08:37.261527 builtin/pack-objects.c:3495       | d1 | main                     | region_enter | r1  |  0.000943 |           | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 17:08:37 17:08:37.261600 builtin/pack-objects.c:3498       | d1 | main                     | region_leave | r1  |  0.001015 |  0.000072 | pack-objects | ..label:enumerate-objects-prepare-packing-data
    Apr 09 17:08:37 17:08:37.261608 builtin/pack-objects.c:3510       | d1 | main                     | region_enter | r1  |  0.001024 |           | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 17:08:37 get obj list 1
    Apr 09 17:08:37 get obj list 2, did 29380 lines
    Apr 09 17:08:37 get obj list 3
    Apr 09 17:08:37 prepping walk
    Apr 09 17:08:37 opening packed bitmap...
    Apr 09 17:08:37 opening packed bitmap done
    Apr 09 17:08:37 walking 29380 pending
    Apr 09 17:08:37 done walking 29380 pending
    Apr 09 17:08:37 prepare_bitmap_walk 3
    Apr 09 17:08:37 prepare_bitmap_walk 4
    Apr 09 17:08:37 prepare_bitmap_walk 5
    Apr 09 17:08:38 prepare_bitmap_walk 6
    Apr 09 17:08:38 prepare_bitmap_walk 6.1
    Apr 09 17:09:07 prepare_bitmap_walk 6.2
    Apr 09 17:09:07 prepare_bitmap_walk 7
    Apr 09 17:09:09 prepare_bitmap_walk 8
    Apr 09 17:09:09 walking?
    Apr 09 17:09:09 traversing
    Apr 09 17:09:09 traversing done
    Apr 09 17:09:09 17:09:09.229185 builtin/pack-objects.c:3514       | d1 | main                     | region_leave | r1  | 31.968595 | 31.967571 | pack-objects | ..label:enumerate-objects-get-obj-list
    Apr 09 17:09:09 17:09:09.229203 builtin/pack-objects.c:3517       | d1 | main                     | region_enter | r1  | 31.968619 |           | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 17:09:09 17:09:09.229214 builtin/pack-objects.c:3520       | d1 | main                     | region_leave | r1  | 31.968630 |  0.000011 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base
    Apr 09 17:09:09 17:09:09.229242 builtin/pack-objects.c:3530       | d1 | main                     | region_leave | r1  | 31.968658 | 31.967736 | pack-objects | label:enumerate-objects
    Apr 09 17:09:09 17:09:09.229265 builtin/pack-objects.c:3542       | d1 | main                     | region_enter | r1  | 31.968681 |           | pack-objects | label:write-pack-file
    Apr 09 17:09:09 17:09:09.251998 builtin/pack-objects.c:3544       | d1 | main                     | region_leave | r1  | 31.991412 |  0.022731 | pack-objects | label:write-pack-file

I don't have time to dig more into this now, just wanted to send these
initial results...

1.



    diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
    index a154fc29f6..8b2af1740e 100644
    --- a/builtin/pack-objects.c
    +++ b/builtin/pack-objects.c
    @@ -3052,2 +3052,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs)
     {
    +	fprintf(stderr, "prepping walk\n");
     	if (!(bitmap_git = prepare_bitmap_walk(revs)))
    @@ -3055,2 +3056,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs)

    +	fprintf(stderr, "walking?\n");
     	if (pack_options_allow_reuse() &&
    @@ -3066,3 +3068,5 @@ static int get_object_list_from_bitmap(struct rev_info *revs)

    +	fprintf(stderr, "traversing\n");
     	traverse_bitmap_commit_list(bitmap_git, &add_object_entry_from_bitmap);
    +	fprintf(stderr, "traversing done\n");
     	return 0;
    @@ -3091,2 +3095,3 @@ static void get_object_list(int ac, const char **av)
     	int save_warning;
    +	int lns = 0;

    @@ -3102,3 +3107,5 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 1\n");
     	while (fgets(line, sizeof(line), stdin) != NULL) {
    +		lns++;
     		int len = strlen(line);
    @@ -3128,4 +3135,6 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 2, did %d lines\n", lns);
     	warn_on_object_refname_ambiguity = save_warning;

    +	fprintf(stderr, "get obj list 3\n");
     	if (use_bitmap_index && !get_object_list_from_bitmap(&revs))
    @@ -3133,2 +3142,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 4\n");
     	if (use_delta_islands)
    @@ -3136,2 +3146,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 5\n");
     	if (prepare_revision_walk(&revs))
    @@ -3140,2 +3151,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 6\n");
     	if (!fn_show_object)
    @@ -3146,2 +3158,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 7\n");
     	if (unpack_unreachable_expiration) {
    @@ -3157,2 +3170,3 @@ static void get_object_list(int ac, const char **av)

    +	fprintf(stderr, "get obj list 8\n");
     	if (keep_unreachable)
    @@ -3163,2 +3177,3 @@ static void get_object_list(int ac, const char **av)
     		loosen_unused_packed_objects(&revs);
    +	fprintf(stderr, "get obj list 9\n");

    @@ -3478,3 +3493,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     			    the_repository);
    +	trace2_region_enter("pack-objects", "enumerate-objects-prepare-packing-data",
    +			    the_repository);
     	prepare_packing_data(the_repository, &to_pack);
    +	trace2_region_leave("pack-objects", "enumerate-objects-prepare-packing-data",
    +			    the_repository);

    @@ -3482,11 +3501,28 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
     		progress_state = start_progress(_("Enumerating objects"), 0);
    -	if (!use_internal_rev_list)
    +	if (!use_internal_rev_list) {
    +		trace2_region_enter("pack-objects", "enumerate-objects-read-stdin",
    +				    the_repository);
     		read_object_list_from_stdin();
    -	else {
    +		trace2_region_leave("pack-objects", "enumerate-objects-read-stdin",
    +				    the_repository);
    +	} else {
    +		trace2_region_enter("pack-objects", "enumerate-objects-get-obj-list",
    +				    the_repository);
     		get_object_list(rp.argc, rp.argv);
     		argv_array_clear(&rp);
    +		trace2_region_leave("pack-objects", "enumerate-objects-get-obj-list",
    +				    the_repository);
     	}
    +	trace2_region_enter("pack-objects", "enumerate-objects-cleanup-preferred-base",
    +			    the_repository);
     	cleanup_preferred_base();
    -	if (include_tag && nr_result)
    +	trace2_region_leave("pack-objects", "enumerate-objects-cleanup-preferred-base",
    +			    the_repository);
    +	if (include_tag && nr_result) {
    +		trace2_region_enter("pack-objects", "enumerate-objects-add-tags",
    +				    the_repository);
     		for_each_ref(add_ref_tag, NULL);
    +		trace2_region_leave("pack-objects", "enumerate-objects-add-tags",
    +				    the_repository);
    +	}
     	stop_progress(&progress_state);
    diff --git a/pack-bitmap.c b/pack-bitmap.c
    index 4695aaf6b4..0ab71597fd 100644
    --- a/pack-bitmap.c
    +++ b/pack-bitmap.c
    @@ -693,5 +693,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 * because we may not need to use it */
    +	fprintf(stderr, "opening packed bitmap...\n");
     	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
     		goto cleanup;
    +	fprintf(stderr, "opening packed bitmap done\n");

    +	fprintf(stderr, "walking %d pending\n", revs->pending.nr);
     	for (i = 0; i < revs->pending.nr; ++i) {
    @@ -720,2 +723,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	}
    +	fprintf(stderr, "done walking %d pending\n", revs->pending.nr);

    @@ -726,2 +730,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 */
    +	fprintf(stderr, "prepare_bitmap_walk 3\n");
     	if (haves && !in_bitmapped_pack(bitmap_git, haves))
    @@ -729,2 +734,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)

    +	fprintf(stderr, "prepare_bitmap_walk 4\n");
     	/* if we don't want anything, we're done here */
    @@ -738,2 +744,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	 */
    +	fprintf(stderr, "prepare_bitmap_walk 5\n");
     	if (load_pack_bitmap(bitmap_git) < 0)
    @@ -741,2 +748,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)

    +	fprintf(stderr, "prepare_bitmap_walk 6\n");
     	object_array_clear(&revs->pending);
    @@ -745,3 +753,5 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     		revs->ignore_missing_links = 1;
    + 		fprintf(stderr, "prepare_bitmap_walk 6.1\n");
     		haves_bitmap = find_objects(bitmap_git, revs, haves, NULL);
    + 		fprintf(stderr, "prepare_bitmap_walk 6.2\n");
     		reset_revision_walk();
    @@ -752,4 +762,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
     	}
    +	fprintf(stderr, "prepare_bitmap_walk 7\n");

     	wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap);
    +	fprintf(stderr, "prepare_bitmap_walk 8\n");

    diff --git a/revision.c b/revision.c
    index eb8e51bc63..4592d01ee7 100644
    --- a/revision.c
    +++ b/revision.c
    @@ -63,2 +63,4 @@ static void mark_tree_contents_uninteresting(struct repository *r,

    +	fprintf(stderr, "MTCU\n");
    +
     	if (parse_tree_gently(tree, 1) < 0)
    @@ -167,2 +169,4 @@ static void add_children_by_path(struct repository *r,

    +	fprintf(stderr, "ACBP\n");
    +
     	if (!tree)




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux