[PATCH v3 00/10] repack: fix geometric repacking with gitalternates

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

this is the third version of my patch series to fix geometric repacking
with repositories that are connected to an alternate object database.

Changes compared to v2:

    - I've simplified patch 1 to reset the preferred pack index instead
      of moving around some of the checks. This also causes us to print
      the warning for missing preferred packfiles again.

    - I've fixed the logic in patch 2 to find the preferred packfile to
      not return a packfile that would be rolled up in a geometric
      repack.

    - I've added an additional patch to split out preexisting tests for
      `--stdin-packs` into their own test file.

    - I've changed the unportable test added for geometric repacking
      with `-l` that used stat(1) to instead use our `test-tool chmtime`
      helper.

    - I've changed the logic that disables writing bitmaps in git-repack
      to cover more cases. It now always kicks in when doing a repack
      with `-l` that asks for bitmaps when connected to an alternate
      object directory.

    - In general, there's a bunch of small improvements left and right
      for the tests I'm adding.

Thanks for all the feedback so far.

Patrick

Patrick Steinhardt (10):
  midx: fix segfault with no packs and invalid preferred pack
  repack: fix trying to use preferred pack in alternates
  repack: fix generating multi-pack-index with only non-local packs
  pack-objects: split out `--stdin-packs` tests into separate file
  pack-objects: fix error when packing same pack twice
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  t/helper: allow chmtime to print verbosely without modifying mtime
  repack: honor `-l` when calculating pack geometry
  repack: disable writing bitmaps when doing a local repack

 builtin/pack-objects.c        |  10 +-
 builtin/repack.c              |  62 ++++++++-
 midx.c                        |   6 +-
 object-file.c                 |   6 +
 object-store.h                |   1 +
 t/helper/test-chmtime.c       |   2 +-
 t/t5300-pack-object.sh        | 135 -------------------
 t/t5319-multi-pack-index.sh   |  12 ++
 t/t5331-pack-objects-stdin.sh | 240 ++++++++++++++++++++++++++++++++++
 t/t7700-repack.sh             |  17 +++
 t/t7703-repack-geometric.sh   | 165 +++++++++++++++++++++++
 11 files changed, 505 insertions(+), 151 deletions(-)
 create mode 100755 t/t5331-pack-objects-stdin.sh

Range-diff against v2:
 1:  8c384aabde !  1:  dd8145bead midx: fix segfault with no packs and invalid preferred pack
    @@ Commit message
         will cause a segfault as we blindly index into the array of packfiles,
         which would be empty.
     
    -    Fix this bug by exiting early in case we have determined that the MIDX
    -    wouldn't have any packfiles to index. While the request itself does not
    -    make much sense anyway, it is still preferable to exit gracefully than
    -    to abort.
    +    Fix this bug by resetting the preferred packfile index to `-1` before
    +    searching for the preferred pack. This fixes the segfault as we already
    +    check for whether the index is `> - 1`. If it is not, we simply don't
    +    pick a preferred packfile at all.
     
    +    Helped-by: Taylor Blau <me@xxxxxxxxxxxx>
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
      ## midx.c ##
     @@ midx.c: static int write_midx_internal(const char *object_dir,
    - 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
    - 	stop_progress(&ctx.progress);
    + 	}
      
    -+	if (!ctx.nr) {
    -+		error(_("no pack files to index."));
    -+		result = 1;
    -+		goto cleanup;
    -+	}
    + 	if (preferred_pack_name) {
    +-		int found = 0;
    ++		ctx.preferred_pack_idx = -1;
     +
    - 	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
    - 	    !(packs_to_include || packs_to_drop)) {
    - 		struct bitmap_index *bitmap_git;
    + 		for (i = 0; i < ctx.nr; i++) {
    + 			if (!cmp_idx_or_pack_name(preferred_pack_name,
    + 						  ctx.info[i].pack_name)) {
    + 				ctx.preferred_pack_idx = i;
    +-				found = 1;
    + 				break;
    + 			}
    + 		}
    + 
    +-		if (!found)
    ++		if (ctx.preferred_pack_idx == -1)
    + 			warning(_("unknown preferred pack: '%s'"),
    + 				preferred_pack_name);
    + 	} else if (ctx.nr &&
     
      ## t/t5319-multi-pack-index.sh ##
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'write midx with --stdin-packs' '
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'write midx with --stdin-packs'
     +	test_must_fail git -C empty multi-pack-index write \
     +		--stdin-packs --preferred-pack=does-not-exist </dev/null 2>err &&
     +	cat >expect <<-EOF &&
    ++	warning: unknown preferred pack: ${SQ}does-not-exist${SQ}
     +	error: no pack files to index.
     +	EOF
     +	test_cmp expect err
 2:  5a6c2ead87 !  2:  c5db1bc587 repack: fix trying to use preferred pack in alternates
    @@ Commit message
         While at it, rename the function `get_largest_active_packfile()` to
         `get_preferred_pack()` to better document its intent.
     
    +    Helped-by: Taylor Blau <me@xxxxxxxxxxxx>
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
      ## builtin/repack.c ##
    @@ builtin/repack.c: static struct packed_git *get_largest_active_pack(struct pack_
      		return NULL;
     -	return geometry->pack[geometry->pack_nr - 1];
     +
    -+	for (i = geometry->pack_nr; i > 0; i--)
    ++	/*
    ++	 * The preferred pack is the largest pack above the split line. In
    ++	 * other words, it is the largest pack that does not get rolled up in
    ++	 * the geometric repack.
    ++	 */
    ++	for (i = geometry->pack_nr; i > geometry->split; i--)
     +		/*
     +		 * A pack that is not local would never be included in a
     +		 * multi-pack index. We thus skip over any non-local packs.
    @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric with pack.packSize
     +	test_must_be_empty err &&
     +
     +	# We should see that a new packfile was generated.
    -+	find shared/.git/objects/pack -type f -name "*.pack" | sort >packs &&
    -+	test $(wc -l <packs) = 1 &&
    ++	find shared/.git/objects/pack -type f -name "*.pack" >packs &&
    ++	test_line_count = 1 packs &&
     +
     +	# We should also see a multi-pack-index. This multi-pack-index should
     +	# never refer to any packfiles in the alternate object database.
    -+	# Consequentially, it should be valid even with the alternate ODB
    -+	# deleted.
    -+	rm -r shared &&
    -+	git -C member multi-pack-index verify
    ++	test -f member/.git/objects/pack/multi-pack-index &&
    ++	test-tool read-midx member/.git/objects >packs.midx &&
    ++	grep "^pack-.*\.idx$" packs.midx | sort >actual &&
    ++	basename member/.git/objects/pack/pack-*.idx >expect &&
    ++	test_cmp expect actual
     +'
     +
      test_done
 3:  f705cd6c49 !  3:  8c3193268f repack: fix generating multi-pack-index with only non-local packs
    @@ Commit message
         Fix the code to skip non-local packfiles.
     
         Co-authored-by: Taylor Blau <me@xxxxxxxxxxxx>
    +    Signed-off-by: Taylor Blau <me@xxxxxxxxxxxx>
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
      ## builtin/repack.c ##
    @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
     
      ## t/t7703-repack-geometric.sh ##
     @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric --write-midx with packfiles in main and alterna
    - 	git -C member multi-pack-index verify
    + 	test_cmp expect actual
      '
      
     +test_expect_success '--geometric --with-midx with no local objects' '
    @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric --write-midx with
     +	# alternate object database does not invalidate the geometric sequence.
     +	# And we should not have a multi-pack-index because these only index
     +	# local packfiles, and there are none.
    -+	find member/.git/objects/pack -type f >actual &&
    -+	test_must_be_empty actual
    ++	test_dir_is_empty member/$packdir
     +'
     +
      test_done
 -:  ---------- >  4:  8d47d753dc pack-objects: split out `--stdin-packs` tests into separate file
 4:  ff28829589 !  5:  ac528514e7 pack-objects: fix error when packing same pack twice
    @@ builtin/pack-objects.c: static void read_packs_list_from_stdin(void)
      	for (p = get_all_packs(the_repository); p; p = p->next) {
      		const char *pack_name = pack_basename(p);
     
    - ## t/t5331-pack-objects-stdin.sh (new) ##
    -@@
    -+#!/bin/sh
    -+
    -+test_description='pack-objects --stdin'
    -+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    -+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    -+
    -+TEST_PASSES_SANITIZE_LEAK=true
    -+. ./test-lib.sh
    + ## t/t5331-pack-objects-stdin.sh ##
    +@@ t/t5331-pack-objects-stdin.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    + TEST_PASSES_SANITIZE_LEAK=true
    + . ./test-lib.sh
    + 
    ++packed_objects() {
    ++	git show-index <"$1" >tmp-object-list &&
    ++	cut -d' ' -f2 tmp-object-list | sort &&
    ++	rm tmp-object-list
    ++ }
     +
    + test_expect_success 'setup for --stdin-packs tests' '
    + 	git init stdin-packs &&
    + 	(
    +@@ t/t5331-pack-objects-stdin.sh: test_expect_success '--stdin-packs with broken links' '
    + 	)
    + '
    + 
     +test_expect_success 'pack-objects --stdin with duplicate packfile' '
     +	test_when_finished "rm -fr repo" &&
     +
    @@ t/t5331-pack-objects-stdin.sh (new)
     +		test_commit "commit" &&
     +		git repack -ad &&
     +
    -+		(
    ++		{
     +			basename .git/objects/pack/pack-*.pack &&
     +			basename .git/objects/pack/pack-*.pack
    -+		) >packfiles &&
    ++		} >packfiles &&
     +
     +		git pack-objects --stdin-packs generated-pack <packfiles &&
    -+		test_cmp generated-pack-*.pack .git/objects/pack/pack-*.pack
    ++		packed_objects .git/objects/pack/pack-*.idx >expect &&
    ++		packed_objects generated-pack-*.idx >actual &&
    ++		test_cmp expect actual
     +	)
     +'
     +
    -+test_done
    + test_done
     
      ## t/t7703-repack-geometric.sh ##
     @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric --with-midx with no local objects' '
    - 	test_must_be_empty actual
    + 	test_dir_is_empty member/$packdir
      '
      
     +test_expect_success '--geometric with same pack in main and alternate ODB' '
 5:  8011603d6d !  6:  afd7c7864d pack-objects: fix error when same packfile is included and excluded
    @@ builtin/pack-objects.c: static void read_packs_list_from_stdin(void)
      
     
      ## t/t5331-pack-objects-stdin.sh ##
    -@@ t/t5331-pack-objects-stdin.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    - TEST_PASSES_SANITIZE_LEAK=true
    - . ./test-lib.sh
    - 
    -+packed_objects() {
    -+	git show-index <"$1" >tmp-object-list &&
    -+	cut -d' ' -f2 tmp-object-list &&
    -+	rm tmp-object-list
    -+ }
    -+
    - test_expect_success 'pack-objects --stdin with duplicate packfile' '
    - 	test_when_finished "rm -fr repo" &&
    - 
     @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with duplicate packfile' '
      	)
      '
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with du
     +		test_commit "commit" &&
     +		git repack -ad &&
     +
    -+		(
    ++		{
     +			basename .git/objects/pack/pack-*.pack &&
     +			printf "^%s\n" "$(basename .git/objects/pack/pack-*.pack)"
    -+		) >packfiles &&
    ++		} >packfiles &&
     +
     +		git pack-objects --stdin-packs generated-pack <packfiles &&
     +		packed_objects generated-pack-*.idx >packed-objects &&
 6:  3e958bf2a4 !  7:  cd135439ae pack-objects: extend test coverage of `--stdin-packs` with alternates
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with sa
     +	test_commit -C member "local-commit" &&
     +	git -C member repack -dl &&
     +
    -+	(
    -+		basename shared/.git/objects/pack/pack-*.pack  &&
    ++	{
    ++		basename shared/.git/objects/pack/pack-*.pack &&
     +		basename member/.git/objects/pack/pack-*.pack
    -+	) >packfiles &&
    ++	} >packfiles &&
     +
    -+	(
    ++	{
     +		packed_objects shared/.git/objects/pack/pack-*.idx &&
     +		packed_objects member/.git/objects/pack/pack-*.idx
    -+	) | sort >expected-objects &&
    ++	} | sort >expected-objects &&
     +
     +	git -C member pack-objects --stdin-packs generated-pack <packfiles &&
    -+	packed_objects member/generated-pack-*.idx | sort >actual-objects &&
    ++	packed_objects member/generated-pack-*.idx >actual-objects &&
     +	test_cmp expected-objects actual-objects
     +'
     +
 -:  ---------- >  8:  18bac90289 t/helper: allow chmtime to print verbosely without modifying mtime
 7:  f014872ed4 !  9:  285695deaf repack: honor `-l` when calculating pack geometry
    @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric with same pack in
     +	# packfile and thus arrive at the conclusion that the geometric
     +	# sequence is intact. We thus expect no changes.
     +	#
    -+	# Note that we are using stat(1) to verify idempotence to also verify
    -+	# that the mtime did not change. This is done in order to detect the
    -+	# case where we do repack objects, but the resulting packfile is the
    -+	# same.
    -+	stat member/.git/objects/pack/* >expected-member-packs &&
    ++	# Note that we are tweaking mtimes of the packfiles so that we can
    ++	# verify they did not change. This is done in order to detect the case
    ++	# where we do repack objects, but the resulting packfile is the same.
    ++	test-tool chmtime --verbose =0 member/.git/objects/pack/* >expected-member-packs &&
     +	git -C member repack --geometric=2 -l -d &&
    -+	stat member/.git/objects/pack/* >actual-member-packs &&
    ++	test-tool chmtime --verbose member/.git/objects/pack/* >actual-member-packs &&
     +	test_cmp expected-member-packs actual-member-packs &&
     +
    -+	(
    ++	{
     +		packed_objects shared/.git/objects/pack/pack-*.idx &&
     +		packed_objects member/.git/objects/pack/pack-*.idx
    -+	) | sort >expected-objects &&
    ++	} | sort >expected-objects &&
     +
     +	# On the other hand, when doing a non-local geometric repack we should
     +	# see both packfiles and thus repack them. We expect that the shared
     +	# object database was not changed.
    -+	stat shared/.git/objects/pack/* >expected-shared-packs &&
    ++	test-tool chmtime --verbose =0 shared/.git/objects/pack/* >expected-shared-packs &&
     +	git -C member repack --geometric=2 -d &&
    -+	stat shared/.git/objects/pack/* >actual-shared-packs &&
    ++	test-tool chmtime --verbose shared/.git/objects/pack/* >actual-shared-packs &&
     +	test_cmp expected-shared-packs actual-shared-packs &&
     +
     +	# Furthermore, we expect that the member repository now has a single
 8:  b10ee7fc3d ! 10:  bb0ee0eb3c repack: disable writing bitmaps when doing a local geometric repack
    @@ Metadata
     Author: Patrick Steinhardt <ps@xxxxxx>
     
      ## Commit message ##
    -    repack: disable writing bitmaps when doing a local geometric repack
    +    repack: disable writing bitmaps when doing a local repack
     
         In order to write a bitmap, we need to have full coverage of all objects
         that are about to be packed. In the traditional non-multi-pack-index
    @@ Commit message
         with writing bitmaps when we have multiple packfiles as long as the
         multi-pack-index covers all objects.
     
    -    This is not always the case though. When writing multi-pack-indices in a
    -    repository that is connected to an alternate object directory we may end
    -    up writing a multi-pack-index that only has partial coverage of objects.
    -    The end result is that writing the bitmap will fail:
    +    This is not always the case though. When asked to perform a repack of
    +    local objects, only, then we cannot guarantee to have full coverage of
    +    all objects regardless of whether we do a full repack or a repack with a
    +    multi-pack-index. The end result is that writing the bitmap will fail in
    +    both worlds:
     
             $ git multi-pack-index write --stdin-packs --bitmap <packfiles
             warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    @@ Commit message
     
         Now there are two different ways to fix this. The first one would be to
         amend git-multi-pack-index(1) to disable writing bitmaps when we notice
    -    that we don't have full object coverage. But we ain't really got enough
    -    information there, and seeing that it is a low-level plumbing command it
    -    does not feel like the right place to fix this.
    +    that we don't have full object coverage.
     
    -    We can easily fix it at a higher level in git-repack(1) though. When all
    -    of the following conditions are true:
    +        - We don't have enough information in git-multi-pack-index(1) in
    +          order to tell whether the local repository _should_ have full
    +          coverage. Because even when connected to an alternate object
    +          directory, it may be the case that we still have all objects
    +          around in the main object database.
     
    -        - We are asked to write a multi-pack index with bitmaps.
    +        - git-multi-pack-index(1) is quite a low-level tool. Automatically
    +          disabling functionality that it was asked to provide does not feel
    +          like the right thing to do.
     
    -        - We are asked to only include local objects via `-l`.
    -
    -        - We are connected to an alternate repository that has packfiles.
    -
    -    Then we will override the user's ask and disable writing bitmaps with a
    -    warning. This is similar to what we do in git-pack-objects(1), where we
    -    also disable writing bitmaps in case we omit an object from the pack.
    +    We can easily fix it at a higher level in git-repack(1) though. When
    +    asked to only include local objects via `-l` and when connected to an
    +    alternate object directory then we will override the user's ask and
    +    disable writing bitmaps with a warning. This is similar to what we do in
    +    git-pack-objects(1), where we also disable writing bitmaps in case we
    +    omit an object from the pack.
     
         Signed-off-by: Patrick Steinhardt <ps@xxxxxx>
     
      ## builtin/repack.c ##
     @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
    - 	if (pack_kept_objects < 0)
    - 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
    - 
    -+	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
    -+		struct packed_git *p;
    -+
    -+		for (p = get_all_packs(the_repository); p; p = p->next) {
    -+			if (p->pack_local)
    -+				continue;
    -+
    -+			/*
    -+			 * When asked to do a local repack, but we have
    -+			 * packfiles that are inherited from an alternate, then
    -+			 * we cannot guarantee that the multi-pack-index would
    -+			 * have full coverage of all objects. We thus disable
    -+			 * writing bitmaps in that case.
    -+			 */
    -+			warning(_("disabling bitmap writing, as some objects are not being packed"));
    -+			write_bitmaps = 0;
    -+			break;
    -+		}
    -+	}
    -+
      	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
      		die(_(incremental_bitmap_conflict_error));
      
    ++	if (write_bitmaps && po_args.local && has_alt_odb(the_repository)) {
    ++		/*
    ++		 * When asked to do a local repack, but we have
    ++		 * packfiles that are inherited from an alternate, then
    ++		 * we cannot guarantee that the multi-pack-index would
    ++		 * have full coverage of all objects. We thus disable
    ++		 * writing bitmaps in that case.
    ++		 */
    ++		warning(_("disabling bitmap writing, as some objects are not being packed"));
    ++		write_bitmaps = 0;
    ++	}
    ++
    + 	if (write_midx && write_bitmaps) {
    + 		struct strbuf path = STRBUF_INIT;
    + 
    +
    + ## object-file.c ##
    +@@ object-file.c: void prepare_alt_odb(struct repository *r)
    + 	r->objects->loaded_alternates = 1;
    + }
    + 
    ++int has_alt_odb(struct repository *r)
    ++{
    ++	prepare_alt_odb(r);
    ++	return !!r->objects->odb->next;
    ++}
    ++
    + /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
    + static int freshen_file(const char *fn)
    + {
    +
    + ## object-store.h ##
    +@@ object-store.h: KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
    + 	struct object_directory *, 1, fspathhash, fspatheq)
    + 
    + void prepare_alt_odb(struct repository *r);
    ++int has_alt_odb(struct repository *r);
    + char *compute_alternate_path(const char *path, struct strbuf *err);
    + struct object_directory *find_odb(struct repository *r, const char *obj_dir);
    + typedef int alt_odb_fn(struct object_directory *, void *);
    +
    + ## t/t7700-repack.sh ##
    +@@ t/t7700-repack.sh: test_expect_success SYMLINKS '--local keeps packs when alternate is objectdir '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success '--local disables writing bitmaps when connected to alternate ODB' '
    ++	test_when_finished "rm -rf shared member" &&
    ++
    ++	git init shared &&
    ++	git clone --shared shared member &&
    ++	(
    ++		cd member &&
    ++		test_commit "object" &&
    ++		git repack -Adl --write-bitmap-index 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: disabling bitmap writing, as some objects are not being packed
    ++		EOF
    ++		test_cmp expect err &&
    ++		test ! -f .git/objects/pack-*.bitmap
    ++	)
    ++'
    ++
    + test_expect_success 'packed obs in alt ODB are repacked even when local repo is packless' '
    + 	mkdir alt_objects/pack &&
    + 	mv .git/objects/pack/* alt_objects/pack &&
     
      ## t/t7703-repack-geometric.sh ##
     @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric -l with non-intact geometric sequence across OD
-- 
2.40.0

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux