Re: [PATCH v3 0/3] Use default values from settings instead of config

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 04 2021, Glen Choo wrote:

> Hi everyone! This patch was created in response to something we observed in
> Google, where fsck failed to detect that the commit graph was invalid. We
> initially assumed that fsck never checked the commit graph, but it turns out
> that it does so only when core.commitgraph is set, even though we set defaults
> for "whether to use the commit graph" in the repository settings.
>
> Instead of using the config, let's use repository settings where
> available. Replace core.commitGraph and core.multiPackIndex with their
> equivalent repository settings in fsck and gc.
>
> I don't anticipate any more business logic changes and the previous comments
> were focused on testing style, so hopefully this should be good to merge.
>
> Changes since v2:
> * Various small test fixes (thanks Eric!). Most notably, I've used -c instead of
>   test_config because test_config can affect the values in subsequent tests.
> * Rewording fix in patch 3 commit message
> * Refactor tests in patch 3 so that we use a single helper function instead of
>   copy-pasted code
>
> Changes since v1:
> * clean up typo in patch 1 commit message 
> * document the commits that patches 1 and 2 address
> * use test helpers in patch 1
> * rewrite patch 2's tests so that it's easier to tell that each test
>   does something different
> * reword patch 3 commit message to explain the bug
> * add tests to patch 3
>
> Glen Choo (3):
>   fsck: verify commit graph when implicitly enabled
>   fsck: verify multi-pack-index when implictly enabled
>   gc: perform incremental repack when implictly enabled

I've looked this over carefully, it LGTM. I noticed some bugs in this
area, but they pre-date your commits. I've pushed a
"gc/use-repo-settings" to https://github.com/avar/git.git that you can
check out, consider that branch partially as commentary, but feel free
to pick/squash anything you find there.

The range-diff for that is quoted below. Notes:

 * This is somewhat of a case of throwing rocks from a glass house, but
   I think the commit messages could really be shorter/get to the
   point. I.e. say right away what's being changed and why, instead of
   giving the reader an overview of the whole control flow.

   I just adjusted adusted 1/3 in that direction below, I think 2/3
   could do with being squashed into it & adjusted, it's really the
   exact same bug/improvement, just with a different config key that
   behaves the same.

 * The "test_unconfig" in your 2/3 is redundant, so is the "rm -rf" of
   directories that can't exist by that point in 3/3.

 * I wondered if we could just call prepare_repo_settings() in one place
   earlier in gc/maintenance instead of adding yet another call deeper
   in the control flow. I've got a 4/3 below that tries to do that, and
   (as noted in its commit message) intentionally breaks "gc.c" in a way
   that our tests don't catch, which is perhaps fallout from
   31b1de6a09b?

   So that's very much on the topic of extra improvements, i.e. your
   patches don't break that anymore than it is now, but I for one would
   very much like to see an end-state where we're confident that
   gc/maintenance do the right thing on each of the tri-states of these
   boolean variables.

   It's also a point about how to write reliable tests (again,
   pre-dating your series). I.e. instead of asserting "we do X when Y is
   true" have the test assert all of Y=[unset|false|true] in a way that
   breaks if any are different than expected.

   One of the tests you're changing has a "graph_git_two_modes" helper
   does 2/3 of that, maybe it would be easy to extend it?

 * You use corrupting the graph as a shorthand for "did we run this
   command?". See test_region for an approach that might be better,
   i.e. just log with trace2 and grep that to see if we started the
   relevant command. The test_region helper can't be used as-is for that
   (you could manually grep the emitted PERF/EVENT output), but perhaps
   that's easier/more reliable?

 * This is again, something that pre-dates your patches, but I think
   following the "is enabled?" behavior here for these variables in "git
   fsck" is highly questionable. So just some thoughts after having
   looked at that:

   Just because your config doen't say "use the graph" doesn't mean
   another process won't do so, and if the graph is broken they might
   probably encounter an error.

   Although I think we *should* have cleaned up those being fatal a
   while ago). See 43d35618055 (commit-graph write: don't die if the
   existing graph is corrupt, 2019-03-25).

   But in general for these auxiliary files (commit-graph, midix, *.rev
   etc.) I think there's a good argument to be made that fsck should
   *always* check them, and at most use the config to say something
   like:

       Hey, you've got a 100% broken commit-graph/midx/rev in your .git,
       looks like the core.XYZ enabling it is off *right now* though, so
       maybe we won't use it.  I hope you're feeling lucky...

   :)


1:  fd9a58e2c7b ! 1:  c6041f3b633 fsck: verify commit graph when implicitly enabled
    @@ Metadata
      ## Commit message ##
         fsck: verify commit graph when implicitly enabled
     
    -    the_repository->settings is the preferred way to get certain
    -    settings (such as "core.commitGraph") because it gets default values
    -    from prepare_repo_settings(). However, cmd_fsck() reads the config
    -    directly via git_config_get_bool(), which bypasses these default values.
    -    This causes fsck to ignore the commit graph if "core.commitgraph" is not
    -    explicitly set in the config. This worked fine until commit-graph was
    -    enabled by default in 31b1de6a09 (commit-graph: turn on commit-graph by
    -    default, 2019-08-13).
    +    Change fsck to check the "core_commit_graph" variable set in
    +    "repo-settings.c" instead of reading the "core.commitGraph" variable,
    +    this fixes a bug where we wouldn't start "commit-graph verify" if the
    +    config key was missing, which since 31b1de6a09 (commit-graph: turn on
    +    commit-graph by default, 2019-08-13) has meant that it's on by
    +    default.
     
    -    Replace git_config_get_bool("core.commitgraph") in fsck with the
    -    equivalent call to the_repository->settings.core_commit_graph.
    -
    -    The expected behavior is that fsck respects the config value when it is
    -    set, but uses the default value when it is unset. For example, for
    -    core.commitGraph, there are three cases:
    -
    -    - Config value is set to true -> enabled
    -    - Config value is set to false -> disabled
    -    - Config value is unset -> enabled
    -
    -    As such, tests cover all three cases.
    +    Add tests to "t5318-commit-graph.sh" to check that this works, using
    +    the detection of a corrupt commit-graph as a method of seeing if
    +    "commit-graph verify" was invoked.
     
         Signed-off-by: Glen Choo <chooglen@xxxxxxxxxx>
         Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
2:  9d6813bc280 ! 2:  d11209ebfe1 fsck: verify multi-pack-index when implictly enabled
    @@ Commit message
     
         Signed-off-by: Glen Choo <chooglen@xxxxxxxxxx>
         Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: int cmd_fsck(int argc, const char **argv, const char *prefix)
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'verify incorrect offset' '
      		"incorrect object offset" \
     -		"git -c core.multipackindex=true fsck"
     +		"git -c core.multiPackIndex=true fsck" &&
    -+	test_unconfig core.multiPackIndex &&
     +	test_must_fail git fsck &&
     +	git -c core.multiPackIndex=false fsck
      '
3:  260f46568c6 ! 3:  017e2003e85 gc: perform incremental repack when implictly enabled
    @@ Commit message
     
         Signed-off-by: Glen Choo <chooglen@xxxxxxxxxx>
         Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
    +    Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
     
      ## builtin/gc.c ##
     @@ builtin/gc.c: static int maintenance_task_loose_objects(struct maintenance_run_opts *opts)
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +}
     +
     +test_expect_success 'maintenance.incremental-repack.auto' '
    -+	rm -rf incremental-repack-true &&
     +	git init incremental-repack-true &&
     +	(
     +		cd incremental-repack-true &&
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +'
     +
     +test_expect_success 'maintenance.incremental-repack.auto (when config is unset)' '
    -+	rm -rf incremental-repack-unset &&
     +	git init incremental-repack-unset &&
     +	(
     +		cd incremental-repack-unset &&
-:  ----------- > 4:  04d1527f180 WIP gc/maintenance: just call prepare_repo_settings() earlier

-- >8 --
Subject: [PATCH] WIP gc/maintenance: just call prepare_repo_settings() earlier

Consolidate the various calls to prepare_repo_settings() to happen at
the start of cmd_maintenance() (TODO: and cmd_gc()). This WIP commit
intentionally breaks things, we seem to be lacking test coverage for
cmd_gc(), perhaps since 31b1de6a09b (commit-graph: turn on
commit-graph by default, 2019-08-13)?

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx>
---
 builtin/gc.c            | 5 +----
 t/t5318-commit-graph.sh | 2 +-
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 26709311601..f59dbedc1fe 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -695,7 +695,6 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		clean_pack_garbage();
 	}
 
-	prepare_repo_settings(the_repository);
 	if (the_repository->settings.gc_write_commit_graph == 1)
 		write_commit_graph_reachable(the_repository->objects->odb,
 					     !quiet && !daemonized ? COMMIT_GRAPH_WRITE_PROGRESS : 0,
@@ -860,7 +859,6 @@ static int run_write_commit_graph(struct maintenance_run_opts *opts)
 
 static int maintenance_task_commit_graph(struct maintenance_run_opts *opts)
 {
-	prepare_repo_settings(the_repository);
 	if (!the_repository->settings.core_commit_graph)
 		return 0;
 
@@ -1052,7 +1050,6 @@ static int incremental_repack_auto_condition(void)
 	int incremental_repack_auto_limit = 10;
 	int count = 0;
 
-	prepare_repo_settings(the_repository);
 	if (!the_repository->settings.core_multi_pack_index)
 		return 0;
 
@@ -1167,7 +1164,6 @@ static int multi_pack_index_repack(struct maintenance_run_opts *opts)
 
 static int maintenance_task_incremental_repack(struct maintenance_run_opts *opts)
 {
-	prepare_repo_settings(the_repository);
 	if (!the_repository->settings.core_multi_pack_index) {
 		warning(_("skipping incremental-repack task because core.multiPackIndex is disabled"));
 		return 0;
@@ -2492,6 +2488,7 @@ int cmd_maintenance(int argc, const char **argv, const char *prefix)
 	    (argc == 2 && !strcmp(argv[1], "-h")))
 		usage(builtin_maintenance_usage);
 
+	prepare_repo_settings(the_repository);
 	if (!strcmp(argv[1], "run"))
 		return maintenance_run(argc - 1, argv + 1, prefix);
 	if (!strcmp(argv[1], "start"))
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 4e4450af1ef..98123191e1e 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -368,7 +368,7 @@ test_expect_success 'check that gc computes commit-graph' '
 	git commit-graph write --reachable &&
 	cp $objdir/info/commit-graph commit-graph-before-gc &&
 	git reset --hard HEAD~1 &&
-	git config gc.writeCommitGraph true &&
+	# BUG: Due to 31b1de6a09b (commit-graph: turn on commit-graph by default, 2019-08-13)?
 	git gc &&
 	cp $objdir/info/commit-graph commit-graph-after-gc &&
 	! test_cmp_bin commit-graph-before-gc commit-graph-after-gc &&
-- 
2.33.0.1441.gbbcdb4c3c66









[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux