Glen Choo wrote: > In several parts of the setup machinery, we set up a repository_format > and then use it to set up the_repository in nearly the exact same way, > suggesting that we might be able to use a helper function to standardize > the behavior and make future modifications easier. Create this helper > function, setup_repository_from_format(), thus standardizing this > behavior. > > To determine what the 'standardized behavior' should be, we can compare > the candidate call sites in repo_init(), setup_git_directory_gently(), > check_repository_format() and discover_git_directory(), > > - All of them copy .worktree_config. > > - All of them 'copy' .partial_clone. Most perform a shallow copy of the > pointer, then set the .partial_clone = NULL so that it doesn't get > cleared by clear_repository_format(). However, > check_repository_format() copies the string deeply because the > repository_format is sometimes read back (it is an "out" parameter). > To accomodate both shallow copying and deep copying, toggle this > behavior using the "modify_fmt_ok" parameter. Do you have a specific example of this happening? I see two uses of 'check_repository_format()' in the codebase: 1. in 'enter_repo()' ('path.c') 2. in 'init_db()' ('init-db.c') The first one calls 'check_repository_format()' with 'NULL', which causes the function to create a temporary 'struct repository_format' that is then discarded at the end of the function - no need to worry about the value being cleared there. The second one does call 'check_repository_format()' with a 'struct repository_format' instance, but the 'partial_clone' field field is not accessed again after that. The only subsequent usages of the 'repo_fmt' variable in 'init_db()' are: - in 'validate_hash_algorithm()', where only the 'version' and 'hash_algo' fields are accessed. - in 'create_default_files()', where only 'hash_algo' is accessed. So, shouldn't it be safe to shallow-copy-and-NULL? But as I noted earlier [1], if you do that it'll make the name 'check_repository_format()' a bit misleading (since it's actually modifying its arg in place). So, if you update to always shallow copy, 'check_repository_format()' should be renamed to reflect its side effects. [1] https://lore.kernel.org/git/49509708-c0a1-2439-a551-cab05d944b66@xxxxxxxxxx/ > > - Most of them set up repository.hash_algo, except > discover_git_directory(). Our helper function unconditionally sets up > .hash_algo because it turns out that discover_git_directory() probably > doesn't need to set up "struct repository" at all! If that's the case, shouldn't the 'repository_format' assignments in 'discover_git_directory()' be removed altogether? > discover_git_directory() isn't actually used in the setup process - its > only caller in the Git binary is read_early_config(). As explained by > 16ac8b8db6 (setup: introduce the discover_git_directory() function, > 2017-03-13), it is supposed to be an entrypoint into setup.c machinery > that allows the Git directory to be discovered without side effects, > in other words, we shouldn't have introduced side effects in > ebaf3bcf1ae (repository: move global r_f_p_c to repo struct, > 2021-06-17). Fortunately, we didn't start to rely on this unintended > behavior between then and now, so we can just drop it. > > Signed-off-by: Glen Choo <chooglen@xxxxxxxxxx> > --- > Here's the helper function I had in mind. I was initially mistaken and > it turns out that we need to support deep copying, but fortunately, > t0001 is extremely thorough and will catch virtually any mistake in the > setup process. CI seems to pass, though it appears to be a little flaky > today and sometimes cancels jobs > (https://github.com/chooglen/git/actions/runs/5249029150). > > If you're comfortable with it, I would prefer for you to squash this > into your patches so that we don't just end up changing the same few > lines. If not, I'll Reviewed-by your patches (if I don't find any other > concerns on a re-read) and send this as a 1-patch on top. Reading through the commit message & patch, I'm still not convinced this refactor is a good idea - to me, it doesn't leave the code in a clearly better state. If you feel strongly that it does, though, I'm happy to leave it to others to review/decide but I would prefer that you keep it a separate patch submission on top. Thanks! > diff --git a/repository.c b/repository.c > index 104960f8f5..50f0b26a6c 100644 > --- a/repository.c > +++ b/repository.c > @@ -181,12 +181,7 @@ int repo_init(struct repository *repo, > if (read_and_verify_repository_format(&format, repo->commondir)) > goto error; > > - repo_set_hash_algo(repo, format.hash_algo); > - repo->repository_format_worktree_config = format.worktree_config; > - > - /* take ownership of format.partial_clone */ > - repo->repository_format_partial_clone = format.partial_clone; > - format.partial_clone = NULL; > + setup_repository_from_format(repo, &format, 1); > > if (worktree) > repo_set_worktree(repo, worktree); > diff --git a/setup.c b/setup.c > index d866395435..33ce58676f 100644 > --- a/setup.c > +++ b/setup.c > @@ -1561,13 +1561,8 @@ const char *setup_git_directory_gently(int *nongit_ok) > setup_git_env(gitdir); > } > if (startup_info->have_repository) { > - repo_set_hash_algo(the_repository, repo_fmt.hash_algo); > - the_repository->repository_format_worktree_config = > - repo_fmt.worktree_config; > - /* take ownership of repo_fmt.partial_clone */ > - the_repository->repository_format_partial_clone = > - repo_fmt.partial_clone; > - repo_fmt.partial_clone = NULL; > + setup_repository_from_format(the_repository, > + &repo_fmt, 1); > } > } > /* > @@ -1654,14 +1649,26 @@ void check_repository_format(struct repository_format *fmt) > fmt = &repo_fmt; > check_repository_format_gently(get_git_dir(), fmt, NULL); > startup_info->have_repository = 1; > - repo_set_hash_algo(the_repository, fmt->hash_algo); > - the_repository->repository_format_worktree_config = > - fmt->worktree_config; > - the_repository->repository_format_partial_clone = > - xstrdup_or_null(fmt->partial_clone); > + setup_repository_from_format(the_repository, fmt, 0); > clear_repository_format(&repo_fmt); > } > I think you may be missing changes to 'discover_git_directory()'? Like I mentioned above, though, if you don't think 'discover_git_directory()' needs to set up 'the_repository', then those assignments should just be removed (not replaced with 'setup_repository_from_format()').