A change recently merged to 'next' stops us from defaulting to using SHA-1 unless other code (like a logic early in the start-up sequence to see what hash is being used in the repository we are working in) explicitly sets it, leading to a (deliberate) crash of "git" when we forgot to cover certain code paths. It turns out we have a few. Notable ones are all operations that are designed to work outside a repository. We should go over all such code paths and give them a reasonable default when there is one available (e.g. for historical reasons, patch-id is documented to work with SHA-1 hashes, so arguably it should do so everywhere, not just in SHA-1 repositories, but outside any repository), and when there is none, we can use the existing GIT_DEFAULT_HASH environment variable to let the user control the choice. In this fourth iteration, I used the existing GIT_DEFAULT_HASH environment variable that has been used to allow users to opt into an early experiment of SHA-256 support. I saw breakages in SHA-256 CI jobs for earlier rounds of this series, but hopefully this is now good to go. Knock wood... Junio C Hamano (3): setup: add an escape hatch for "no more default hash algorithm" change t1517: test commands that are designed to be run outside repository apply: fix uninitialized hash function Patrick Steinhardt (2): builtin/patch-id: fix uninitialized hash function builtin/hash-object: fix uninitialized hash function builtin/apply.c | 4 +++ builtin/hash-object.c | 3 ++ builtin/patch-id.c | 13 +++++++++ environment.h | 1 + repository.c | 40 +++++++++++++++++++++++++++ setup.c | 2 -- t/t1007-hash-object.sh | 6 ++++ t/t1517-outside-repo.sh | 61 +++++++++++++++++++++++++++++++++++++++++ t/t4204-patch-id.sh | 34 +++++++++++++++++++++++ 9 files changed, 162 insertions(+), 2 deletions(-) create mode 100755 t/t1517-outside-repo.sh Range-diff against v3: 1: 7adbf0cc5f ! 1: 3038f73e1c setup: add an escape hatch for "no more default hash algorithm" change @@ Commit message Partially revert c8aed5e8 (repository: stop setting SHA1 as the default object hash, 2024-05-07), to keep end-user systems still broken when we have gap in our test coverage but yet give them an - escape hatch to set the GIT_TEST_DEFAULT_HASH_ALGORITHM environment - variable to "sha1" in order to revert to the previous behaviour. + escape hatch to set the GIT_DEFAULT_HASH environment variable to + "sha1" in order to revert to the previous behaviour. This variable + has been in use for using SHA-256 hash by default, and it should be + a better fit than inventing a new and test-only knob. Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> + ## environment.h ## +@@ environment.h: const char *getenv_safe(struct strvec *argv, const char *name); + #define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS" + #define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR" + #define GIT_ATTR_SOURCE_ENVIRONMENT "GIT_ATTR_SOURCE" ++#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH" + + /* + * Environment variable used in handshaking the wire protocol. + ## repository.c ## +@@ + #include "git-compat-util.h" + #include "abspath.h" ++#include "environment.h" + #include "repository.h" + #include "object-store-ll.h" + #include "config.h" @@ static struct repository the_repo; struct repository *the_repository = &the_repo; @@ repository.c + const char *hash_name; + int algo; + -+ hash_name = getenv("GIT_DEFAULT_HASH_ALGORITHM"); ++ hash_name = getenv(GIT_DEFAULT_HASH_ENVIRONMENT); + if (!hash_name) + return; + algo = hash_algo_by_name(hash_name); ++ ++ /* ++ * NEEDSWORK: after all, falling back to SHA-1 by assigning ++ * GIT_HASH_SHA1 to algo here, instead of returning, may give ++ * us better behaviour. ++ */ + if (algo == GIT_HASH_UNKNOWN) + return; ++ + repo_set_hash_algo(repo, algo); +} + @@ repository.c: void initialize_repository(struct repository *repo) + * giving them an appropriate default individually; any + * unconverted codepath that tries to access the_hash_algo + * will thus fail. The end-users however have an escape hatch -+ * to set GIT_DEFAULT_HASH_ALGORITHM environment variable to -+ * "sha1" get back the old behaviour of defaulting to SHA-1. ++ * to set GIT_DEFAULT_HASH environment variable to "sha1" get ++ * back the old behaviour of defaulting to SHA-1. + */ + if (repo == the_repository) + set_default_hash_algo(repo); } static void expand_base_dir(char **out, const char *in, + + ## setup.c ## +@@ setup.c: int daemonize(void) + #define TEST_FILEMODE 1 + #endif + +-#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH" +- + static void copy_templates_1(struct strbuf *path, struct strbuf *template_path, + DIR *dir) + { 2: c40b2bc7b5 ! 2: e56ae68961 t1517: test commands that are designed to be run outside repository @@ t/t1517-outside-repo.sh (new) +TEST_PASSES_SANITIZE_LEAK=true +. ./test-lib.sh + -+# Do not use the end-user workaround for this test. -+# GIT_DEFAULT_HASH_ALGORITHM=sha1; export GIT_DEFAULT_HASH_ALGORITHM -+ +test_expect_success 'set up a non-repo directory and test file' ' + GIT_CEILING_DIRECTORIES=$(pwd) && + export GIT_CEILING_DIRECTORIES && 3: 48d83fe1fd ! 3: e1ae0e95fc builtin/patch-id: fix uninitialized hash function @@ builtin/patch-id.c: int cmd_patch_id(int argc, const char **argv, const char *pr + * NEEDSWORK: This hack should be removed in favor of converting + * the code that computes patch IDs to always use SHA1. + */ -+ if (!startup_info->have_repository) ++ if (!the_hash_algo) + repo_set_hash_algo(the_repository, GIT_HASH_SHA1); + generate_id_list(opts ? opts > 1 : config.stable, 4: a5bf0dfb0a ! 4: 1c8ce0d024 builtin/hash-object: fix uninitialized hash function @@ Commit message SHA1 as the default object hash, 2024-05-07), this will make us hit an uninitialized hash function, which subsequently leads to a segfault. - Fix this by falling back to SHA-1 explicitly when running outside of a - Git repository. Users can use GIT_DEFAULT_HASH_ALGORITHM environment to - specify what hash algorithm they want, so arguably this code should not - be needed. + Fix this by falling back to SHA-1 explicitly when running outside of + a Git repository. Users can use GIT_DEFAULT_HASH environment to + specify what hash algorithm they want, so arguably this code should + not be needed. Signed-off-by: Patrick Steinhardt <ps@xxxxxx> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> 5: 2b9e486c3f = 5: 5b0ec43757 apply: fix uninitialized hash function