Re: git 2.46.0 crashes when trying to verify-pack outside of a repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Sep 01, 2024 at 08:26:08AM -0700, Junio C Hamano wrote:
> Ilya K <me@xxxxxxxx> writes:
> 
> > We've updated to Git 2.46.0 in NixOS, and encountered an issue
> > with Dulwich (a Python Git implementation) tests failing[0]
> > because it attempts to call `git verify-pack` on a bare pack, with
> > no surrounding repo. This used to work in Git 2.45.x, but in 2.46
> > it simply prints "error: index-pack died of signal 11".
> 
> Thanks.  This is a fallout from code-wide clean-up in 2.46.0 where
> we do not assume that everybody runs SHA-1.

Yup, indeed. The problem lies deeper than what the below patch fixes
though. The issue isn't in git-verify-pack(1), but in git-index-pack(1),
and can be fixed like this:

    diff --git a/builtin/index-pack.c b/builtin/index-pack.c
    index fd968d673d2..e6edd96d099 100644
    --- a/builtin/index-pack.c
    +++ b/builtin/index-pack.c
    @@ -1733,7 +1733,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
        unsigned char pack_hash[GIT_MAX_RAWSZ];
        unsigned foreign_nr = 1;	/* zero is a "good" value, assume bad */
        int report_end_of_input = 0;
    -	int hash_algo = 0;
    +	int hash_algo = GIT_HASH_UNKNOWN;;
     
        /*
         * index-pack never needs to fetch missing objects except when
    @@ -1857,6 +1857,9 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
            pack_name = arg;
        }
     
    +	if (!the_repository->hash_algo && hash_algo == GIT_HASH_UNKNOWN)
    +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
    +
        if (!pack_name && !from_stdin)
            usage(index_pack_usage);
        if (fix_thin_pack && !from_stdin)

Unfortunately, this once again uncovers a deeper issue: neither the
packfile nor their index encode the object format they use. So while
falling back to SHA1 papers over the issue, it means that we misparse
SHA256 indices. Also, we misparse SHA1 indices if we happen to be in a
SHA256 repository. E.g. when parsing a SHA256 file in a SHA1 repo:

    $ git index-pack --verify '/tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.pack'
    error: wrong index v2 file size in /tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.idx
    fatal: Cannot open existing pack idx file for '/tmp/git-tests/trash directory.t5300-pack-object/repo/.git/objects/pack/pack-aa45f7f08f043c9f0388f1844a2a797587254e249919b35ac9dc2b52c1aada29.idx'

The error message isn't even properly indicating what the actual issue
is.

One potential solution would be to try and derive the object format from
the hash that the packfile index name has. But that is quite roundabout
and rather ugly, and packfiles may not necessarily have that hash in the
first place. It would also become potentially ambiguous in the future if
we were to ever adopt another hash that has the same length as either
SHA1 or SHA256.

So we basically have three different options:

  - Accept that we just don't handle this case correctly and let the
    code error out. This pessimizes all hashes but SHA256.

  - Bail out when outside of a repository when `--object-format=` wasn't
    given. This pessimizes all hashes, but gives a clear indicator to
    the user why things don't work.

  - Introduce packfiles v3 and encode the object format into the header.
    Then do either (1) or (2) on top.

The last option is of course the cleanest, but also the most involved.

Patrick

> ------- >8 -------
> Subject: verify-pack: fall back to SHA-1 outside a repo
> 
> In c8aed5e8da (repository: stop setting SHA1 as the default object hash,
> 2024-05-07), we have stopped setting the default hash algorithm for
> `the_repository`. Consequently, code that relies on `the_hash_algo` will
> now crash when it hasn't explicitly been initialized, which may be the
> case when running outside of a Git repository.
> 
> As the verify-pack command ought to be able to infer what algorithm
> is used in the input file (and if the input file does not have such
> an information, that by itself is a problem), and the command allows
> an option to explicitly tell what algorithm to use in case it cannot
> be guessed from the input file, in theory we shouldn't have to use
> the default algorithm anywhere in the operation of the command, but
> we fail fairly early in the process when run outside a repository
> without any default algorithm set.
> 
> Resurrect the setting of the default algorithm just like we used to
> do before 2.46.0
> 
> Reported-by: Ilya K <me@xxxxxxxx>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
>  builtin/verify-pack.c  | 4 ++++
>  t/t5300-pack-object.sh | 4 ++++
>  2 files changed, 8 insertions(+)
> 
> diff --git c/builtin/verify-pack.c w/builtin/verify-pack.c
> index 011dddd2dc..5b663905ae 100644
> --- c/builtin/verify-pack.c
> +++ w/builtin/verify-pack.c
> @@ -1,6 +1,7 @@
>  #include "builtin.h"
>  #include "config.h"
>  #include "gettext.h"
> +#include "hash.h"
>  #include "run-command.h"
>  #include "parse-options.h"
>  #include "strbuf.h"
> @@ -77,6 +78,9 @@ int cmd_verify_pack(int argc, const char **argv, const char *prefix)
>  		OPT_END()
>  	};
>  
> +	if (!the_hash_algo)
> +		repo_set_hash_algo(the_repository, GIT_HASH_SHA1);
> +
>  	git_config(git_default_config, NULL);
>  	argc = parse_options(argc, argv, prefix, verify_pack_options,
>  			     verify_pack_usage, 0);
> diff --git c/t/t5300-pack-object.sh w/t/t5300-pack-object.sh
> index 4ad023c846..d6f45d8923 100755
> --- c/t/t5300-pack-object.sh
> +++ w/t/t5300-pack-object.sh
> @@ -322,6 +322,10 @@ test_expect_success 'verify-pack catches a corrupted sum of the index file itsel
>  	fi
>  '
>  
> +test_expect_success 'verify-pack outside a repository' '
> +	nongit git verify-pack -v "$(pwd)/test-1-${packname_1}.idx"
> +'
> +
>  test_expect_success 'build pack index for an existing pack' '
>  	cat test-1-${packname_1}.pack >test-3.pack &&
>  	git index-pack -o tmp.idx test-3.pack &&




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux