Re: [PATCH] multi-pack-index: fix --object-dir from outside repo

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 20, 2021 at 09:35:04PM +0200, Johannes Berg wrote:
> If using --object-dir to point into a repo, 'write' will
> segfault trying to access the object-dir via the repo it
> found, but that's not fully initialized. Fix it to use
> the object_dir properly.

Thanks for CC'ing me; I have definitely been wondering about the
intended behavior of `--object-dir` on the list recently [1].

I think your patch message could use some clarifying, though.  Invoking

    cd $REPO/..
    git multi-pack-index write --object-dir=$REPO/.git/objects

has... different behavior depending on which side of the "write"
argument you put `--object-dir". On the left-hand side (i.e.,
"--object-dir=... write", you get something like:

    cd $REPO/..
    git multi-pack-index --object-dir=$REPO/.git/objects write

    zsh: segmentation fault  git.compile multi-pack-index ...

because the_repository->objects->odb isn't initialized (so reading
`path` in `clear_midx_files_ext` crashes). But in the opposite order
(i.e., "write --object-dir=...") you get:

    BUG: environment.c:280: git environment hasn't been setup
    zsh: abort      git.compile multi-pack-index write

because we catch that case much earlier in get_object_directory(). Why?
Because cmd_multi_pack_index() fills in the value of object_dir with
get_object_directory() if it isn't filled in already, but seeing "write"
causes us to stop parsing and dispatch to the sub-command
cmd_multi_pack_index_write().

I discussed this a little in [1] also (see the part about using
RUN_SETUP instead). There are definitely different ways to handle that;
you could equally imagine only dying if we were both outside of a Git
repository and didn't point at one via `--object-dir`.

But that's separate from another issue which is fixed by your patch
which is that we don't respect the value of `--object-dir` when cleaning
up MIDX .rev files via clear_midx_files_ext().

Your fix there (to use the path of an object_dir instead of a repository
struct) makes sense (since we don't ever fill in a repository struct
corresponding to the `--object-dir` parameter from the MIDX code).

But I think that's a separate issue than the RUN_SETUP thing I mentioned
earlier, so I would probably consider breaking this into two patches,
the first which addresses the RUN_SETUP thing, and the second which is
this fix.

>  static int midx_checksum_valid(struct multi_pack_index *m)
> @@ -1086,7 +1086,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>
>  	if (flags & MIDX_WRITE_REV_INDEX)
>  		write_midx_reverse_index(midx_name, midx_hash, &ctx);
> -	clear_midx_files_ext(the_repository, ".rev", midx_hash);
> +	clear_midx_files_ext(object_dir, ".rev", midx_hash);

We can rely on this value always being non-NULL, so this is good.

> -static void clear_midx_files_ext(struct repository *r, const char *ext,
> +static void clear_midx_files_ext(const char *object_dir, const char *ext,
>  				 unsigned char *keep_hash)
>  {
>  	struct clear_midx_data data;
> @@ -1146,7 +1146,7 @@ static void clear_midx_files_ext(struct repository *r, const char *ext,
>  				    hash_to_hex(keep_hash), ext);
>  	data.ext = ext;
>
> -	for_each_file_in_pack_dir(r->objects->odb->path,
> +	for_each_file_in_pack_dir(object_dir,
>  				  clear_midx_file_ext,
>  				  &data);

And here's the most important part of the change, which is obviously
correct. But note to other reviewers that this has nothing to do with
the RUN_SETUP issue I mentioned earlier, since
for_each_file_in_pack_dir() doesn't care about that.

> +test_expect_success 'multi-pack-index with --object-dir need not be in repo' '
> +	p="$(pwd)" &&
> +	rm -f $objdir/multi-pack-index &&
> +	cd / &&
> +	git multi-pack-index --object-dir="$p/$objdir" write &&
> +	cd "$p"
> +'
> +

I agree with Stolee that there should be a new repo created within the
current working directory, that way you can "cd .." and be both outside
of the repo you just created, but not outside of the test environment.

But let's make sure that we're not deleting any files that we should be
leaving alone. So it might be good to do something like:

    git init repo &&
    test_when_finished "rm -fr repo" &&
    (
      cd repo &&

      test_commit base &&
      git repack -d &&
    ) &&

    rev="$objdir/pack/multi-pack-index-$(midx_checksum $objdir).rev" &&
    touch $rev &&

    git multi-pack-index write --object-dir=repo/.git/objects &&

    test_path_is_file repo/.git/objects/pack/multi-pack-index &&
    test_path_is_file repo/.git/objects/multi-pack-index &&
    test_path_is_file $objdir/pack/multi-pack-index &&
    test_path_is_file $rev

That isn't testing the "invoked from a non-repo, but --object-dir" is
given case, but I think that's fine since they really are separate
things.

Note also that midx_checksum doesn't exist, but it is merely a wrapper
over a test-tool that prints out (for a multi_pack_index "m") `m->data +
m->data_len - the_hash_algo->rawsz`.

So between splitting the patch, clarifying the patch message, and
implementing support for this new test helper, this may be more of a
project than you were bargaining for ;). Let me know if you want any
help. I also don't mind taking care of it myself, since I promised in
[1] that I'd fix this issue anyway.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/YQMFIljXl7sAAA%2FL@nand.local/



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux