Re: [PATCH v2 3/8] builtin/index-pack.c: write reverse indexes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 22, 2021 at 06:53:11PM -0500, Jeff King wrote:
> On Wed, Jan 13, 2021 at 05:28:15PM -0500, Taylor Blau wrote:
>
> >  OPTIONS
> > @@ -33,7 +34,14 @@ OPTIONS
> >  	file is constructed from the name of packed archive
> >  	file by replacing .pack with .idx (and the program
> >  	fails if the name of packed archive does not end
> > -	with .pack).
> > +	with .pack). Incompatible with `--rev-index`.
>
> I wondered which option was incompatible, but couldn't see from the
> context. It is "index-pack -o", which kind of makes sense. We can derive
> "foo.rev" from "foo.idx", but normally "-o" does not do any deriving.
>
> So I was all set to say "OK, we can live without it", but...
>
> > @@ -1824,7 +1851,16 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix)
> >  	if (from_stdin && hash_algo)
> >  		die(_("--object-format cannot be used with --stdin"));
> >  	if (!index_name && pack_name)
> > -		index_name = derive_filename(pack_name, "idx", &index_name_buf);
> > +		index_name = derive_filename(pack_name, ".pack", "idx", &index_name_buf);
> > +
> > +	opts.flags &= ~(WRITE_REV | WRITE_REV_VERIFY);
> > +	if (rev_index) {
> > +		opts.flags |= verify ? WRITE_REV_VERIFY : WRITE_REV;
> > +		if (index_name)
> > +			rev_index_name = derive_filename(index_name,
> > +							 ".idx", "rev",
> > +							 &rev_index_name_buf);
> > +	}
>
> ...here we do end up deriving ".rev" from ".idx" anyway. So I guess we
> probably could support "-o".  I also wonder what happens with "git
> index-pack -o foo.idx" when pack.writeReverseIndex is set. It looks like
> it would just work because of this block. But then shouldn't
> "--rev-index" work, too? And indeed, there is a test for that at the end
> of the patch! So is the documentation just wrong?

Hah! The documentation is just plain wrong. It's been a while, but I
have a vague recollection of writing this documentation before changing
the implementation of index-pack to allow this. Clearly, I forgot to go
back to update the broken documentation.

Hilariously, there is even a test in t5325 that demonstrates this
working! 'git index-pack --rev-index -o other.idx' writes both
'other.idx' and 'other.rev'. That was easy :-).

> I admit to finding the use of opts.flags versus the rev_index option a
> bit confusing. It seems like they are doing roughly the same thing, but
> influenced by different sources. It seems like we should be able to have
> a single local variable (then that goes on to set opts.flags for any
> sub-functions we call). Or maybe two, if we need to distinguish config
> versus command-line, but then they should have clear names
> (rev_index_config and rev_index_cmdline or something).

Yeah, I know. It's because we already pass a pointer to a struct
pack_idx_option to git_index_pack_config(), so in effect the 'flags' on
that struct *is* rev_index_config.

It's a little ugly, I agree, but I'm skeptical that the effort to clean
it up is worth it, mostly because the pack_idx_option struct probably
shouldn't be part of the index-pack builtin in the first place.

> As an aside, looking at derive_filename(), it seems a bit weird that one
> argument has a dot in the suffix and the other does not. I guess you are
> following the convention from write_special_file(), which omits it in
> the newly-added suffix. But it is slightly awkward to omit it for the
> old suffix in derive_filename(), because we want to strip_suffix() it
> all at once.

> Probably not that big a deal, but if anybody feels strongly, then
> derive_filename() could do:
>
>   if (!strip_suffix(pack_name, strip, &len) ||
>       !len || pack_name[len] != '.')
> 	die("does not end in .%s", strip);

That does make the callers look nicer, but it needs an extra two things:

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index ef2874a8e6..c758f3b8e9 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1441,11 +1441,10 @@ static const char *derive_filename(const char *pack_name, const char *strip,
 {
        size_t len;
        if (!strip_suffix(pack_name, strip, &len) || !len ||
-           pack_name[len] != '.')
+           pack_name[len - 1] != '.')
                die(_("packfile name '%s' does not end with '.%s'"),
                    pack_name, strip);
        strbuf_add(buf, pack_name, len);
-       strbuf_addch(buf, '.');
        strbuf_addstr(buf, suffix);
        return buf->buf;
 }

And then it does what you are looking for. I'll pull that change out
with your Suggested-by as a preparatory commit right before this one.

> > @@ -1578,6 +1591,12 @@ static int git_index_pack_config(const char *k, const char *v, void *cb)
> >  		}
> >  		return 0;
> >  	}
> > +	if (!strcmp(k, "pack.writereverseindex")) {
> > +		if (git_config_bool(k, v))
> > +			opts->flags |= WRITE_REV;
> > +		else
> > +			opts->flags &= ~WRITE_REV;
> > +	}
> >  	return git_default_config(k, v, cb);
> >  }
>
> IMHO we'll eventually want to turn this feature on by default. In which
> case we'll have to update every caller which is checking the config
> manually. Should we hide this in a function that looks up the config,
> and sets the default? Or alternatively, I guess, they could all use some
> shared initializer for "flags".

Note that there are only two such callers, so I'm not sure the effort to
extract this would be worth it.

> > +	# Intentionally corrupt the reverse index.
> > +	chmod u+w $rev &&
> > +	printf "xxxx" | dd of=$rev bs=1 count=4 conv=notrunc &&
> > +
> > +	test_must_fail git index-pack --rev-index --verify \
> > +		$packdir/pack-$pack.pack 2>err &&
> > +	grep "validation error" err
> > +'
>
> This isn't that subtle of a corruption, because we are corrupting the
> first 4 bytes, which is the magic signature. Maybe something further in
> the actual data would be interesting instead of or in addition?
>
> I dunno. There are a lot of edge cases around corruption (likewise, we
> might care how the normal reading code-path perceives a signature
> corruption like this). I'm not sure it's all that interesting to test
> all of them.

Agreed, I think what is here (even though it's not a severe corruption)
would be sufficient to make me feel good about our error handling in
general.

Thanks,
Taylor



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux