Re: ds/omit-trailing-hash-in-index (was: What's cooking in git.git (Dec 2022, #06; Sun, 18))

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 20 2022, Derrick Stolee wrote:

> On 12/19/22 5:49 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Sun, Dec 18 2022, Junio C Hamano wrote:
>> 
>>> * ds/omit-trailing-hash-in-index (2022-12-17) 4 commits
>>>  - features: feature.manyFiles implies fast index writes
>>>  - test-lib-functions: add helper for trailing hash
>>>  - read-cache: add index.skipHash config option
>>>  - hashfile: allow skipping the hash function
>>>
>>>  Introduce an optional configuration to allow the trailing hash that
>>>  protects the index file from bit flipping.
>>>
>>>  Will merge to 'next'?
>>>  source: <pull.1439.v4.git.1671204678.gitgitgadget@xxxxxxxxx>
>> 
>> I've been following this closely & reviewing it. I think the end-state
>> is probably good, but noted in [1] that the intermediate progression
>> equates bad config with "true", so:
>> 
>> 	git -c index.skipHash=blahblah status
>> 
>> Enables it, fixing that is trivial, and probably worth a re-roll.
>
> Fixing it is trivial, but as you say it is correct in the final
> version so I don't think this triviality is worth a re-roll.

I won't insist on it, but I don't think changing behavior in commits
that don't note that the behavior is changing is "triviality". The
inter-state is conflating errors with "was set".

It also suggests that we have a test blind-spot, although in the
end-state that's probably not worth chasing, as we know it's using the
more trusted repo-settings.c code.

I noted in the feedback that I think this would be much easier to read,
review & reason about if it were squashed, as we really only care about
the end-state.

But if you're advocating for adding it piecemeal let's get that right,
and not add a bug in an earlier commit & then remove it in a later
rewrite/refactoring.

>> The "probably" above is then because the patches seemingly try to make
>> this compatible with different config for submodules, but there's no
>> tests for submodule interaction, so that may or may not work.
>> 
>> Normally we could just trust the "struct repository *" parameter we get,
>> but in this case it's "istate->repo", which (as I showed in the v3
>> feedback[2]) is sometimes NULL.
>
> This, and other feedback you've given around 'struct repository *' values
> makes me think you are not aware of the current state of the submodule
> work.

For better or worse I'm *probably* one of the people most up-to-date on
it, see [1] (the --super-prefix patches currently queued are prep
patches for that).

> I'm also a bit out-of-date, but my understanding is that the
> conversion to stop using the_repository everywhere is only halfway
> complete, which makes it difficult to properly test things with multiple
> 'struct repository *' pointers. However, the best thing we can do is to
> use whatever local repository we have, so we don't contribute further to
> the problem.

...

> The fact that istate->repo is sometimes NULL is a separate issue, but is
> generally unrelated to the subject at hand and should be fixed by another
> topic, not used to block this one.

I noted as an aside that istate->repo is sometimes NULL when it
shouldn't be, and that sucks.

But the comment on your topic is that you are adding new behavior that
relies on the specifics of what "repo" we have in play, so shouldn't we
add tests for those interactions? Or if they're not needed let's note
that and why.

You can tease out those cases with e.g.:
	
	diff --git a/read-cache.c b/read-cache.c
	index cf87ad70970..dc16167ff35 100644
	--- a/read-cache.c
	+++ b/read-cache.c
	@@ -2436,6 +2436,9 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist)
	 	if (!istate->repo)
	 		istate->repo = the_repository;
	 
	+	if (istate->repo && istate->repo != the_repository)
	+		BUG("read: got istate:%p and tr:%p", (void *)istate->repo, (void *)the_repository);
	+
	 	/*
	 	 * If the command explicitly requires a full index, force it
	 	 * to be full. Otherwise, correct the sparsity based on repository
	@@ -2930,6 +2933,9 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
	 	int nr, nr_threads;
	 	struct repository *r = istate->repo ? istate->repo : the_repository;
	 
	+	if (istate->repo && istate->repo != the_repository)
	+		BUG("write: got istate:%p and tr:%p", (void *)istate->repo, (void *)the_repository);
	+
	 	f = hashfd(tempfile->fd, tempfile->filename.buf);
	 
	 	prepare_repo_settings(r);

The former ("read") hunk will fail a lot of tests (e.g. "git grep
--recurse-submodules"), but I *think* that for "write" they're currently
always the same.

I was just prodding you to clarify that one way or the other, i.e. was
it just for good measure, or was it a change in behavior (in particular
before you re-rolled to have your earlier commit use the "istate->repo"
too).

Now, I *think* it's just for good measure, but then (and I didn't notice
this before) shouldn't the new code you added to do_write_index() to
prepare_repo_settings() also populate "istate->repo"? That's what we
already do in do_read_index(). I.e.:

	diff --git a/read-cache.c b/read-cache.c
	index cf87ad70970..1a622a52036 100644
	--- a/read-cache.c
	+++ b/read-cache.c
	@@ -2928,12 +2928,14 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
	 	int ieot_entries = 1;
	 	struct index_entry_offset_table *ieot = NULL;
	 	int nr, nr_threads;
	-	struct repository *r = istate->repo ? istate->repo : the_repository;
	+
	+	if (!istate->repo)
	+		istate->repo = the_repository;
	 
	 	f = hashfd(tempfile->fd, tempfile->filename.buf);
	 
	-	prepare_repo_settings(r);
	-	f->skip_hash = r->settings.index_skip_hash;
	+	prepare_repo_settings(istate->repo);
	+	f->skip_hash = istate->repo->settings.index_skip_hash;
	 
	 	for (i = removed = extended = 0; i < entries; i++) {
	 		if (cache[i]->ce_flags & CE_REMOVE)

1. https://lore.kernel.org/git/cover-00.10-00000000000-20221017T115544Z-avarab@xxxxxxxxx/




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux