Re: [PATCH v4 0/4] Optionally skip hashing index on write

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 16 2022, Derrick Stolee via GitGitGadget wrote:

Thanks for the update!

> Updates since v3
> ================
>
>  * Patch 1 uses an "int" instead of "unsigned int". This was a carry-over
>    from Kevin's patch in microsoft/git, but our use of it in patch 2 is
>    different and thus is better with a signed int.
>  * Patch 2 uses a fallback to the_repository if istate->repo is NULL. This
>    allows the setting to be applied to more cases where istate->repo is not
>    set.
>  * Patch 2 also uses 'start' in more places, since it already computed the
>    beginning of the hash.
>  * Patch 4 slightly modifies its use of prepare_repo_settings() due to Patch
>    2's fallback to the_repository.

It's good that it's "int" now instead of "unsigned int", but just doing
that search-replacement I think misses the implied (which I really
should have made more explicit) question in my v3 feedback
(https://lore.kernel.org/git/221215.861qp03agm.gmgdl@xxxxxxxxxxxxxxxxxxx/):
What should we do about the -1 state then?

With your 2/4 here we'll accept e.g.:

	./git -c index.skipHash=12345 status

As well as:

	./git -c index.skipHash=blahblah status

But with the migration to repo-settings.c we start refusing the latter
of those, as you inherit a stricture in repo-settings.c.

Aside: I think this series would be much more readable if it were just
squashed into one patch. 1/4 introduces code, but no way to reach it,
with 2/4 we have config, 3/4 adds a test 4/4 tweaks how to
read/parse/interpret/combine the config.

But as it is split up the individual steps should make sense. The 2/4
here should really just use "bool", not "maybe_bool" to begin with, no?

And part of this in 4/4 is inheriting a non-stricture in
repo-settings.c, but for this new config variable that we're introducing
as only a boolean from day one can't we just die() on anything that's
not a boolean?

On other implied feedback, in
https://lore.kernel.org/git/221215.865yec3b1j.gmgdl@xxxxxxxxxxxxxxxxxxx/
I noted the switching between istate->repo & the_repository, and that
you could hit a BUG() (when uncommenting a line in my testing patch on
top) if we didn't have istate->repo:

There was a commit message update for that:

>  2:  00738c81a12 ! 2:  aae047cbc9f read-cache: add index.skipHash config option
>      @@ Commit message
>       
>           [1] https://github.com/microsoft/git/commit/21fed2d91410f45d85279467f21d717a2db45201
>       
>      +    We load this config from the repository config given by istate->repo,
>      +    with a fallback to the_repository if it is not set.

But I think that really sweeps a potential issue under the rug. What are
these cases where we don't get istate->repo, are those expected? Is
preferring istate->repo always known to be redundant now, and just done
for good measure, or are there subtle cases (e.g. when reading
submodules) where we pick the wrong repo's config?

In that context I'd really like to see some testing of submodules in the
added tests, i.e. does this act as we'd like it to when you set the
config as "true" in the parent, but "false" in the submodule, & the
other way around? That's a case that should stress this "the_repository"
v.s. "istate->repo".

>           While older Git versions will not recognize the null hash as a special
>           case, the file format itself is still being met in terms of its
>           structure. Using this null hash will still allow Git operations to
>      @@ read-cache.c: static int verify_hdr(const struct cache_header *hdr, unsigned lon
>        	the_hash_algo->update_fn(&c, hdr, size - the_hash_algo->rawsz);
>        	the_hash_algo->final_fn(hash, &c);
>       -	if (!hasheq(hash, (unsigned char *)hdr + size - the_hash_algo->rawsz))
>      -+	if (!hasheq(hash, end - the_hash_algo->rawsz))
>      ++	if (!hasheq(hash, start))
>        		return error(_("bad index file sha1 signature"));
>        	return 0;

Thanks, good to have this suggested simplification.

>       @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempfile *tempfile,
>      + 	int ieot_entries = 1;
>      + 	struct index_entry_offset_table *ieot = NULL;
>      + 	int nr, nr_threads;
>      ++	struct repository *r = istate->repo ? istate->repo : the_repository;
>        
>        	f = hashfd(tempfile->fd, tempfile->filename.buf);
>        
>      -+	git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash);
>      ++	repo_config_get_maybe_bool(r, "index.skiphash", &f->skip_hash);
>       +
>        	for (i = removed = extended = 0; i < entries; i++) {

Re the above this looks good, but did we introduce an untested behavior
change here or not?

>        		if (cache[i]->ce_flags & CE_REMOVE)
>  3:  86370af1351 = 3:  27fbe52e748 test-lib-functions: add helper for trailing hash
>  4:  6490bd445eb ! 4:  075921514f2 features: feature.manyFiles implies fast index writes
>      @@ read-cache.c: static int do_write_index(struct index_state *istate, struct tempf
>        
>        	f = hashfd(tempfile->fd, tempfile->filename.buf);
>        
>      --	git_config_get_maybe_bool("index.skiphash", (int *)&f->skip_hash);
>      -+	if (istate->repo) {
>      -+		prepare_repo_settings(istate->repo);
>      -+		f->skip_hash = istate->repo->settings.index_skip_hash;
>      -+	}
>      +-	repo_config_get_maybe_bool(r, "index.skiphash", &f->skip_hash);
>      ++	prepare_repo_settings(r);
>      ++	f->skip_hash = r->settings.index_skip_hash;
>        
>        	for (i = removed = extended = 0; i < entries; i++) {
>        		if (cache[i]->ce_flags & CE_REMOVE)

I really don't care per-se where we read the config, as long as it's
doing what we expect from the UX point of view.

But in your
https://lore.kernel.org/git/9e754af8-ecd3-6aed-60e8-2fc09a6a8759@xxxxxxxxxx/
you noted "There's no reason to load the config inside repo-settings.c
unless it's part of something like feature.manyFiles.".

I think that's true, but on the flip side of that: Is there a reason to
move the reading of such localized config (only needed in read-cache.c,
as 2/4 shows) to repo-settings.c, just because it's now relying on the
"manyfiles"?

I think the unstated reason for that is that while we read the
"manyfiles" config we didn't stick it in the "struct repo_settings",
therefore the reading of it is conveniently done in repo-settings.c,
which the 4/4 here does.

But I think it makes more sense to just stick the "manyfiles" in that
struct, not stick "skip_hash" there, and then just check the "manyfiles"
in 4/4, but read "index.skiphash" locally.

Maybe not worth the churn at this point, but I do wonder if we're going
in the wrong direction here, i.e. if all config that happens to rely on
features must be added to repo-settings.c, or whether we should reserve
that for truly "global" (or mostly global) config.





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux