Re: [PATCH v4 2/4] sparse-index: avoid unnecessary cache tree clearing

Junio C Hamano <gitster@xxxxxxxxx> · Fri, 29 Oct 2021 11:45:52 -0700

"Victoria Dye via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Victoria Dye <vdye@xxxxxxxxxx>
>
> When converting a full index to sparse, clear and recreate the cache tree
> only if the cache tree is not fully valid. The convert_to_sparse operation
> should exit silently if a cache tree update cannot be successfully completed
> (e.g., due to a conflicted entry state). However, because this failure
> scenario only occurs when at least a portion of the cache tree is invalid,
> we can save ourselves the cost of clearing and recreating the cache tree by
> skipping the check when the cache tree is fully valid.

I see in cache-tree.c::update_one() this snippet of code.

	/*
	 * If the first entry of this region is a sparse directory
	 * entry corresponding exactly to 'base', then this cache_tree
	 * struct is a "leaf" in the data structure, pointing to the
	 * tree OID specified in the entry.
	 */
	if (entries > 0) {
		const struct cache_entry *ce = cache[0];

		if (S_ISSPARSEDIR(ce->ce_mode) &&
		    ce->ce_namelen == baselen &&
		    !strncmp(ce->name, base, baselen)) {
			it->entry_count = 1;
			oidcpy(&it->oid, &ce->oid);
			return 1;
		}
	}

Sorry for not noticing it earlier, but does this mean that the
content of a cache-tree changes shape when sparse-ness of the index
changes?  Is a cache-tree that knows about all of the
subdirectories, even the ones that are descendants of a directory
that is represented as a tree-ish entry in the main index array,
still valid in a sparse index?
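
To make the question concrete, here is a toy sketch of the check quoted above. It is not git's code: `struct toy_entry`, `toy_region_is_leaf()`, `toy_entry_count()` and `toy_shape_changes()` are hypothetical names invented for illustration, and the `is_sparse_dir` flag merely stands in for `S_ISSPARSEDIR(ce->ce_mode)`. It only shows that the same directory is a "leaf" in one representation and an interior node in the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an index entry; NOT git's struct cache_entry. */
struct toy_entry {
	const char *name;  /* "dir/a.c", or "dir/" for a sparse directory */
	int is_sparse_dir; /* plays the role of S_ISSPARSEDIR(ce->ce_mode) */
};

/*
 * Mirrors the quoted condition: the region is a cache-tree "leaf" when
 * its first entry is a sparse directory entry named exactly 'base'.
 */
int toy_region_is_leaf(const struct toy_entry *cache, int entries,
		       const char *base)
{
	size_t baselen;

	assert(base != NULL);
	baselen = strlen(base);
	if (entries > 0) {
		const struct toy_entry *ce = &cache[0];

		if (ce->is_sparse_dir &&
		    strlen(ce->name) == baselen &&
		    !strncmp(ce->name, base, baselen))
			return 1;
	}
	return 0;
}

/* Number of index entries whose name starts with 'base'. */
int toy_entry_count(const struct toy_entry *cache, int entries,
		    const char *base)
{
	size_t baselen = strlen(base);
	int i, n = 0;

	for (i = 0; i < entries; i++)
		if (!strncmp(cache[i].name, base, baselen))
			n++;
	return n;
}

/* Returns 1 if "dir/" is a leaf in one representation but not the other. */
int toy_shape_changes(void)
{
	const struct toy_entry full[] = {
		{ "dir/a.c", 0 }, { "dir/sub/b.c", 0 },
	};
	const struct toy_entry sparse[] = {
		{ "dir/", 1 },
	};

	return toy_region_is_leaf(full, 2, "dir/") !=
	       toy_region_is_leaf(sparse, 1, "dir/");
}
```

In this toy model the full index gives "dir/" two entries and, presumably, a child node for "dir/sub/", while the sparse index collapses the same directory into a single-entry leaf. Whether the real cache-tree behaves this way is exactly what I am asking.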

If not, then I cannot think of a quick and sure way to ensure that
the cache-tree is valid when the sparse-ness changes.

The earlier suggestion was based on my assumption that even when the
main index array becomes sparse, the cache tree is still populated
and valid, so that after writing a tree and an on-disk index, and
then reading the on-disk index back (possibly in another process),
we would not have to incur the cost of recomputing the full tree
when the reading codepath needs to flip the sparseness.

But the above code snippet worries me a lot.  A cache-tree that
was valid while the corresponding in-core index array was not
sparse will become invalid as soon as we decide to make it sparse,
right?