Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config

Derrick Stolee <stolee@xxxxxxxxx> · Thu, 21 Oct 2021 09:26:19 -0400

On 10/18/2021 10:14 AM, Victoria Dye wrote:
> Derrick Stolee wrote:
>> On 10/17/2021 9:15 PM, Junio C Hamano wrote:
>>> OK.  I somehow got the impression that we convert in both ways from
>>> the patch text under discussion, specifically this part in
>>> do_read_index():
>>>
>>>> -	if (istate->repo->settings.command_requires_full_index)
>>>> +	if (!istate->repo->settings.sparse_index ||
>>>> +	    istate->repo->settings.command_requires_full_index)
>>>>  		ensure_full_index(istate);
>>>> +	else if (!istate->sparse_index)
>>>> +		convert_to_sparse(istate, 0);
>>>>  
>>>>  	return istate->cache_nr;
>>>
>>> We used to say "when we know we are running a command that is not
>>> sparse ready, expand it here" and nothing else.
>>>
>>> We now added a bit more condition for expansion, i.e. "when we are
>>> told that the repository does not want sparse index, or when the
>>> command is not sparse ready".
>>>
>>> But the patch goes one more step.  "If we didn't find a reason to
>>> expand to full, and if the in-core index we read is not sparse, then
>>> call convert_to_sparse() on it".  So if the repo settings tells us
>>> that the repository wants a sparse index, and the index we read was
>>> not sparse, what does convert_to_sparse() without MEM_ONLY flag bit
>>> do?  Nothing special?
>>
>> You are absolutely right. I've been talking about what I _thought_
>> the code does (and should do) but I missed this 'else if' which is
>> in fact doing what you have been claiming the entire time. I should
>> have done a better job double-checking the code before doubling
>> down on my claims.
>>
>> I think the 'else if' should be removed, which would match my
>> expectations.
>>
> 
> By leaving that part out, wouldn't you only solve half of the "mismatch"
> between in-core index and repo setting? Earlier, you described your
> expectation as:
> 
>> * If index.sparse=false, then a sparse index will be converted to
>>   full upon read.
>>
>> * If index.sparse=true, then a full index will be converted to sparse
>>   on write.
> 
> Why should the direction of change to the setting value (false->true vs
> true->false) cause the index to convert at different times? Consider the
> scenario:
> 
> # In a cone-mode, sparse index-enabled sparse checkout repo
> $ git -c index.sparse=false status    # 1
> $ git status                          # 2
> $ git status                          # 3
> 
> Before this patch, the index has the following states per command:
> 
> (1) the index is sparse in-core, writes full on-disk
> (2) the index is full in-core, writes sparse on-disk
> (3) the index is sparse in-core, writes sparse on-disk
> 
> Here, I see two mismatches in my expectations: (1) operates on an in-core
> sparse index, despite `index.sparse=false`, and (2) operates on an in-core
> full index, despite `index.sparse=true`. 
> 
> What you're suggesting solves only the first mismatch. However, the second
> mismatch incurs the performance hit of a supposedly-sparse command actually
> operating on an in-core full index. It also creates an inconsistency between
> (2) and (3) in their use of the sparse index. What I'd like this patch to do
> is:
> 
> (1) the index is full in-core, writes full on-disk
> (2) the index is sparse in-core, writes sparse on-disk
> (3) the index is sparse in-core, writes sparse on-disk
> 
> Here, there are no more mismatches between in-core index usage and what is
> written to disk, and (2) and (3) use the same index sparsity.

I suppose that my perspective is that we always need to handle a full
index in-core, because it is a subset of the capabilities of a sparse
index. There is not much value in requiring the in-core index be sparse.

But also, I'm now less concerned about commands that convert from
full-to-sparse on read and expand back to full because of
command_requires_full_index. This should be a short-lived issue because
the index.sparse config is unlikely to be changing frequently, so once
the index is converted to sparse on write, we no longer need to do any
work to convert the in-core index to sparse on read.

The thing to keep in mind is that not every command that reads the index
also writes it. For example, the two 'git status' commands you list might
not write the index if there is no new information in the filesystem that
would trigger an index write.

In short: I've shifted my view and no longer oppose this conversion
immediately upon reading.

>>> I see many unconditional calls to convert_to_sparse(istate, 0) on
>>> the write code paths, where I instead would expect "if the repo
>>> wants sparse, call convert_to_sparse(), and if the repo does not
>>> want sparse, call ensure_full_index()", before actualy writing the
>>> entries out.  These also are setting traps to confuse readers.
>>>
>>> Perhaps we want a helper function "ensure_right_sparsity(istate)"
>>> that would be called where we have
>>>
>>> 	if (condition)
>>> 		convert_to_sparse();
>>> 	else
>>> 		ensure_full_index();
>>>
>>> or an unconditonal call to convert_to_sparse() to unconfuse readers?
>>
>> convert_to_sparse() has several checks that prevent the conversion
>> from happening, such as having a split index. In particular, it will
>> skip the conversion if the_repository->settings.sparse_index is false.
>> Thus, calling it unconditionally in the write code is correct.
>>
> 
> I may have introduced some confusion by redundantly checking
> `!istate->sparse_index` before converting to sparse (my goal was readability
> - showing the index does not need to be converted to sparse if it is already
> sparse - but I think it ended up doing the opposite). The condition could be
> removed entirely, thanks to an equivalent check inside `convert_to_sparse`:
> 
> -       if (istate->repo->settings.command_requires_full_index)
> +       if (!istate->repo->settings.sparse_index ||
> +           istate->repo->settings.command_requires_full_index)
>                 ensure_full_index(istate);
> +       else
> +               convert_to_sparse(istate, 0);

This is simpler.

Thanks,
-Stolee