Derrick Stolee wrote: > On 10/17/2021 9:15 PM, Junio C Hamano wrote: >> Derrick Stolee <stolee@xxxxxxxxx> writes: >> >>>> * In addition, with these patches, if index.sparse=false, a >>>> sparse index will be expaned to full upon reading, and if it >>>> is true, a non-sparse index will be shrunk to sparse upon >>>> reading >>> >>> This is only half true. If "index.sparse=false", then a sparse >>> index will be expanded to full upon reading. If "index.sparse=true" >>> then nothing special will happen to an index on reading. >> >> OK. I somehow got the impression that we convert in both ways from >> the patch text under discussion, specifically this part in >> do_read_index(): >> >>> - if (istate->repo->settings.command_requires_full_index) >>> + if (!istate->repo->settings.sparse_index || >>> + istate->repo->settings.command_requires_full_index) >>> ensure_full_index(istate); >>> + else if (!istate->sparse_index) >>> + convert_to_sparse(istate, 0); >>> >>> return istate->cache_nr; >> >> We used to say "when we know we are running a command that is not >> sparse ready, expand it here" and nothing else. >> >> We now added a bit more condition for expansion, i.e. "when we are >> told that the repository does not want sparse index, or when the >> command is not sparse ready". >> >> But the patch goes one more step. "If we didn't find a reason to >> expand to full, and if the in-core index we read is not sparse, then >> call convert_to_sparse() on it". So if the repo settings tells us >> that the repository wants a sparse index, and the index we read was >> not sparse, what does convert_to_sparse() without MEM_ONLY flag bit >> do? Nothing special? > > You are absolutely right. I've been talking about what I _thought_ > the code does (and should do) but I missed this 'else if' which is > in fact doing what you have been claiming the entire time. I should > have done a better job double-checking the code before doubling > down on my claims. > > I think the 'else if' should be removed, which would match my > expectations. > By leaving that part out, wouldn't you only solve half of the "mismatch" between in-core index and repo setting? Earlier, you described your expectation as: > * If index.sparse=false, then a sparse index will be converted to > full upon read. > > * If index.sparse=true, then a full index will be converted to sparse > on write. Why should the direction of change to the setting value (false->true vs true->false) cause the index to convert at different times? Consider the scenario: # In a cone-mode, sparse index-enabled sparse checkout repo $ git -c index.sparse=false status # 1 $ git status # 2 $ git status # 3 Before this patch, the index has the following states per command: (1) the index is sparse in-core, writes full on-disk (2) the index is full in-core, writes sparse on-disk (3) the index is sparse in-core, writes sparse on-disk Here, I see two mismatches in my expectations: (1) operates on an in-core sparse index, despite `index.sparse=false`, and (2) operates on an in-core full index, despite `index.sparse=true`. What you're suggesting solves only the first mismatch. However, the second mismatch incurs the performance hit of a supposedly-sparse command actually operating on an in-core full index. It also creates an inconsistency between (2) and (3) in their use of the sparse index. What I'd like this patch to do is: (1) the index is full in-core, writes full on-disk (2) the index is sparse in-core, writes sparse on-disk (3) the index is sparse in-core, writes sparse on-disk Here, there are no more mismatches between in-core index usage and what is written to disk, and (2) and (3) use the same index sparsity. >> I see many unconditional calls to convert_to_sparse(istate, 0) on >> the write code paths, where I instead would expect "if the repo >> wants sparse, call convert_to_sparse(), and if the repo does not >> want sparse, call ensure_full_index()", before actualy writing the >> entries out. These also are setting traps to confuse readers. >> >> Perhaps we want a helper function "ensure_right_sparsity(istate)" >> that would be called where we have >> >> if (condition) >> convert_to_sparse(); >> else >> ensure_full_index(); >> >> or an unconditonal call to convert_to_sparse() to unconfuse readers? > > convert_to_sparse() has several checks that prevent the conversion > from happening, such as having a split index. In particular, it will > skip the conversion if the_repository->settings.sparse_index is false. > Thus, calling it unconditionally in the write code is correct. > I may have introduced some confusion by redundantly checking `!istate->sparse_index` before converting to sparse (my goal was readability - showing the index does not need to be converted to sparse if it is already sparse - but I think it ended up doing the opposite). The condition could be removed entirely, thanks to an equivalent check inside `convert_to_sparse`: - if (istate->repo->settings.command_requires_full_index) + if (!istate->repo->settings.sparse_index || + istate->repo->settings.command_requires_full_index) ensure_full_index(istate); + else + convert_to_sparse(istate, 0);