Re: [PATCH 2/2] sparse-index: update index read to consider index.sparse config

Derrick Stolee <stolee@xxxxxxxxx> · Sat, 16 Oct 2021 21:21:48 -0400

On 10/15/2021 5:53 PM, Junio C Hamano wrote:
> "Victoria Dye via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
> 
>> From: Victoria Dye <vdye@xxxxxxxxxx>
>>
>> Use the index.sparse config setting to expand or collapse the index when
>> read. Previously, index.sparse would determine how the index would be
>> written to disk, but would not enforce whether the index is read into memory
>> as full or sparse. Now, the index is expanded when a sparse index is read
>> with `index.sparse=false` and is collapsed to sparse when a full index is
>> read with `index.sparse=true` (and the command does not require a full
>> index).
> 
> Instead of calling both in-core index and on-disk index, perhaps use
> "the in-core index" as appropriately in the above description and
> the result would become much less ambigous.
> 
> My knee-jerk reaction was that it is of dubious value to spend
> cycles to make the in-core index sparse after reading a non-sparse
> index from the disk to give to the caller, but this hurts only the
> commands that are not yet sparse-aware and call ensure_full_index()
> as the first thing they do.  To them, we are wasting cycles to
> shrink and expand for no good reason, and after they are done, the
> final writeout would create a sparse on-disk index.

I think you are slightly mistaken here: If index.sparse=true, then a
full index will be converted to one on write, but not immediately upon
read. This means that subsequent commands will read a sparse index, and
they will either benefit from that or not depending on whether they are
integrated with the sparse index or not.

The new behavior here is that if index.sparse=false, then we convert
a sparse index to a full one upon read, avoiding any chance that a
Git command is operating on a sparse index in-memory.

The simplest summary I can say is here:

* If index.sparse=false, then a sparse index will be converted to
  full upon read.

* If index.sparse=true, then a full index will be converted to sparse
  on write.

> Besides, the on-disk index is expected to be sparse most of the time
> when index.sparse is true, so it is hopefully a one-time waste that
> corrects by itself.
> 
> For all commands that are sparse-aware, especially when asked to
> perform their operation on the paths that are not hidden by a
> tree-like index entry, it may or may not be a win, but the downside
> would be much smaller.  The cost to shrink a full in-core index
> before writing out as a sparse one should be comparable to the cost
> to shrink a full in-core index just read from the disk before the
> sparse-index-aware caller works on it and leaves a still mostly
> sparse in-core index to be written out without much extra work to
> re-shrink it to the disk.
> 
>> This makes the behavior of `index.sparse` more intuitive, as it now clearly
>> enables/disables usage of a sparse index.
> 
> It is a minor thing, so I am willing to let it pass, but I am not
> sure about this claim.  The write-out codepath ensures, independent
> of this change, that a full on-disk index is corrected to become
> sparse when the configuration is set to true, and a sparse on-disk
> index is corrected to become full when the configuration is set to
> false, no?

The previous behavior required an index write for the sparse-to-full
transition. After this change, an index write is still required for the
full-to-sparse transition.

> So the only "intuitive"-ness we may be gaining is that the codepaths
> that are sparse-aware would work in their "sparse" (non-"sparse")
> mode when index.sparse is set to true (false), respectively, no
> matter how sparse (or not sparse) the on-disk index they work on is
> initially.  That might help debuggability (assuming that converting
> between the full and sparse forms are working correctly), but I am
> not sure if that is something end users would even care about.

I think you have a case for concern with the word "intuitive". Even
though I agree that the new setup is the best possible interpretation
of the setting, its inherit asymmetry can be confusing.

Thanks,
-Stolee