Re: [PATCH 2/3] setup: have the_repository use the_index

Jonathan Nieder <jrnieder@xxxxxxxxx> · Wed, 12 Jul 2017 14:33:39 -0700

Hi,

Junio C Hamano wrote:
> Brandon Williams <bmwill@xxxxxxxxxx> writes:

>> Since it is a pointer, using a '#define' to replace 'the_index'
>> (which is not a pointer) would be a little more challenging.
>
> The above is merely realizing another downside that stems from the
> earlier design decision that the index field is not a real embedded
> structure, but is a pointer.  It does not explain why it is better
> to have a pointer to an allocated structure in the first place.
>
> I am not (yet) telling you to fix the design to have a pointer
> "index" by replacing it with an embedded structure.  I may actually
> do so later, but I am first trying to find out if it is a right
> design decision with some advantage.

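Consider a command that doesn't need to access the index at all (e.g.,
"git grep --recurse-submodules -e foo HEAD").  Such a command never
reads an index, so any space reserved for one in each in-memory
repository object goes unused.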

In favor of embedding the structure instead of using a pointer, there
is the advantage that it makes initialization simpler.  (It also gives a
tiny speedup by avoiding a pointer indirection on access, but that is
negligible.)  That is why embedding was a good choice when there was
only one repository in memory: spending such a small, bounded portion of
.bss in exchange for some convenience is a good trade.
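A minimal sketch of the initialization difference, using hypothetical names (struct index_stub, struct repo_embedded, struct repo_pointer and their helpers) that stand in for git's real struct index_state and repository object, which have many more fields.  The embedded global is zero-initialized for free; the pointer version needs a setup step that can fail, plus NULL checks in readers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's struct index_state (far fewer fields). */
struct index_stub {
	void **entries;
	unsigned int nr, alloc;
};

/* Embedded: the index is part of the repository object itself. */
struct repo_embedded {
	const char *gitdir;
	struct index_stub index;
};

/* Pointer: the repository holds only one word for its index. */
struct repo_pointer {
	const char *gitdir;
	struct index_stub *index;
};

/*
 * An object with static storage duration is zero-initialized (typically
 * placed in .bss), so this embedded index is usable with no setup code.
 */
static struct repo_embedded single_repo;

static unsigned int embedded_index_nr(const struct repo_embedded *r)
{
	return r->index.nr; /* no allocation, no NULL check, no extra hop */
}

/*
 * With a pointer, something must allocate the index (or point it at an
 * existing one) first, and that step can fail.
 */
static int pointer_index_setup(struct repo_pointer *r)
{
	if (!r->index)
		r->index = calloc(1, sizeof(*r->index));
	return r->index ? 0 : -1;
}

/* ...and readers must cope with the index not existing yet. */
static unsigned int pointer_index_nr(const struct repo_pointer *r)
{
	return r->index ? r->index->nr : 0;
}
```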

When a process has multiple repositories in memory (for example, one
per thread), the trade-off changes.  The unused embedded index no longer
sits in .bss but on the stack or heap, once per repository.  Embedding
would mean that instead of one unused word in each per-repository
structure, we get ~24 unused words.
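A rough sketch of the per-repository cost, assuming (as the estimate above does) an index structure of about 24 machine words.  struct index_stub is a hypothetical placeholder sized to that estimate, not git's actual struct index_state, whose exact size differs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder sized to ~24 words, per the estimate above. */
struct index_stub {
	void *words[24];
};

struct repo_embedded {
	const char *gitdir;
	struct index_stub index;   /* ~24 words, paid even if never used */
};

struct repo_pointer {
	const char *gitdir;
	struct index_stub *index;  /* 1 word; NULL until someone needs it */
};

/* Bytes each repository spends on an index it never reads (embedded). */
static size_t unused_bytes_embedded(void)
{
	return sizeof(struct index_stub);
}

/* Bytes each repository spends on an index it never reads (pointer). */
static size_t unused_bytes_pointer(void)
{
	return sizeof(struct index_stub *);
}
```

The gap between the two layouts is paid once per in-memory repository, whether or not any of them ever reads its index.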

An argument could be made that we wouldn't want to waste either 1 word
or 24 words per in-memory repository object: we'd want to waste 0 words
and separately keep a map from repositories to index_state that only
gets populated when needed.  That complicates index access a bit too
much for my taste.  One word instead of 0 or 24 seems like a sensible
compromise.
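A minimal sketch contrasting the 1-word compromise with the 0-word side map.  All names here (struct repo, repo_get_index(), struct bare_repo, index_map_get()) are hypothetical and not git's API, and the map is a naive linear table chosen only to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for git's struct index_state. */
struct index_stub {
	unsigned int nr;
};

/* 1-word compromise: the index pointer stays NULL until needed. */
struct repo {
	const char *gitdir;
	struct index_stub *index;
};

/* Hypothetical accessor that allocates the index on first use. */
static struct index_stub *repo_get_index(struct repo *r)
{
	if (!r->index)
		r->index = calloc(1, sizeof(*r->index));
	return r->index; /* NULL only if calloc failed */
}

/* 0-word alternative: the repository has no index field at all... */
struct bare_repo {
	const char *gitdir;
};

/* ...so a separate table maps repositories to their indexes. */
struct index_map_entry {
	const struct bare_repo *repo;
	struct index_stub *index;
};

struct index_map {
	struct index_map_entry *entries;
	size_t nr, alloc;
};

/*
 * Every access becomes a lookup (linear here, for brevity), and the table
 * has to be grown and cleaned up; a threaded program would also need to
 * lock it. This sketch does neither cleanup nor locking.
 */
static struct index_stub *index_map_get(struct index_map *map,
					 const struct bare_repo *r)
{
	size_t i;

	for (i = 0; i < map->nr; i++)
		if (map->entries[i].repo == r)
			return map->entries[i].index;

	if (map->nr == map->alloc) {
		size_t alloc = map->alloc ? 2 * map->alloc : 4;
		struct index_map_entry *grown =
			realloc(map->entries, alloc * sizeof(*grown));

		if (!grown)
			return NULL;
		map->entries = grown;
		map->alloc = alloc;
	}

	map->entries[map->nr].repo = r;
	map->entries[map->nr].index = calloc(1, sizeof(struct index_stub));
	if (!map->entries[map->nr].index)
		return NULL;
	return map->entries[map->nr++].index;
}
```

The pointer accessor keeps call sites close to a plain field access, while the map version needs its own storage management just to answer "which index belongs to this repository?", which is the extra complexity weighed against the saved word.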

All that said, I don't have a strong opinion on this.  Both the 1-word
approach (a pointer) and the 24-word approach (embedding) are tolerable,
and there are reasons to prefer each.

Thanks,
Jonathan