Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 03 May 2017 19:48:22 -0700

Duy Nguyen <pclouds@xxxxxxxxx> writes:

>>> In the long run we may want to drop the macros guarded by
>>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
>>
>> Why?
>
> Well.. why did you add NO_THE_INDEX_COMP... in the first place? ;-) j/k

That was to mark the code that implements the underlying machinery
that the code in a particular file has already been vetted and
converted to be capable of working with an arbitrary index_state
instance, not just the_index instance.

Your mentioning "at least outside builtin/" shows that you have a
good understanding of the issue; they are infrastructure code and
many of them may become more useful if they are enhanced to take an
index_state instance instead of always working with the_index.  With
an infrastructure code that insists on working on the_index, an
application that has something valuable in the_index already needs
to somehow save it elsewhere before calling the function and use the
result from the_index before restoring its version.

I think unpack-trees.c used to assume that it is OK to use the_index,
but after it started taking src/dst_index, it gained NO_THE_INDEX_COMP
thing.  That's the kind of change in this area we would want to see.

> I attempted the same thing once or twice in the past, had the same
> impression and dropped it. But I think it's good to get rid of cache_*
> macros, at least outside builtin/ directory.

One thing that needs to be kept in mind is that a mechanical
replacement of active_cache[] to the_index.cache[] in the
infrastructure code does not get us closer to such an improvement.
In fact, such a patch only has negative value (keep reading until
the end of the paragraph that begins with "One thing is certain" to
understand why).

The entry point to a codechain for an important infrastructure code
that can currently only work with the_index needs to be identified.
There may be many.  They need to be made to take an index_state
instance and to pass it through the callchain.  That kind of change
would be an improvement, and will get us closer to mark the
resulting code with NO_THE_INDEX_COMP thing.

I think somebody else said this, but I agree that in the longer
term, we would want to have a "repository" abstraction for in-core
data structure.

Imagine that a program wants to read into an index_state the tip
commit of a branch whose name is the same as the current branch in
one submodule and do something with the objects, while keeping
the_index for the top-level superproject.  Thanks to Michael's and
your recent work, we are almost there to have "ref-store" that
represents the set of refs the submodule in question has that a
"repository" object that represents the submodule can point at.  We
can learn the object name at the tip of a specific ref using that
infrastructure.  Thanks to an ancient change under discussion, we
can already read the contents of the index file into an instance of
an index_state that is different from the_index.  But after doing
these operations, we'd need to actually get to the contents of the
objects the index_state holds.  How should that work in the
"repository object represents an in-core repository" world?

What is missing is a way to represent the object store, so that the
"repository" object can point at an instance of it (which in turn
may point at a set of its alternates) [*1*].  With that, perhaps the
most general kind of the infrastructure API that works with an
index_state instance may additionally take a repository object, or
perhaps an index_state may have a pointer to the repository where
the objects listed in it must be taken from (in which case the
function may take only an index_state).  In the end the enhanced
function has to allow us to read from the object store of the
submodule, not from the superproject's.

Depending on how the design of the repository object goes, the
interface to these functions may have to change.

But one thing is certain. Many users of the API, just like a lot of
builtin/ code works only with the_index today, will not have to work
with a non-default repository or a non-default index_state.  Only
the more advanced and complex "multi-repo" Porcelains would have to
care.  Keeping active_cache[], active_nr, etc. abstraction would
isolate the simpler callers from the inevitable code churn we would
need while getting the "repository" abstraction right.  This is why
I see that you understood the issues well when you said "builtiln/".

If we want to have the "repository" abstraction, the "object store"
is the next thing we should be working on, and such a work can take
inspiration from how the old <active_cache[], active_nr,
active_alloc, active_cache_changed and *active_cache_tree> tuple was
turned into an "index" object while allowing the majority of code
that want to deal with the singleton instance keep using the old
names via CPP macros.

Also, dropping compatibility macros at the end of the series is
unfriendly to fellow developers with WIPs that depend on the
traditional way of doing things.  It is doubly insulting to them to
send such a series during the feature freeze period, when it is more
likely that they are holding off their WIP in an attempt to allow us
to concentrate more on what we should be, i.e. finding and fixing
possible regressions at the tip of 'master' to polish the upcoming
release and help the end users.

[Footnote]

*1* Instead we have an ugly hack that lets the singleton object
store add objects from the submodule object store, and it does not
allow us to back out once we did so (unlike the "ref-store" thing
that we could).  When we need to work with a project with many
submodules, we should be able to say "OK, we are done with this
thing for now, call repository_clear() to reclaim the resources we
used to deal with it".