Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

Brandon Williams <bmwill@xxxxxxxxxx> · Wed, 3 May 2017 20:24:15 -0700

On 05/03, Junio C Hamano wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
> 
> >>> In the long run we may want to drop the macros guarded by
> >>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
> >>
> >> Why?
> >
> > Well.. why did you add NO_THE_INDEX_COMP... in the first place? ;-) j/k
> 
> That was to mark the code that implements the underlying machinery
> that the code in a particular file has already been vetted and
> converted to be capable of working with an arbitrary index_state
> instance, not just the_index instance.
> 
> Your mentioning "at least outside builtin/" shows that you have a
> good understanding of the issue; they are infrastructure code and
> many of them may become more useful if they are enhanced to take an
> index_state instance instead of always working with the_index.  With
> an infrastructure code that insists on working on the_index, an
> application that has something valuable in the_index already needs
> to somehow save it elsewhere before calling the function and use the
> result from the_index before restoring its version.
> 
> I think unpack-trees.c used to assume that it is OK to use the_index,
> but after it started taking src/dst_index, it gained NO_THE_INDEX_COMP
> thing.  That's the kind of change in this area we would want to see.
> 
> > I attempted the same thing once or twice in the past, had the same
> > impression and dropped it. But I think it's good to get rid of cache_*
> > macros, at least outside builtin/ directory.
> 
> One thing that needs to be kept in mind is that a mechanical
> replacement of active_cache[] to the_index.cache[] in the
> infrastructure code does not get us closer to such an improvement.
> In fact, such a patch only has negative value (keep reading until
> the end of the paragraph that begins with "One thing is certain" to
> understand why).
> 
> The entry point to a codechain for an important infrastructure code
> that can currently only work with the_index needs to be identified.
> There may be many.  They need to be made to take an index_state
> instance and to pass it through the callchain.  That kind of change
> would be an improvement, and will get us closer to mark the
> resulting code with NO_THE_INDEX_COMP thing.
> 
> I think somebody else said this, but I agree that in the longer
> term, we would want to have a "repository" abstraction for in-core
> data structure.
> 
> Imagine that a program wants to read into an index_state the tip
> commit of a branch whose name is the same as the current branch in
> one submodule and do something with the objects, while keeping
> the_index for the top-level superproject.  Thanks to Michael's and
> your recent work, we are almost there to have "ref-store" that
> represents the set of refs the submodule in question has that a
> "repository" object that represents the submodule can point at.  We
> can learn the object name at the tip of a specific ref using that
> infrastructure.  Thanks to an ancient change under discussion, we
> can already read the contents of the index file into an instance of
> an index_state that is different from the_index.  But after doing
> these operations, we'd need to actually get to the contents of the
> objects the index_state holds.  How should that work in the
> "repository object represents an in-core repository" world?
> 
> What is missing is a way to represent the object store, so that the
> "repository" object can point at an instance of it (which in turn
> may point at a set of its alternates) [*1*].  With that, perhaps the
> most general kind of the infrastructure API that works with an
> index_state instance may additionally take a repository object, or
> perhaps an index_state may have a pointer to the repository where
> the objects listed in it must be taken from (in which case the
> function may take only an index_state).  In the end the enhanced
> function has to allow us to read from the object store of the
> submodule, not from the superproject's.
> 
> Depending on how the design of the repository object goes, the
> interface to these functions may have to change.
> 
> But one thing is certain. Many users of the API, just like a lot of
> builtin/ code works only with the_index today, will not have to work
> with a non-default repository or a non-default index_state.  Only
> the more advanced and complex "multi-repo" Porcelains would have to
> care.  Keeping active_cache[], active_nr, etc. abstraction would
> isolate the simpler callers from the inevitable code churn we would
> need while getting the "repository" abstraction right.  This is why
> I see that you understood the issues well when you said "builtiln/".
> 
> If we want to have the "repository" abstraction, the "object store"
> is the next thing we should be working on, and such a work can take
> inspiration from how the old <active_cache[], active_nr,
> active_alloc, active_cache_changed and *active_cache_tree> tuple was
> turned into an "index" object while allowing the majority of code
> that want to deal with the singleton instance keep using the old
> names via CPP macros.
> 
> Also, dropping compatibility macros at the end of the series is
> unfriendly to fellow developers with WIPs that depend on the
> traditional way of doing things.  It is doubly insulting to them to
> send such a series during the feature freeze period, when it is more
> likely that they are holding off their WIP in an attempt to allow us
> to concentrate more on what we should be, i.e. finding and fixing
> possible regressions at the tip of 'master' to polish the upcoming
> release and help the end users.
> 
> 
> [Footnote]
> 
> *1* Instead we have an ugly hack that lets the singleton object
> store add objects from the submodule object store, and it does not
> allow us to back out once we did so (unlike the "ref-store" thing
> that we could).  When we need to work with a project with many
> submodules, we should be able to say "OK, we are done with this
> thing for now, call repository_clear() to reclaim the resources we
> used to deal with it".
> 

Thanks for the detailed explanation on what a possible transition to
having a repository object would look like.  While digging through the
code it does seem like the object store is one of the main things
holding the code base back from having a repository object.

Thank being said, I spent today hacking away at 'ls-files' in order to
see if it would be feasible to convert it to using a repository object
for the recursive submodule case instead of using run-command.  I picked
'ls-files' mainly because the majority of the code paths don't touch the
object store and mainly deal with manipulating the index.  Despite the
hack-y, cobbled together state that it is in currently it seems very
feasible goal.

-- 
Brandon Williams