On 05/03, Junio C Hamano wrote: > Duy Nguyen <pclouds@xxxxxxxxx> writes: > > >>> In the long run we may want to drop the macros guarded by > >>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them. > >> > >> Why? > > > > Well.. why did you add NO_THE_INDEX_COMP... in the first place? ;-) j/k > > That was to mark the code that implements the underlying machinery > that the code in a particular file has already been vetted and > converted to be capable of working with an arbitrary index_state > instance, not just the_index instance. > > Your mentioning "at least outside builtin/" shows that you have a > good understanding of the issue; they are infrastructure code and > many of them may become more useful if they are enhanced to take an > index_state instance instead of always working with the_index. With > an infrastructure code that insists on working on the_index, an > application that has something valuable in the_index already needs > to somehow save it elsewhere before calling the function and use the > result from the_index before restoring its version. > > I think unpack-trees.c used to assume that it is OK to use the_index, > but after it started taking src/dst_index, it gained NO_THE_INDEX_COMP > thing. That's the kind of change in this area we would want to see. > > > I attempted the same thing once or twice in the past, had the same > > impression and dropped it. But I think it's good to get rid of cache_* > > macros, at least outside builtin/ directory. > > One thing that needs to be kept in mind is that a mechanical > replacement of active_cache[] to the_index.cache[] in the > infrastructure code does not get us closer to such an improvement. > In fact, such a patch only has negative value (keep reading until > the end of the paragraph that begins with "One thing is certain" to > understand why). > > The entry point to a codechain for an important infrastructure code > that can currently only work with the_index needs to be identified. > There may be many. They need to be made to take an index_state > instance and to pass it through the callchain. That kind of change > would be an improvement, and will get us closer to mark the > resulting code with NO_THE_INDEX_COMP thing. > > I think somebody else said this, but I agree that in the longer > term, we would want to have a "repository" abstraction for in-core > data structure. > > Imagine that a program wants to read into an index_state the tip > commit of a branch whose name is the same as the current branch in > one submodule and do something with the objects, while keeping > the_index for the top-level superproject. Thanks to Michael's and > your recent work, we are almost there to have "ref-store" that > represents the set of refs the submodule in question has that a > "repository" object that represents the submodule can point at. We > can learn the object name at the tip of a specific ref using that > infrastructure. Thanks to an ancient change under discussion, we > can already read the contents of the index file into an instance of > an index_state that is different from the_index. But after doing > these operations, we'd need to actually get to the contents of the > objects the index_state holds. How should that work in the > "repository object represents an in-core repository" world? > > What is missing is a way to represent the object store, so that the > "repository" object can point at an instance of it (which in turn > may point at a set of its alternates) [*1*]. With that, perhaps the > most general kind of the infrastructure API that works with an > index_state instance may additionally take a repository object, or > perhaps an index_state may have a pointer to the repository where > the objects listed in it must be taken from (in which case the > function may take only an index_state). In the end the enhanced > function has to allow us to read from the object store of the > submodule, not from the superproject's. > > Depending on how the design of the repository object goes, the > interface to these functions may have to change. > > But one thing is certain. Many users of the API, just like a lot of > builtin/ code works only with the_index today, will not have to work > with a non-default repository or a non-default index_state. Only > the more advanced and complex "multi-repo" Porcelains would have to > care. Keeping active_cache[], active_nr, etc. abstraction would > isolate the simpler callers from the inevitable code churn we would > need while getting the "repository" abstraction right. This is why > I see that you understood the issues well when you said "builtiln/". > > If we want to have the "repository" abstraction, the "object store" > is the next thing we should be working on, and such a work can take > inspiration from how the old <active_cache[], active_nr, > active_alloc, active_cache_changed and *active_cache_tree> tuple was > turned into an "index" object while allowing the majority of code > that want to deal with the singleton instance keep using the old > names via CPP macros. > > Also, dropping compatibility macros at the end of the series is > unfriendly to fellow developers with WIPs that depend on the > traditional way of doing things. It is doubly insulting to them to > send such a series during the feature freeze period, when it is more > likely that they are holding off their WIP in an attempt to allow us > to concentrate more on what we should be, i.e. finding and fixing > possible regressions at the tip of 'master' to polish the upcoming > release and help the end users. > > > [Footnote] > > *1* Instead we have an ugly hack that lets the singleton object > store add objects from the submodule object store, and it does not > allow us to back out once we did so (unlike the "ref-store" thing > that we could). When we need to work with a project with many > submodules, we should be able to say "OK, we are done with this > thing for now, call repository_clear() to reclaim the resources we > used to deal with it". > Thanks for the detailed explanation on what a possible transition to having a repository object would look like. While digging through the code it does seem like the object store is one of the main things holding the code base back from having a repository object. Thank being said, I spent today hacking away at 'ls-files' in order to see if it would be feasible to convert it to using a repository object for the recursive submodule case instead of using run-command. I picked 'ls-files' mainly because the majority of the code paths don't touch the object store and mainly deal with manipulating the index. Despite the hack-y, cobbled together state that it is in currently it seems very feasible goal. -- Brandon Williams