On Wed, May 3, 2017 at 4:31 AM, Samuel Lijin <sxlijin@xxxxxxxxx> wrote: > On Tue, May 2, 2017 at 9:05 AM, Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote: >> >> >> On 5/2/2017 12:17 AM, Stefan Beller wrote: >>> >>> On Mon, May 1, 2017 at 6:36 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>>> >>>> Stefan Beller <sbeller@xxxxxxxxxx> writes: >>>> >>>>> This applies to origin/master. >>>>> >>>>> For better readability and understandability for newcomers it is a good >>>>> idea >>>>> to not offer 2 APIs doing the same thing with on being the #define of >>>>> the other. >>>>> >>>>> In the long run we may want to drop the macros guarded by >>>>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them. >> >> >> Thank you for bringing this up and making this proposal. >> I started a similar effort internally last fall, but >> stopped because of the footprint size. >> >>>> >>>> Why? Why should we keep typing &the_index, when most of the time we >>>> are given _the_ index and working on it? >>> >>> >>> As someone knowledgeable with the code base you know that the cache_* >>> and index_* functions only differ by an index argument. A newcomer may not >>> know this, so they wonder why we have (A) so many functions [and which is >>> the >>> right function to use]; it is an issue of ease of use of the code base. >>> >>> Anything you do In submodule land today needs to spawn new processes in >>> the submodule. This is cumbersome and not performant. So in the far future >>> we may want to have an abstraction of a repo (B), i.e. all repository >>> state in >>> one struct/class. That way we can open a submodule in-process and perform >>> the required actions without spawning a process. >>> >>> The road to (B) is a long road, but we have to art somewhere. And this >>> seemed >>> like a good place by introducing a dedicated argument for the >>> repository. In a follow >>> up in the future we may want to replace &the_index by >>> "the_main_repo.its_index" >>> and then could also run the commands on other (submodule) indexes. But >>> more >>> importantly, all these commands would operate on a repository object. >>> >>> In such a far future we would have functions like the cmd_* functions >>> that would take a repository object instead of doing its setup discovery >>> on their own. >>> >>> Another reason may be its current velocity (or absence of it) w.r.t. to >>> these >>> functions, such that fewer merge conflicts may arise. >> >> >> In addition to (eventually) allowing multiple repos be open at >> the same time for submodules, it would also help with various >> multi-threading efforts. For example, we have loops that do a >> "for (k = 0, k < active_nr; k++) {...}" There is no visual clue >> in that code that it references "the_index" and therefore should >> be subject to the same locking. Granted, this is a trivial example, >> but goes to the argument that the code has lots of subtle global >> variables and macros that make it difficult to reason about the >> code. > > Just to throw out an example, I'm relatively new to the codebase (I've > been lurking on the mailing list for a few months now) and for a > recent project (I'm an undergrad wrapping up my senior year, and one > of my classes' final projects was to do something that involved > concurrency) I took a shot at parallelizing the estimate_similarity() > calls in diffcore_rename(). The only way I was able to get it to work > was by dropping global mutexes in one or two files (the code for those > mutexes still makes me cringe), because of concurrent writes to global > data structures. That sounds like a challenge. As we have many globals, we need to be very careful about threading. Also an interesting discussion about threading: https://public-inbox.org/git/9e4733910708111412t48c1beaahfbaa2c68a02f64f1@xxxxxxxxxxxxxx/ Are the patches available for discussion? Thanks, Stefan