Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

Stefan Beller <sbeller@xxxxxxxxxx> · Wed, 3 May 2017 10:14:55 -0700

On Wed, May 3, 2017 at 4:31 AM, Samuel Lijin <sxlijin@xxxxxxxxx> wrote:
> On Tue, May 2, 2017 at 9:05 AM, Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
>>
>>
>> On 5/2/2017 12:17 AM, Stefan Beller wrote:
>>>
>>> On Mon, May 1, 2017 at 6:36 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>>
>>>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>>>
>>>>> This applies to origin/master.
>>>>>
>>>>> For better readability and understandability for newcomers it is a good
>>>>> idea
>>>>> to not offer 2 APIs doing the same thing with on being the #define of
>>>>> the other.
>>>>>
>>>>> In the long run we may want to drop the macros guarded by
>>>>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
>>
>>
>> Thank you for bringing this up and making this proposal.
>> I started a similar effort internally last fall, but
>> stopped because of the footprint size.
>>
>>>>
>>>> Why?  Why should we keep typing &the_index, when most of the time we
>>>> are given _the_ index and working on it?
>>>
>>>
>>> As someone knowledgeable with the code base you know that the cache_*
>>> and index_* functions only differ by an index argument. A newcomer may not
>>> know this, so they wonder why we have (A) so many functions [and which is
>>> the
>>> right function to use]; it is an issue of ease of use of the code base.
>>>
>>> Anything you do In submodule land today needs to spawn new processes in
>>> the submodule. This is cumbersome and not performant. So in the far future
>>> we may want to have an abstraction of a repo (B), i.e. all repository
>>> state in
>>> one struct/class. That way we can open a submodule in-process and perform
>>> the required actions without spawning a process.
>>>
>>> The road to (B) is a long road, but we have to art somewhere. And this
>>> seemed
>>> like a good place by introducing a dedicated argument for the
>>> repository. In a follow
>>> up in the future we may want to replace &the_index by
>>> "the_main_repo.its_index"
>>> and then could also run the commands on other (submodule) indexes. But
>>> more
>>> importantly, all these commands would operate on a repository object.
>>>
>>> In such a far future we would have functions like the cmd_* functions
>>> that would take a repository object instead of doing its setup discovery
>>> on their own.
>>>
>>> Another reason may be its current velocity (or absence of it) w.r.t. to
>>> these
>>> functions, such that fewer merge conflicts may arise.
>>
>>
>> In addition to (eventually) allowing multiple repos be open at
>> the same time for submodules, it would also help with various
>> multi-threading efforts.  For example, we have loops that do a
>> "for (k = 0, k < active_nr; k++) {...}"  There is no visual clue
>> in that code that it references "the_index" and therefore should
>> be subject to the same locking.  Granted, this is a trivial example,
>> but goes to the argument that the code has lots of subtle global
>> variables and macros that make it difficult to reason about the
>> code.
>
> Just to throw out an example, I'm relatively new to the codebase (I've
> been lurking on the mailing list for a few months now) and for a
> recent project (I'm an undergrad wrapping up my senior year, and one
> of my classes' final projects was to do something that involved
> concurrency) I took a shot at parallelizing the estimate_similarity()
> calls in diffcore_rename(). The only way I was able to get it to work
> was by dropping global mutexes in one or two files (the code for those
> mutexes still makes me cringe), because of concurrent writes to global
> data structures.

That sounds like a challenge. As we have many globals, we need to be
very careful about threading.

Also an interesting discussion about threading:
https://public-inbox.org/git/9e4733910708111412t48c1beaahfbaa2c68a02f64f1@xxxxxxxxxxxxxx/

Are the patches available for discussion?

Thanks,
Stefan