Re: [PATCH 0/5] Start of a journey: drop NO_THE_INDEX_COMPATIBILITY_MACROS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 2, 2017 at 9:05 AM, Jeff Hostetler <git@xxxxxxxxxxxxxxxxx> wrote:
>
>
> On 5/2/2017 12:17 AM, Stefan Beller wrote:
>>
>> On Mon, May 1, 2017 at 6:36 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>>
>>> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>>>
>>>> This applies to origin/master.
>>>>
>>>> For better readability and understandability for newcomers it is a good
>>>> idea
>>>> to not offer 2 APIs doing the same thing with on being the #define of
>>>> the other.
>>>>
>>>> In the long run we may want to drop the macros guarded by
>>>> NO_THE_INDEX_COMPATIBILITY_MACROS. This converts a couple of them.
>
>
> Thank you for bringing this up and making this proposal.
> I started a similar effort internally last fall, but
> stopped because of the footprint size.
>
>>>
>>> Why?  Why should we keep typing &the_index, when most of the time we
>>> are given _the_ index and working on it?
>>
>>
>> As someone knowledgeable with the code base you know that the cache_*
>> and index_* functions only differ by an index argument. A newcomer may not
>> know this, so they wonder why we have (A) so many functions [and which is
>> the
>> right function to use]; it is an issue of ease of use of the code base.
>>
>> Anything you do In submodule land today needs to spawn new processes in
>> the submodule. This is cumbersome and not performant. So in the far future
>> we may want to have an abstraction of a repo (B), i.e. all repository
>> state in
>> one struct/class. That way we can open a submodule in-process and perform
>> the required actions without spawning a process.
>>
>> The road to (B) is a long road, but we have to art somewhere. And this
>> seemed
>> like a good place by introducing a dedicated argument for the
>> repository. In a follow
>> up in the future we may want to replace &the_index by
>> "the_main_repo.its_index"
>> and then could also run the commands on other (submodule) indexes. But
>> more
>> importantly, all these commands would operate on a repository object.
>>
>> In such a far future we would have functions like the cmd_* functions
>> that would take a repository object instead of doing its setup discovery
>> on their own.
>>
>> Another reason may be its current velocity (or absence of it) w.r.t. to
>> these
>> functions, such that fewer merge conflicts may arise.
>
>
> In addition to (eventually) allowing multiple repos be open at
> the same time for submodules, it would also help with various
> multi-threading efforts.  For example, we have loops that do a
> "for (k = 0, k < active_nr; k++) {...}"  There is no visual clue
> in that code that it references "the_index" and therefore should
> be subject to the same locking.  Granted, this is a trivial example,
> but goes to the argument that the code has lots of subtle global
> variables and macros that make it difficult to reason about the
> code.

Just to throw out an example, I'm relatively new to the codebase (I've
been lurking on the mailing list for a few months now) and for a
recent project (I'm an undergrad wrapping up my senior year, and one
of my classes' final projects was to do something that involved
concurrency) I took a shot at parallelizing the estimate_similarity()
calls in diffcore_rename(). The only way I was able to get it to work
was by dropping global mutexes in one or two files (the code for those
mutexes still makes me cringe), because of concurrent writes to global
data structures.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]