Re: [PATCH] submodule: implement `module_name` as a builtin helper

Stefan Beller <sbeller@xxxxxxxxxx> · Fri, 7 Aug 2015 14:21:52 -0700

On Fri, Aug 7, 2015 at 2:14 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> On Fri, Aug 7, 2015 at 1:17 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> Jens Lehmann <Jens.Lehmann@xxxxxx> writes:
>>>
>>> This change...
>>>
>>>>> @@ -723,10 +733,8 @@ int fetch_populated_submodules(const struct argv_array *options,
>>>>>              if (!S_ISGITLINK(ce->ce_mode))
>>>>>                      continue;
>>>>>
>>>>> -            name = ce->name;
>>>>> -            name_for_path = unsorted_string_list_lookup(&config_name_for_path, ce->name);
>>>>> -            if (name_for_path)
>>>>> -                    name = name_for_path->util;
>>>>> +            name_for_path = submodule_name_for_path(ce->name);
>>>>> +            name =  name_for_path ? name_for_path : ce->name;
>>>
>>> ... interacts with Heiko's cached submodule config work that seems
>>> to have stalled.
>>
>> We can drop that hunk as it only uses the new method
>> `submodule_name_for_path` but doesn't change functionality.
>> So if you want to keep Heikos work, I'll just resend the patch
>> without that hunk.
>
> Does such a result even make sense?  Note that I wasn't talking
> about textual conflict.
>
> If we followed what you just said, that patch will try to directly
> read the data in config_name_for_path string list, which is removed
> by Heiko's series, if I am reading it right.
>
> In the new world order with Heiko's series, the way you grab
> submodule configuration data is to call submodule_from_path() or
> submodule_from_name() and they allow you to read from .gitmodules
> either in a commit (for historical state), the index, or from the
> working tree.  Which should be much cleaner and goes in the right
> direction in the longer term.
>
> And the best part of the story is that your module_name would be
> just calling submodule_from_path() and peeking into a field.
>
>> 2) Come up with a good thread pool abstraction
>>    (Started as "[RFC/PATCH 0/4] parallel fetch for submodules" )
>>    This abstraction (if done right) will allow us to use it in different places
>>    easily. I started it as part of "git fetch --recurse-submodules" because
>>    it is submodule related and reasonably sized
>
> I personally think this gives the most bang-for-buck.  Write that
> and expose it as "git submodule for-each-parallel", which takes the
> shell scriptlet that currently is the loop body of "while read mode
> sha1 stage sm_path" in update and clone.  You will have immediate
> and large payback.

You said that before. I feel like this is a bit too narrow. A "git submodule
for-each-parallel" would be a very specific tool which we would use to
make the different submodule operations parallel with ease, but it would
be very submodule-specific, I guess?

That's why I want to be a bit more generic and have this thread pool API
done in C, such that "any for loop" in git can easily be replaced by a loop
using the thread pool. I am thinking of "git fetch --all" in particular.
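
For illustration only, a minimal sketch of what such a generic C helper
could look like, assuming POSIX threads. All names here
(`parallel_for_each`, `task_fn`, `pool_worker`, `square_entry`) are
hypothetical and not existing git API. The former loop body becomes a
callback, and each worker thread claims the next unclaimed index under a
mutex, so faster workers simply pick up more iterations:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 32

/* The former loop body: called exactly once for each index i. */
typedef void (*task_fn)(size_t i, void *data);

struct pool_state {
	pthread_mutex_t lock;
	size_t next;		/* first index not yet claimed by any worker */
	size_t nr_tasks;
	task_fn fn;
	void *data;
};

static void *pool_worker(void *arg)
{
	struct pool_state *st = arg;

	for (;;) {
		size_t i;

		/* Claim the next unclaimed index under the lock. */
		pthread_mutex_lock(&st->lock);
		if (st->next >= st->nr_tasks) {
			pthread_mutex_unlock(&st->lock);
			return NULL;
		}
		i = st->next++;
		pthread_mutex_unlock(&st->lock);

		/* Run the loop body outside the lock so tasks overlap. */
		st->fn(i, st->data);
	}
}

/*
 * Hypothetical stand-in for "for (i = 0; i < nr_tasks; i++) fn(i, data);"
 * that spreads the iterations over up to nr_threads worker threads and
 * returns only once every iteration has finished.
 */
static void parallel_for_each(size_t nr_tasks, int nr_threads,
			      task_fn fn, void *data)
{
	pthread_t workers[MAX_WORKERS];
	struct pool_state st;
	int i, started = 0;

	assert(nr_threads > 0);
	if (nr_threads > MAX_WORKERS)
		nr_threads = MAX_WORKERS;

	pthread_mutex_init(&st.lock, NULL);
	st.next = 0;
	st.nr_tasks = nr_tasks;
	st.fn = fn;
	st.data = data;

	for (i = 0; i < nr_threads; i++) {
		if (pthread_create(&workers[started], NULL, pool_worker, &st))
			break;
		started++;
	}
	/* If no thread could be started, run the whole loop in this thread. */
	if (!started)
		pool_worker(&st);

	for (i = 0; i < started; i++)
		pthread_join(workers[i], NULL);
	pthread_mutex_destroy(&st.lock);
}

/* Example loop body: squares entry i of an int array in place. */
static void square_entry(size_t i, void *data)
{
	int *v = data;

	v[i] *= v[i];
}
```

Calling `parallel_for_each(100, 4, square_entry, v)` squares all 100
entries of `v` using four threads. Since each index is claimed exactly
once, the callback needs no locking of its own as long as different
iterations touch disjoint data.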

>
> And you do not even need module_list and module_name written in C in
> order to do so.  Not that these two are not trivial to implement, but
> the payoff from them is not as large as from for-each-parallel ;-)
>

I think I can do this for-each-parallel once I have the more generic thread
pooling done. All that is left now is good handling of stdout/stderr, and I
am not yet sure how to do that right. Maybe each task accumulates its
messages in two string buffers (one for stdout, one for stderr), and the
thread pool then outputs a task's buffers once that task is done.
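
A minimal sketch of that per-task buffering, assuming POSIX threads.
`struct outbuf`, `outbuf_addf`, `struct task_output` and
`flush_task_output` are hypothetical names used only for illustration
(inside git the two buffers would more naturally be `strbuf`s). Each task
appends to its own buffers instead of writing to the real streams, and a
finished task's buffers are flushed under one lock, so output from tasks
that finish at the same time never interleaves:

```c
#include <assert.h>
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable string buffer; inside git a strbuf would do this job. */
struct outbuf {
	char *buf;
	size_t len, alloc;
};

/* Everything one task would have printed, kept until the task is done. */
struct task_output {
	struct outbuf out;	/* captured stdout */
	struct outbuf err;	/* captured stderr */
};

/* Serializes flushes so output of concurrently finishing tasks never mixes. */
static pthread_mutex_t output_lock = PTHREAD_MUTEX_INITIALIZER;

/* Append printf-style text to b. Returns 0 on success, -1 on error. */
static int outbuf_addf(struct outbuf *b, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(NULL, 0, fmt, ap);	/* measure only */
	va_end(ap);
	if (n < 0)
		return -1;

	if (b->len + n + 1 > b->alloc) {
		size_t want = 2 * (b->len + n + 1);
		char *p = realloc(b->buf, want);

		if (!p)
			return -1;
		b->buf = p;
		b->alloc = want;
	}
	assert(b->len + n < b->alloc);

	va_start(ap, fmt);
	vsnprintf(b->buf + b->len, b->alloc - b->len, fmt, ap);
	va_end(ap);
	b->len += n;
	return 0;
}

static void outbuf_release(struct outbuf *b)
{
	free(b->buf);
	b->buf = NULL;
	b->len = b->alloc = 0;
}

/*
 * Called by the pool when a task has finished: write the task's captured
 * output to the real streams as one block, then free the buffers.
 */
static void flush_task_output(struct task_output *t, FILE *out, FILE *err)
{
	pthread_mutex_lock(&output_lock);
	if (t->out.len)
		fwrite(t->out.buf, 1, t->out.len, out);
	if (t->err.len)
		fwrite(t->err.buf, 1, t->err.len, err);
	fflush(out);
	fflush(err);
	pthread_mutex_unlock(&output_lock);

	outbuf_release(&t->out);
	outbuf_release(&t->err);
}
```

A task would call `outbuf_addf(&t->out, ...)` wherever it currently
prints, and the pool would call `flush_task_output(t, stdout, stderr)`
when the task returns. The trade-off is that a long-running task shows
no output until it finishes, and the relative order of its stdout and
stderr lines is not preserved.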