Re: [PATCH] submodule: use cheaper check for submodule pushes

Junio C Hamano <gitster@xxxxxxxxx> · Thu, 13 Jul 2017 11:37:04 -0700

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> On Wed, Jul 12, 2017 at 5:53 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>> Jonathan Nieder <jrnieder@xxxxxxxxx> writes:
>>
>>>> In the function push_submodule[1] we use add_submodule_odb[2] to determine
>>>> if a submodule has been populated. However the function does not work with
>>>> the submodules objects that are added, instead a new child process is used
>>>> to perform the actual push in the submodule.
>>>>
>>>> Use is_submodule_populated[3] that is cheaper to guard from unpopulated
>>>> submodules.
>>>>
>>>> [1] 'push_submodule' was added in eb21c732d6 (push: teach
>>>>     --recurse-submodules the on-demand option, 2012-03-29)
>>>> [2] 'add_submodule_odb' was introduced in 752c0c2492 (Add the
>>>>     --submodule option to the diff option family, 2009-10-19)
>>>> [3] 'is_submodule_populated' was added in 5688c28d81 (submodules:
>>>>     add helper to determine if a submodule is populated, 2016-12-16)
>>>
>>> These footnotes don't answer the question that I really have: why did
>>> this use add_submodule_odb in the first place?
>>>
>>> E.g. did the ref iteration code require access to the object store
>>> previously and stop requiring it later?
>>
>> Yes, the most important question is if it is safe to lose the access
>> to the object store of the submodule.  It is an endgame we should
>> aim for to get rid of add_submodule_odb(), but does the rest of this
>> codepath not require objects in the submodule at all or do we still
>> need to change something to make it so?
>
> Yes, as the code in the current form as well as in its first occurrence
> used the result of add_submodule_odb to determine if to spawn a child process.

The original added the call so that its return value can be used
for that, and the current code still uses the return value for that
purpose.

That much is already known.  

I think Jonathan's question (with which I concurred) is whether we
also ended up relying on the side effect of calling that function
(i.e. being able to find objects that are not in our repository but
in the submodule's object store).  By looking at eb21c732d6, we can
tell that the original didn't mean to and didn't add any code that
relies on the ability to read from the submodule object store.  I am
not sure if that is still true after 5 years (i.e. is there any new
code added in the meantime that made us depend on the ability to
read from the submodule object store?).
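
To illustrate the distinction between the return value and the side
effect, here is a toy model.  It is only a sketch and not the code in
submodule.c: every toy_* name and the alternates array are made up
for this message, and I am assuming the old guard reports success
with 0 and failure with a negative value.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of extra object stores this process can read from. */
#define MAX_ALTERNATES 8
static const char *alternates[MAX_ALTERNATES];
static int nr_alternates;

/* Toy submodule; "populated" stands for "has a checked-out worktree". */
struct toy_submodule {
	const char *path;
	bool populated;
};

/*
 * Models the old guard: the return value says whether the submodule
 * is usable, and as a SIDE EFFECT its objects become readable here.
 */
static int toy_add_submodule_odb(const struct toy_submodule *sub)
{
	if (!sub->populated)
		return -1;
	if (nr_alternates < MAX_ALTERNATES)
		alternates[nr_alternates++] = sub->path;
	return 0;
}

/* Models the proposed guard: same question, no side effect. */
static bool toy_is_submodule_populated(const struct toy_submodule *sub)
{
	return sub->populated;
}

/* Any later object lookup in the submodule depends on the side effect. */
static bool toy_can_read_objects_from(const char *path)
{
	for (int i = 0; i < nr_alternates; i++)
		if (!strcmp(alternates[i], path))
			return true;
	return false;
}
```

Swapping the first guard for the second keeps the "should we spawn a
child?" decision intact, but anything that behaves like
toy_can_read_objects_from() starts failing.  That is the kind of
caller we would need to rule out in the current codepath.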

My hunch (and hope) is that we are probably safe, but that is a lot
weaker than "yes this is a good change we want to apply".