Re: [PATCH] submodule: Fetch the direct sha1 first

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Feb 19, 2016 at 1:13 PM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> Stefan Beller <sbeller@xxxxxxxxxx> writes:
>
>> When reviewing a change in Gerrit, which also updates a submodule,
>> a common review practice is to download and cherry-pick the patch locally
>> to test it. However when testing it locally, the 'git submodule update'
>> may fail fetching the correct submodule sha1 as the corresponding commit
>> in the submodule is not yet part of the project history, but also just a
>> proposed change.
>>
>> To ease this, try fetching by sha1 first and when that fails (in case of
>> servers which do not allow fetching by sha1), fall back to the default
>> behavior we already have.
>>
>> Signed-off-by: Stefan Beller <sbeller@xxxxxxxxxx>
>> ---
>>
>> I think it's best to apply this on origin/master, there is no collision
>> with sb/submodule-parallel-update.
>>
>> Also I do not see a good way to test this both in correctness as well
>> as performance degeneration. If the first git fetch fails, the second
>> fetch is executed, so it should behave as before this patch w.r.t. correctness.
>>
>> Regarding performance, the first fetch should fail quite fast iff the fetch
>> fails and then continue with the normal fetch. In case the first fetch works
>> fine getting the exact sha1, the fetch should be faster than a default fetch
>> as potentially less data needs to be fetched.
>
> "The fetch should be faster" may not be making a good trade-off
> overall--people may have depended on the branches configured to be
> fetched to be fetched after this codepath is exercised, but now if
> the commit bound to the superproject tree happens to be complete,
> even though it is not anchored by any remote tracking ref (hence the
> next GC may clobber it), the fetch of other branches will not
> happen.
>
> My knee-jerk reaction is that the order of fallback is probably the
> other way around.  That is, try "git fetch" as before, check again
> if the commit bound to the superproject tree is now complete, and
> fallback to fetch that commit with an extra "git fetch".

I thought about that and assumed we'd need to have an option for
fetch like "--try-to-get-sha1", which depending on the servers capabilities
would just add that sha1 to the "wants" during fetching negotiation if the
server supports it, otherwise just fetch normally.

Doing a 'git fetch' only and not the fetch for the specific sha1 would be
incorrect? ('git fetch' with no args finishes successfully, so no fallback is
triggered. But we are not sure if we obtained the sha1, so we need to
check if we have the sha1 by doing a local check and then try to get the sha1
again if we don't have it locally. So doing the reverse order would be
more code here for correctness.

>
> Jens, what do you think?
>
>>  git-submodule.sh | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/git-submodule.sh b/git-submodule.sh
>> index 9bc5c5f..ee0b985 100755
>> --- a/git-submodule.sh
>> +++ b/git-submodule.sh
>> @@ -746,8 +746,9 @@ Maybe you want to use 'update --init'?")"
>>                               # Run fetch only if $sha1 isn't present or it
>>                               # is not reachable from a ref.
>>                               (clear_local_git_env; cd "$sm_path" &&
>> +                                     remote_name=$(get_default_remote)
>>                                       ( (rev=$(git rev-list -n 1 $sha1 --not --all 2>/dev/null) &&
>> -                                      test -z "$rev") || git-fetch)) ||
>> +                                      test -z "$rev") || git-fetch $remote_name $rev
>
> Regardless of the "fallback order" issue, I do not think $rev is a
> correct thing to fetch here.  The superproject binds $sha1 to its
> tree, and you would be checking that out, so shouldn't you be
> fetching that commit?

Both $sha1 and $rev are in the submodule (because
'git submodule--helper list' puts out the sha1 as the
submodule sha1). $rev is either empty or equal to $sha1
in my understanding of "rev-list $sha1 --not --all". However for
readability maybe we want to write:

    (clear_local_git_env; cd "$sm_path" &&
        test -z $(git rev-list -n 1 $sha1 --not --all 2>/dev/null) ||
        git fetch $(get_default_remote) $sha1 ||
        git fetch ||
        die ...
    )

So in case you want to the other order, I'd propose

    (clear_local_git_env; cd "$sm_path" &&
        test -z $(git rev-list -n 1 $sha1 --not --all 2>/dev/null) ||
        git fetch ||
        (git cat-file -e $sha1 && git fetch $(get_default_remote) $sha1) ||
        die ...
    )

Oh! Looking at that I suspect the
"test -z $(git rev-list -n 1 $sha1 --not --all 2>/dev/null)"
and "git cat-file -e" are serving the same purpose here and should just
indicate if the given sha1 is present or not.

So we could reduce it further to

    (clear_local_git_env; cd "$sm_path" &&
        git cat-file -e $sha1 || git fetch ||
        (git cat-file -e $sha1 && git fetch $(get_default_remote) $sha1) ||
        die ...
    )

I may have messed up the logic operators along the way, maybe it is
even better if
we rewrite it with non shorted conditions.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]