Re: [PATCH v2] builtin/push: call set_refspecs after validating remote

Karthik Nayak <karthik.188@xxxxxxxxx> · Wed, 10 Jul 2024 06:12:21 -0700

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Karthik Nayak <karthik.188@xxxxxxxxx> writes:
>
>> - Added more details to the commit message to clarify the issue at hand.
>
>> Since 9badf97c42 (remote: allow resetting url list, 2024-06-14), we
>> reset the remote URL if the provided URL is empty. When a user of
>> 'remotes_remote_get' tries to fetch a remote with an empty repo name,
>> the function initializes the remote via 'make_remote'. But the remote is
>> still not a valid remote, since the URL is empty, so it tries to add the
>> URL alias using 'add_url_alias'. This in-turn will call 'add_url', but
>> since the URL is empty we call 'strvec_clear' on the `remote->url`. Back
>> in 'remotes_remote_get', we again check if the remote is valid, which
>> fails, so we return 'NULL' for the 'struct remote *' value
>
> That's better, but it still talks only about the implementation and
> control flow description without mentioning what end-user action it
> breaks.  We see in the new test this:
>
>     $ git push "" main
>
> but is that something we even want to support?  Before 9badf97c
> (remote: allow resetting url list, 2024-06-14), the command would
> have failed with different way [*1*].
>
> Is it merely "this is a nonsense request and must fail, but we do
> not want to hit BUG in general"?  I think it is the latter, but
> leaving it unsaid is confusing.  How about starting it more like...
>
>     When an end-user runs "git push" with an empty string for the
>     remote repository name, e.g.
>
>         $ git push '' main
>
>     "git push" fails with a BUG().  This is because ...
>
> or something.
>

Thanks, I was focused on the technical aspect of it that I didn't really
spill out the user interaction, you're right, it should start this way.
I will modify accordingly.

> cmd_push() calls set_refspecs() on a repo (whose name is ""), which
> then calls remote.c:remote_get() on it, which calls
> remote.c:make_remote() to obtain a remote structure.
>
> But during this calling sequence especially down from remote_get(),
> there are inconsistent decisions made for an empty string as a name.
> It is not a NULL pointer, so it does not benefit from the default
> refspecs by calling get_default=remotes_remote_for_branch,
> valid_remote_nick() considers it is not a valid nick so we do not
> read remotes or branches configuration file, but we still think a
> name was given (even though it is an empty string) and try to do
> something useful by calling add_url_alias().
>
> It is a mess and whoever designed this should be shot [*2*] ;-).
>
> In any case, an obvious additional fix on top of your change might
> be to do something like this:
>
>         diff --git i/remote.c w/remote.c
>         index 5fa046c8f8..d7f9ba3571 100644
>         --- i/remote.c
>         +++ w/remote.c
>         @@ -682,7 +682,7 @@ remotes_remote_get_1(
>                 struct remote *ret;
>                 int name_given = 0;
>
>         -	if (name)
>         +	if (name && *name)
>                         name_given = 1;
>                 else
>                         name = get_default(remote_state, remote_state->current_branch,
>
> which would give us the default remote name, and we would not call
> add_url_alias() with a bogus empty string to nuke the list.
>

I'm a bit skeptical of making this change. Mostly from the user's
perspective.

With my patch currently:

    $ git push "" refs/heads/master
    fatal: bad repository ''

But with this added, we'd be doing

    $ git push "" refs/heads/master
    Everything up-to-date

This is because we actually obtained the default remote here. Isn't this
confusing from a user's perspective? I mean I agree that an empty repo
name is something we should support, but it also shouldn't be something
we simply ignore?

>> - Cleaned up 'set_refspecs' by now accepting a remote instead of trying
>>   to obtain one.
>
> The last one, which does make sense, is not mentioned in the
> proposed log message.  Lazy remote creation does not help the
> updated caller because it already has one.
>
>> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>> +
>>  TEST_PASSES_SANITIZE_LEAK=true
>>  . ./test-lib.sh
>>
>> @@ -38,6 +41,13 @@ test_expect_success 'detect missing sha1 expressions early' '
>>  	test_cmp expect rp-ran
>>  '
>>
>> +# We need to use an existing local_ref so that the remote is mapped to
>> +# it in 'builtin/push.c:set_refspecs()'.
>
> Hmph, it is OK to force 'main' like the above as a workaround, but
> wouldn't it be sufficient to do
>
>     $ git push "" HEAD:refs/heads/main
>
> or something to avoid having to know what our current branch is?
>
> Thanks.
>

So if we're simply testing my patch, this is definitely enough. But I
wanted to also keep in mind the state before this patch. Wherein the
only way the flow would be triggered is if we provide a local_ref,
providing ':' follows a different path in 'set_refspecs'.

>
>
> [Footnote }
>
>  *1* For example, before Peff's series, here is what I got:
>
>          $ git push "" HEAD:master
>          fatal: no path specified; see 'git help pull' for valid url syntax
>
>      It comes from transport_push() -> get_refs_via_connect() ->
>      parse_connect_url() code path.
>
>  *2* It probably is me writing in the original in shell script, so I
>      can safely say this without offending anybody.
Attachment:
signature.asc

Description: PGP signature