[PATCH 0/9] following http redirects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Git's http handling takes care of redirects by setting
CURLOPT_FOLLOWLOCATION, and just assuming curl will handle the details.
For the most part, this works, but there are a few downsides:

  1. We do extra round-trips for subsequent requests, as we
     always feed curl the original URL, and it then has to go hit the
     redirect.

  2. Some redirects are to upgrade http->https. In this case, we end up
     sending the credential over unencrypted http. We do advertise the
     proper unencrypted URL to the credential helpers (and to the user
     when we prompt them), so to some degree this is the user's fault.
     Still, there's no reason not to be friendly and avoid exposing the
     password in plaintext if we know that the server is just going to
     redirect us to an encrypted connection anyway.

  3. Cross-server redirects do not work at all with authentication. If
     we tell curl to go to host X, and it redirects to host Y, it will
     drop the auth credentials to avoid revealing them to Y. This is a
     good thing, because we prompted the user for host X, and curl is
     being safe. However, it means that we have no way to feed curl the
     right password (and the user does not even get a useful error
     message; simply a 401 from Y, since we didn't give it any
     credential).

None of these is fixable at the curl layer; they depend on information
we have outside of curl (e.g., the notion that if info/refs is
redirected, so should the rest of the requests be).

This series gets the redirect information from curl (after the fact),
and uses it update the base URL for the repository for the remainder of
the program run.

Kyle, I didn't call into the urlmatch code at all; we might want another
patch on top to re-check the http.$url.* config after following such a
redirect.

The patches are:

  [1/9]: http_get_file: style fixes
  [2/9]: http_request: factor out curlinfo_strbuf
  [3/9]: http: refactor options to http_get_*
  [4/9]: http: hoist credential request out of handle_curl_result
  [5/9]: http: provide effective url to callers
  [6/9]: http: update base URLs when we see redirects
  [7/9]: remote-curl: make refs_url a strbuf
  [8/9]: remote-curl: store url as a strbuf
  [9/9]: remote-curl: rewrite base url from info/refs redirects

Here is sample output from running a clone against a private repo on
github, which redirects from http to https. Generated with:

  GIT_CURL_VERBOSE=1 git clone http://github.com/github/github 2>&1 |
  egrep '^(< HTTP|< Location|> GET|> POST|^Authorization|\* Connected to)' |
  sed -e 's/\(Authorization\): .*/\1: **redacted**/' 

Here is the output with stock git (I've added numbering for each
connection):

1 * Connected to github.com (192.30.252.129) port 80 (#0)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 301 Moved Permanently
    < Location: https://github.com/github/github/info/refs?service=git-upload-pack

2 * Connected to github.com (192.30.252.129) port 443 (#1)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 401 Authorization Required

3 * Connected to github.com (192.30.252.129) port 80 (#2)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 301 Moved Permanently
    < Location: https://github.com/github/github/info/refs?service=git-upload-pack

4 * Connected to github.com (192.30.252.129) port 443 (#1)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 401 Authorization Required

5 * Connected to github.com (192.30.252.129) port 443 (#1)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    Authorization: **redacted**
    < HTTP/1.1 200 OK

6 * Connected to github.com (192.30.252.129) port 80 (#3)
    > POST /github/github/git-upload-pack HTTP/1.1
    Authorization: **redacted**
    < HTTP/1.1 301 Moved Permanently
    < Location: https://github.com/github/github/git-upload-pack

7 * Connected to github.com (192.30.252.129) port 443 (#4)
    [... POST actually happens, and I hit ^C before it finishes ...]

You can see that connections 3 and 6 are extraneous round-trips, and
that we reveal the credential over cleartext in step 6. Curiously, even
though we have fed the credential to curl after step 2, curl does not
share it until the server sends another 401 in step 4. I'm not sure why
that is the case.

Here's the output with these patches applied:

1 * Connected to github.com (192.30.252.129) port 80 (#0)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 301 Moved Permanently
    < Location: https://github.com/github/github/info/refs?service=git-upload-pack

2 * Connected to github.com (192.30.252.129) port 443 (#1)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 401 Authorization Required

3 * Connected to github.com (192.30.252.129) port 443 (#1)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    < HTTP/1.1 401 Authorization Required

4 * Connected to github.com (192.30.252.129) port 443 (#1)
    > GET /github/github/info/refs?service=git-upload-pack HTTP/1.1
    Authorization: **redacted**
    < HTTP/1.1 200 OK

5 * Connected to github.com (192.30.252.129) port 443 (#2)
    [... POST actually happens ...]

We do 2 fewer round-trips. That isn't all that exciting, but keep in
mind that it saves one per request after the initial contact. So if we
had more ref negotiation, we'd see more savings. Similarly, dumb http
would see many more savings (since it makes individual requests for each
loose object and pack).

We also go straight to https in steps 3 and 4, meaning the credential
never gets passed in plaintext to the old URL.

I also tested cross-server redirects using a hacky "nc -l" setup as the
server. I confirmed that it does not work at all with stock git (you are
prompted for the password for "localhost", which curl then drops after
following the redirect), and that it does work after the patches.

The one downside I can think of is that we are making the assumption
that if info/refs redirects, the rest of the repo will, too. So you
could not have "info/refs" redirect to another server to get the ref
advertisement, but then expect to get the actual pack from the original
location. I'm OK with making that assumption; I can't imagine the
insanity of the setup that would be required to break it.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]