On 2015-08-05 23.19, Jeff King wrote: > On Wed, Aug 05, 2015 at 10:34:34AM -0700, Junio C Hamano wrote: > >>> As you can see, there is a lot of complexity in there and I'm not >>> convinced this is better than just exposing >>> 'parse_connect_url()', which already handles everything for us. I try expose and use parse_connect_url(): It handles the scp-like syntax "host:/path, literall IPV6 addresses, port numbers, ':' without a port number and all other Git specific parsing, which is inside and outside the RFC 3986. (I should know, because I managed to break the parser twice, and fix it) I added a diagnostics to connect.c, and if you run the a simply test, we can see that the colon slash logic is often unsufficient: tb@mypc:~/projects/git/tb.150731_connect> ./git fetch-pack --diag-url ssh://host/ Diag: url=ssh://host/ Diag: protocol=ssh Diag: userandhost=host Diag: port=NONE Diag: path=/ Diag: guesseddir=host/ tb@macce:~/projects/git/tb.150731_connect> ./git fetch-pack --diag-url ssh://host:/ Diag: url=ssh://host:/ Diag: protocol=ssh Diag: userandhost=host Diag: port=NONE Diag: path=/ Diag: guesseddir=/ On top of that, you can easily write test cases in t5601, as many as you want. The (minor) drawback is that it doesn't handle http:// or https://, but that is easy to add in the parser, and doesn't break existing code. The major which remains is to search for '@' in userandhost, and strip that off. (Or when there is a '@', search for a ':' before the '@', and strip that off) After that, all non-printable characters should be %-escaped. If we replace ':' as non-printable as well, we can make Windows users 1% more happy. >> >> If the function "handles everything for us", that's fine, but the >> primary reason I am hesitant is because parse_connect_url() was >> designed specifically not to have to worry about some protocols >> (e.g. I think feeding it a "http://" would fail, and more >> importantly, its current callers want such a call to fail). Also it >> is meant to handle some non-protocols (e.g. scp style host:path that >> does not follow <scheme>://...). > > True, but the transport code _is_ handling that at some point. It makes > me wonder if it would be possible to push the call to transport_get > further up inside cmd_clone(), and then provide some way to query the > remote path and hostname from the transport code. Then guess_dir_name > could just go away entirely, in favor of something like: > > dir_name = transport_get_path(transport); > if (!*dir_name) > dir_name = transport_get_host(transport); > > That may be overly simplistic or unworkable, though. I haven't dug into > the code. > >> Also does it handle the "2222" case above? I do not think >> parse_connect_url() even calls get_host_and_port() to be able to >> tell what "2222" means in these examples. > > Speaking of which, has anyone tested whether the old or new code handles > external remote helpers? Certainly: > > foo::https://host/repo.git > > should still use repo.git. But technically the string handed to > git-remote-foo does not have to look anything like a URL. In those cases > neither guess_dir_name nor the transport code have any idea what anything > to the right of the "::" means; we probably have to resort to blind > guessing based on characters like colon and slash. > It is easy to strip the foo:: part of the url, assume that the remote helper uses a RFC 3986 similar url syntax, so that we can feed the reminding https://host/repo.git into the parser (see above). If the remote helper doesn't do this, we can't guess anything, can we ? So error out and tell the user seems the right thing to do. In the hope that this is useful, pushed my prototype branch to https://github.com/tboegi/git/tree/150731_connect_diag_guess_name -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html