On Sun, Apr 28, 2024 at 10:30:48PM +0000, Matheus Moreira via GitGitGadget wrote: > Git commands accept a wide variety of URLs syntaxes, not just standard URLs. > This can make parsing git URLs difficult since standard URL parsers cannot > be used. Even if an external parser were implemented, it would have to track > git's development closely in case support for any new URL schemes are added. > > These patches introduce a new url-parse builtin command that exposes git's > native URL parsing algorithms as a plumbing command, allowing other programs > to then call upon git itself to parse the git URLs and their components. > > This should be quite useful for scripts. For example, a script might want to > add remotes to repositories, naming them according to the domain name where > the repository is hosted. This new builtin allows it to parse the git URL > and extract its host name which can then be used as input for other > operations. This would be difficult to implement otherwise due to git's > support for scp style URLs. > All in all, having a URL parser as such is a good thing, thanks for working on that. There are, however, some notes and questions, up for discussion: - are there any plans to integrate the parser into connect.c and fetch ? Speaking as a person, who manage to break the parsing of URLs once, with the good intention to improve things, I need to learn that test cases are important. Some work can be seen in t5601-clone.sh Especially, when dealing with literal IPv6 addresses, the ones with [] and the simplified ssh syntax 'myhost:src' are interesting to test. Git itself strives to be RFC compliant when parsing URLs, but we do not fully guarantee to be "fully certified". And some features using the [] syntax to embedd a port number inside the simplified ssh syntax had not been documented, but used in practise, and are now part of the test suite. See "[myhost:123]:src" in t5601 - Or is this new tool just a helper, to verify "good" URL's, and not accepting our legacy parser quirks ? Then we still should see some IPv6 tests ? Or may be not, as we prefer hostnames these days ? - One minor comment: in 02/13 we read: +enum protocol { + PROTO_UNKNOWN = 0, + PROTO_LOCAL, + PROTO_FILE, + PROTO_SSH, + PROTO_GIT, The RFC 1738 uses the term "scheme" here, and using the very generic term "protocol" may lead to name clashes later. Would something like "git_scheme" or so be better ? - One minor comment: In 13/13 we read: + git url-parse "file:///" && + git url-parse "file://" I think that the "///" version is superflous, it should already be covered by the "//" version