Re: [PATCH 00/13] builtin: implement, document and test url-parse

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Apr 28, 2024 at 10:30:48PM +0000, Matheus Moreira via GitGitGadget wrote:
> Git commands accept a wide variety of URLs syntaxes, not just standard URLs.
> This can make parsing git URLs difficult since standard URL parsers cannot
> be used. Even if an external parser were implemented, it would have to track
> git's development closely in case support for any new URL schemes are added.
>
> These patches introduce a new url-parse builtin command that exposes git's
> native URL parsing algorithms as a plumbing command, allowing other programs
> to then call upon git itself to parse the git URLs and their components.
>
> This should be quite useful for scripts. For example, a script might want to
> add remotes to repositories, naming them according to the domain name where
> the repository is hosted. This new builtin allows it to parse the git URL
> and extract its host name which can then be used as input for other
> operations. This would be difficult to implement otherwise due to git's
> support for scp style URLs.
>

All in all, having a URL parser as such is a good thing, thanks for working
on that.

There are, however, some notes and questions, up for discussion:

- are there any plans to integrate the parser into connect.c and fetch ?
  Speaking as a person, who manage to break the parsing of URLs once,
  with the good intention to improve things, I need to learn that
  test cases are important.
  Some work can be seen in t5601-clone.sh
  Especially, when dealing with literal IPv6 addresses, the ones with []
  and the simplified ssh syntax 'myhost:src' are interesting to test.
  Git itself strives to be RFC compliant when parsing URLs, but
  we do not fully guarantee to be "fully certified".
  And some features using the [] syntax to embedd a port number
  inside the simplified ssh syntax had not been documented,
  but used in practise, and are now part of the test suite.
  See "[myhost:123]:src" in t5601

- Or is this new tool just a helper, to verify "good" URL's,
  and not accepting our legacy parser quirks ?
  Then we still should see some IPv6 tests ?
  Or may be not, as we prefer hostnames these days ?

- One minor comment:
  in 02/13 we read:
        +enum protocol {
        +       PROTO_UNKNOWN = 0,
        +       PROTO_LOCAL,
        +       PROTO_FILE,
        +       PROTO_SSH,
        +       PROTO_GIT,
  The RFC 1738 uses the term "scheme" here, and using the very generic
  term "protocol" may lead to name clashes later.
  Would something like "git_scheme" or so be better ?

- One minor comment:
   In 13/13 we read:
        +       git url-parse "file:///" &&
        +       git url-parse "file://"

  I think that the "///" version is superflous, it should already
  be covered by the "//" version






[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux