On Mon, Apr 29, 2024 at 07:04:40PM -0300, Matheus Afonso Martins Moreira wrote: > Thank you for your feedback. > > > are there any plans to integrate the parser into connect.c and fetch ? > > Yes. > > That was my intention but I was not confident enough to touch connect.c > before getting feedback from the community, since it's critical code > and it is my first contribution. Welcome to the Git community. I wasn't aware of t0110 as a test case... > > I do want to merge all URL parsing in git into this one function though, > thereby creating a "single point of truth". This is so that if the algorithm > is modified the changes are visible to the URL parser builtin as well. > That is a good thing to do. Be prepared for a longer journey, since we have this legacy stuff to deal with. But I am happy to help with reviews, even if that may take some days, [] > When adding test cases, I looked at the possibilities enumerated in urls.txt > and generated test cases based on those. I also looked at the urlmatch.h > test cases. However... > > > Some work can be seen in t5601-clone.sh > > ... I did not think to check those. > > > Especially, when dealing with literal IPv6 addresses, > > the ones with [] and the simplified ssh syntax 'myhost:src' > > are interesting to test. > > You're right about that. I shall prepare an updated v2 patchset > with more test cases, and also any other changes/improvements > requested by maintainers. > > > And some features using the [] syntax to embedd a port number > > inside the simplified ssh syntax had not been documented, > > but used in practise, and are now part of the test suite. > > See "[myhost:123]:src" in t5601 > > Indeed, I did not read anything of the sort when I checked it. > Would you like me to commit a note to this effect to urls.txt ? On short: please not. This kind of syntax was never ment to be used. The official "ssh://myhost:123/src" is recommended. When IPv6 parsing was added, people discovered that it could be used to "protect" the ':' from being a seperator between the hostname and the path, and can be used to seperate the hostname from the port. Once that was used in real live, it was too late to change it. If we now get a better debug tool, it could mention that this is a legacy feature, and recommend the longer "ssh://" syntax. > > > Or is this new tool just a helper, to verify "good" URL's, > > and not accepting our legacy parser quirks ? > > It is my intention that this builtin be able to accept, parse > and decompose all types of URLs that git itself can accept. > > > Then we still should see some IPv6 tests ? > > I will add them! > > > Or may be not, as we prefer hostnames these days ? > > I would have to defer that choice to someone more experienced > with the codebase. Please advise on how to proceed. Re-reading this email conversation, I think that we should support (in the future), what we support today. Having a new parser tool means, that there is a chance to reject those URLs with the note/hint, that they are depracted, and should be replaced by a proper one. >From my point of view this means that all existing test case should pass even with the new parser, as a general approach. Deprecating things is hard, may take years, and may be done in a seperate task/patch series. Or may be part of this one, in seperate commits. > > > The RFC 1738 uses the term "scheme" here, and using the very generic > > term "protocol" may lead to name clashes later. > > Would something like "git_scheme" or so be better ? > > Scheme does seem like a better word if it's the terminology used by RFCs. > I can change that in a new version if necessary. > That code is based on the existing connect.c parsing code though. > > > I think that the "///" version is superflous, it should already > > be covered by the "//" version > > I thought it was a good idea because of existing precedent: > my first approach to creating the test cases was to copy the > ones from t0110-urlmatch-normalization.sh which did have many > cases such as those. Then as I developed the code I came to > believe that it was not necessary: I call url_normalize > in the url_parse function and url_normalize is already being > tested. I think I just forgot to delete those lines. > > Reading that file over once again, it does have IPv6 address > test cases. So I should probably go over it again. > > Thanks again for the feedback, > > Matheus >