On Mon, 22 Oct 2018 at 11:51:35 +0200, Lionel Elie Mamane wrote:
> On Wed, Oct 17, 2018 at 09:03:45PM +0200, Guilhem Moulin wrote:
>> On Wed, 17 Oct 2018 at 14:05:27 +0200, Eike Rathke wrote:
>>> On Wednesday, 2018-10-17 04:27:54 +0200, Guilhem Moulin wrote:
>>>> Lastly, it's now possible to clone and fetch git repositories over
>>>> https:// . While git:// URLs will remain supported for the foreseeable
>>>> future, they're intentionally no longer advertised in gerrit, and we
>>>> encourage you to upgrade the scheme of your ‘remote.<name>.url’ to
>>>> secure transports (SSH for authenticated access, or HTTPS for anonymous
>>>> access). We'll update ‘lode’ and chase remaining git:// URLs shortly.
>
>>> Why is git:// deprecated? From what I know it's more efficient when
>>> fetching/pulling than https:// (or ssh://?)
>
>> Since v1.6.6 it's no longer true [0], cf. git-http-backend(1) and
>> https://git-scm.com/book/en/v2/Git-Basics-Working-with-Remotes
>
> That webpage doesn't seem to contain a discussion of the efficiency of
> the various protocols.

My bad, I probably copied the URL from the wrong tab. This is what I
intended to share:
https://git-scm.com/book/en/v2/Git-Internals-Transfer-Protocols .
As you can see, the protocols are essentially equivalent.

For a high-level overview and the pros and cons of each protocol, there
is also https://git-scm.com/book/en/v2/Git-on-the-Server-The-Protocols ,
which reads “There is very little advantage that other protocols have
over Smart HTTP for serving Git content.” :-) To be fair, it also says
that “The Git protocol is often the fastest network transfer protocol
available”, but that's just because no encryption is always faster than
the fastest encryption. In practice, however, this argument is moot on
modern CPUs.

FWIW, GitHub doesn't mention git:// URLs either (even though they're
still supported):
https://help.github.com/articles/which-remote-url-should-i-use/ .
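For what it's worth, here is a rough sketch of the URL upgrade suggested in the announcement quoted above (HTTPS for anonymous fetches, SSH for authenticated pushes). The shell function rewrites every remote whose fetch URL uses git:// to https://. It assumes the server exposes the same path over HTTPS, which holds for git.libreoffice.org/core but isn't guaranteed for arbitrary hosts. `upgrade_git_scheme` is a made-up name, and gerrit.example.org is only a placeholder for the Gerrit SSH host.

```shell
#!/bin/sh
# Sketch (hypothetical helper): switch git:// fetch URLs to https://.
# Assumption: the host serves the same repository path over HTTPS,
# as git.libreoffice.org does for /core.
upgrade_git_scheme() {
    for remote in $(git remote); do
        # Skip remotes without a plain fetch URL.
        url=$(git config --get "remote.$remote.url") || continue
        case "$url" in
            git://*)
                # Keep host and path, swap only the scheme.
                git remote set-url "$remote" "https://${url#git://}"
                echo "$remote: $url -> https://${url#git://}"
                ;;
        esac
    done
}

# Usage, from inside a clone (SSH host below is a placeholder):
#   upgrade_git_scheme
#   git remote set-url --push origin "ssh://$USER@gerrit.example.org:29418/core"
```

`git remote set-url --push` stores the SSH URL in ‘remote.origin.pushurl’, so fetches keep using the anonymous HTTPS URL while pushes go over authenticated SSH.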
>> SSH is only used for transport, a git process is exec()'ed on the
>> remote just like for git-daemon(1), so the only overhead is
>> crypto-related. The handshake is a one-off thing, thus negligible
>> when you're transferring a large amount of data at once; (...) As
>> for symmetric crypto overhead, (...) the overhead should be
>> negligible.
>
> All I know is that about 1/2/3 years ago ('t was I think in some
> coworking space in Brussels, probably a hackfest) I showed Michael
> Meeks how to have a separate "push" url (with ssh: protocol) and
> "pull" url (with git: protocol) and he was very happy at the
> speed-up.

That might be orthogonal to the git:// vs. https:// vs. ssh://
discussion. Gerrit uses JGit as its Git implementation, while
git-daemon(1) spawns “normal” (C-based) git-upload-pack(1) processes. I
recall Norbert and I sat down during FOSDEM 2017 to solve perf issues
with our JGit deployment. Perhaps you configured your
‘remote.<name>.pushurl’ at the same time :-)

Anyway, it's easy enough to benchmark a no-op `git fetch` on core;
master is currently at c99732d59bc6, and I'm fetching from the same
datacenter to avoid metrics being polluted with network hiccups.

    $ git config remote.origin.url git://git.libreoffice.org/core && time git fetch
    0:01.62 (0.42 user, 0.64 sys) 142108k maxres
    ## Network usage: up 252kiB (4312 packets), down 10168kiB (7197 packets)

    $ git config remote.origin.url https://git.libreoffice.org/core && time git fetch
    0:01.63 (0.81 user, 0.29 sys) 141688k maxres
    ## Network usage: up 56kiB (924 packets), down 4194kiB (1241 packets)

    $ git config remote.origin.url "ssh://$USER@xxxxxxxxxxxxxxxxxxxxxx:29418/core" && time git fetch
    0:01.55 (0.62 user, 0.46 sys) 141588k maxres
    ## Network usage: up 67kiB (993 packets), down 9859kiB (1305 packets)

Pretty much equivalent, aren't they? :-) (Network usage for https:// is
smaller because the TLS termination proxy is also compressing responses
from the git backend.
For git:// I guess the system time is higher than the user time because
it uses sendfile(2) and friends, since there is no user-space
encryption.)

As one might notice, network usage (~10MiB down, and growing) is really
high for a no-op `git fetch`. That's caused by the >140k refs/changes/…
in the initial git-upload-pack(1) advertisement:

    $ git ls-remote https://git.libreoffice.org/core | awk '
        $1 ~ /^[0-9a-f]{40}$/ {
            refs++;
            if ($2 ~ /^refs\/changes\//)
                changes++;
        }
        END {
            printf "refs=%d, changes=%d (%.1f%%)\n", refs, changes, 100*changes/refs;
        }
    '
    refs=144709, changes=142676 (98.6%)

All remote types are affected. Since the number of changesets seems to
grow linearly [0], we should try to find a solution if we want the
repository to keep scaling. I had an attempt at setting
‘uploadpack.hideRefs’ (and ‘uploadpack.allowTipSHA1InWant’) last Friday,
to exclude refs/changes/… from the initial advertisement, but that broke
CI and hence needs more work. There is no urgency anyway (it's not a
regression), and although it's getting worse over time, by the time it's
unbearable the Git protocol v2 [1] might save us :-)

--
Guilhem.

[0] https://dashboard.documentfoundation.org/app/kibana#/dashboard/Gerrit
[1] https://opensource.googleblog.com/2018/05/introducing-git-protocol-version-2.html
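To illustrate what ‘uploadpack.hideRefs’ does to the advertisement, here is a minimal sketch against a throwaway local bare repository. Assumptions, stated loudly: it exercises stock C git's git-upload-pack(1), not the JGit implementation Gerrit runs, so it says nothing about why CI broke; it forces protocol v0 so the output reflects the initial ref advertisement discussed above; and refs/changes/01/1/1 is a made-up Gerrit-style ref name.

```shell
#!/bin/sh
# Sketch: hide refs/changes/* from git-upload-pack(1)'s advertisement
# (stock C git, not JGit). refs/changes/01/1/1 is a hypothetical ref.
set -e

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
server="$tmp/server.git"

git init -q --bare "$server"
git init -q "$tmp/work"
cd "$tmp/work"
git -c user.name=demo -c user.email=demo@example.org \
    commit -q --allow-empty -m 'initial commit'
# Publish one branch and one Gerrit-style change ref.
git push -q "$server" HEAD:refs/heads/master HEAD:refs/changes/01/1/1

count_refs() {
    # Protocol v0: the server advertises all (non-hidden) refs up front.
    git -c protocol.version=0 ls-remote "$server" | awk 'END { print NR }'
}

before=$(count_refs)
# Omit every ref under refs/changes from the advertisement...
git -C "$server" config uploadpack.hideRefs refs/changes
# ...but still accept fetches of a hidden tip by object ID.
git -C "$server" config uploadpack.allowTipSHA1InWant true
after=$(count_refs)

echo "advertised refs: before=$before, after=$after"
```

With hideRefs in effect the change ref disappears from the advertisement, so a client can only fetch it by object ID, which is what ‘uploadpack.allowTipSHA1InWant’ permits.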
_______________________________________________
LibreOffice mailing list
LibreOffice@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/libreoffice