On Fri, Aug 06 2021, Jonathan Nieder wrote:

> Ævar Arnfjörð Bjarmason wrote:
>
>> Or perhaps not, but they're my currently my best effort to explain the
>> differences between the two and how they interact. So I think it's best
>> to point to those instead of coming up with something in this reply,
>> which'll inevitably be an incomplete rewrite of much of that.
>>
>> In short, there are use-cases that packfile-uri is inherently unsuitable
>> for, or rather changing the packfile-uri feature to support them would
>> pretty much make it indistinguishable from this bundle-uri mechanism,
>> which I think would just add more confusion to the protocol.
>
> Hm. I was hoping you might say more about those use cases --- e.g. is
> there a concrete installation that wants to take advantage of this?
> By focusing on the real-world example, we'd get a better shared
> understanding of the underlying constraints.

I hacked this up for potential use on GitLab's infrastructure, mainly as
a mechanism to relieve CPU pressure on CI thundering herds.

Often you need full clones, and you sometimes need to do those from
scratch. When you've just had a push come in it's handy to convert those
to incremental fetches on top of a bundle you made recently.

It's not deployed on anything currently, it's just something I've been
hacking up. I'll be on vacation much of the rest of this month, and the
plan is to start stressing it on real-world use-cases after that. I
thought I'd send this RFC first.

> After all, both are ways to reduce the bandwidth of a clone or other
> large fetch operation by offloading the bulk of content to static
> serving.

The support we have for packfile-uri in git.git now is, as far as the
server side goes, I think it's fair to say, fairly immature. I gather
that JGit's version is more advanced, and that's what's serving up
things at Google at e.g. https://chromium.googlesource.com.

I.e.
right now for git-upload-pack you need to exhaustively enumerate all the
objects to exclude, although there have been some on-list patches
recently for being able to supply tips.

More importantly your CDN reliability MUST match that of your git
server, otherwise your clone fails (as the server has already sent the
"other side" of the expected CDN pack).

Furthermore you MUST as the server be able to tell the client what pack
checksum on the CDN they should expect, which requires a very tight
coupling between git server and CDN.

You can't e.g. say "bootstrap with this bundle/pack" and point to
something like Debian's async-updated FTP network as a source. The
bootstrap data may or may not be there, and it may or may not be as
up-to-date as you'd like.

I think any proposed integration into git.git should mainly consider the
bulk of users; the big hosting providers can always run with their own
patches.

I think this approach opens up the potential for easier and therefore
wider CDN integration for git servers for providers that aren't one of
the big fish. E.g. your CDN generation can be a daily cronjob, and the
server can point to it blindly and hope for the best. The client will
optimistically use the CDN data, and recover if it's missing or stale.

I think one thing I oversold is the "path to resumable clones", i.e.
that's all true, but I don't think that's really any harder to do with
packfile-uri in theory (in practice just serving up a sensible pack with
it is pretty tedious with git-upload-pack as it stands).

The negotiation aspect of it is also interesting and something I've been
experimenting with. I.e. the tips of the bundles are what the client
sends as its HAVE tips. This allows a server to anticipate what dialog
newly cloning clients are likely to have, and even pre-populate a cache
of that locally (it could even serve that diff up as a packfile-uri :).

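To make the "bundle tips as HAVE tips" idea a bit more concrete, here's
a rough Python sketch. It is not how the series implements it in C; it
only assumes the documented v2/v3 bundle header layout (a signature
line, optional "@" capability lines in v3, "-<oid> <comment>"
prerequisite lines, "<oid> <refname>" reference lines, then a blank line
before the pack data). The names read_bundle_header() and have_lines()
are made up for illustration, and the output is plain "have <oid>"
strings without any pkt-line framing.

```python
import io
from typing import BinaryIO, List, Tuple

# Signature lines of the two bundle format versions.
SIGNATURES = (b"# v2 git bundle\n", b"# v3 git bundle\n")


def read_bundle_header(stream: BinaryIO) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Read only the bundle header, stopping at the blank line before the pack.

    Returns (prerequisite OIDs, [(tip OID, refname), ...]). The pack data
    that follows is left unread in `stream`.
    """
    if stream.readline() not in SIGNATURES:
        raise ValueError("not a v2/v3 git bundle")

    prerequisites: List[str] = []
    tips: List[Tuple[str, str]] = []
    while True:
        line = stream.readline()
        if line == b"":
            raise ValueError("truncated bundle header")
        if line == b"\n":
            break  # end of header, pack data follows
        text = line.rstrip(b"\n").decode("utf-8", "replace")
        if text.startswith("@"):
            continue  # v3 capability line, ignored in this sketch
        if text.startswith("-"):
            # Prerequisite: objects the bundle assumes we already have.
            prerequisites.append(text[1:].split(" ", 1)[0])
        else:
            oid, refname = text.split(" ", 1)
            tips.append((oid, refname))
    return prerequisites, tips


def have_lines(tips: List[Tuple[str, str]]) -> List[str]:
    """Turn bundle tips into de-duplicated "have <oid>" negotiation lines."""
    seen: List[str] = []
    for oid, _refname in tips:
        if oid not in seen:
            seen.append(oid)
    return ["have " + oid for oid in seen]
```

Since only the header is consumed, something like this could run on the
first few hundred bytes of a download, which is what makes starting the
dialog before the bundle has fully arrived plausible at all.
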
Right now it retrieves each bundle in full before adding the tips to
negotiate to a subsequent dialog, but I've successfully experimented
locally with negotiating on the basis of objects we don't even have
yet. I.e. download the bundle(s), and as soon as we have the header fire
off the dialog with the server to get its PACK on the basis of those
promised tips.

It makes the recovery a bit more involved in case the bundles don't have
what we're after, but this allows us to disconnect really fast from the
server and twiddle our own thumbs while we finish downloading the
bundles to get the full clone. We already disconnect early in cases
where the bundle(s) already have what we're after.

This E-Mail got rather long, but hey, I did point you to parts of this
series that cover some/most of this, and since it wasn't clear what if
anything of that you'd read ... :) Hopefully this summary is useful.