Re: With big repos and slower connections, git clone can be hard to work with

Ivan Frade <ifrade@xxxxxxxxxx> · Tue, 11 Jun 2024 12:40:14 -0700

On Mon, Jun 10, 2024 at 11:27 PM Jeff King <peff@xxxxxxxx> wrote:
>
> On Mon, Jun 10, 2024 at 12:04:30PM -0700, Emily Shaffer wrote:
>
> > > One strategy people have worked on is for servers to point clients at
> > > static packfiles (which _do_ remain byte-for-byte identical, and can be
> > > resumed) to get some of the objects. But it requires some scheme on the
> > > server side to decide when and how to create those packfiles. So while
> > > there is support inside Git itself for this idea (both on the server and
> > > client side), I don't know of any servers where it is in active use.
> >
> > We use packfile offloading heavily at Google (any repositories hosted
> > at *.googlesource.com, as well as our internal-facing hosting). It
> > works quite well for us scaling large projects like Android and
> > Chrome; we've been using it for some time now and are happy with it.
>
> Cool! I'm glad to hear it is in use.
>
> It might be helpful for other potential users if you can share how you
> decide when to create the off-loaded packfiles, what goes in them, and
> so on. IIRC the server-side config is mostly geared at stuffing a few
> large blobs into a pack (since each blob must have an individual config
> key). Maybe JGit (which I'm assuming is what powers googlesource) has
> better options there.

IIRC the upstream config is oriented toward offloading individual
blobs. In JGit/Google we offload at the pack level: when creating a
pack we write it to storage and to a CDN, and we keep the offloaded
location in the pack metadata. We do this only under certain
conditions (GC, above a certain size, ...).
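
To make the creation-time policy concrete, here is a rough sketch in
Python (not the actual JGit code). The names Pack, should_offload,
finish_pack and fake_upload, the 100 MiB threshold, and the
cdn.example host are all made up for illustration, and requiring both
conditions (GC and size) at once is an assumption on top of the
description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical threshold: the description only says "above a certain size".
MIN_OFFLOAD_BYTES = 100 * 1024 * 1024


@dataclass
class Pack:
    name: str
    size_bytes: int
    created_by_gc: bool
    offload_url: Optional[str] = None  # stored alongside the pack metadata


def should_offload(pack: Pack, min_bytes: int = MIN_OFFLOAD_BYTES) -> bool:
    # Assumption: both conditions must hold (GC-produced and large enough).
    return pack.created_by_gc and pack.size_bytes >= min_bytes


def finish_pack(pack: Pack, upload: Callable[[Pack], str],
                min_bytes: int = MIN_OFFLOAD_BYTES) -> Pack:
    # upload() stands in for "write to storage and CDN" and returns the URL.
    if should_offload(pack, min_bytes):
        pack.offload_url = upload(pack)
    return pack


def fake_upload(pack: Pack) -> str:
    # Placeholder uploader for illustration; cdn.example is not a real host.
    return f"https://cdn.example/{pack.name}.pack"


big = finish_pack(Pack("gc-1", 500 * 1024 * 1024, True), fake_upload)
small = finish_pack(Pack("push-1", 4096, False), fake_upload)
print(big.offload_url)
print(small.offload_url)
```

The point of recording the URL on the pack itself is that serving code
never has to guess whether an offloaded copy exists.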

At serving time, if we see that we need to send a pack "as-is" (all
objects inside are needed) and it has an offload, then we mark it to
send the URL instead of the contents. As the offload is just a copy of
the pack, we can use the pack's bitmap to know which objects it
contains.

> > However, one thing that's missing is the resumable download Ellie is
> > describing.

Another thing missing in the offload story is support for offloads
over non-HTTP protocols, e.g. after cloning via my-protocol://, being
able to fetch my-protocol://blah/blah URLs.

Ivan