Re: [PATCH] pack-objects: warn on split packs disabling bitmaps

Jeff King <peff@xxxxxxxx> wrote:
> On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:
> 
> > It can be tempting for a server admin to want a stable set of
> > long-lived packs for dumb clients; but also want to enable
> > bitmaps to serve smart clients more quickly.

> But I did want to mention one thing, which is that long-lived split
> packs are a tradeoff, even for dumb clients. The pack format cannot do
> deltas between packs, so the sum of your split packs is larger than a
> single pack would be. That's a good thing for somebody who cloned
> earlier, and wants to fetch only a few small packs on top. But it's much worse
> for somebody who wants to do a fresh clone, and has to grab all of the
> packs either way.

Definitely a tradeoff, but a fresh clone from split packs might only
(at worst) double or triple bandwidth use on both sides?

However, the CPU/memory cost of packing is at least an order of
magnitude (more likely several orders of magnitude) more
expensive on the server.  The client most likely won't care
about CPU/memory usage, though.

> >  Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
> >  our ML archives, soonish.  I can serve terabytes of dumb HTTP
> >  traffic all day long without breaking a sweat; but smart
> >  packing of big repos worries me; especially when feeding
> >  slow clients and having to leave processes running
> >  (or buffering pack output to disk).  So perhaps I'll teach
> >  my HTTP server to play dumb whenever CPU/memory usage is high.
> 
> Yeah, CPU and memory load for serving large clones is a problem. Memory
> especially scales with number of objects (because we keep the whole
> packing list in memory for the entirety of the write). At GitHub, we
> have some changes to try to serve things verbatim from the on-disk pack
> without even creating an in-memory list of objects (it's just a bitmap
> of which objects in the packfile to send), and that reduces CPU and
> memory load quite a bit. Cleaning up and submitting those patches has
> been on my todo list for a while, but I just haven't gotten to it. I'm
> of course happy to share the messy state if you want to pick through it
> yourself.

Sure thing!  I can't promise I'll have time, either, but being
able to serve packs verbatim would be great; especially if you
could multiplex it with epoll/kqueue for folks on slow pipes
(and maybe use sendfile, but perhaps that's not worth the effort
 with TLS everywhere nowadays).
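
For illustration only, the "play dumb whenever CPU/memory usage is high"
idea quoted above could be sketched roughly as below. This is not taken
from any existing server: the function names, the per-CPU load metric,
and the 0.75 threshold are all assumptions, and a real server would
likely want to look at memory pressure as well.

```python
import os

def should_play_dumb(load_1min, cpu_count, threshold=0.75):
    """Hypothetical policy: True when the 1-minute load per CPU exceeds threshold."""
    return load_1min / max(cpu_count, 1) > threshold

def choose_mode(threshold=0.75):
    """Pick "dumb" (static files only) or "smart" (run pack-objects) for a request."""
    load_1min, _, _ = os.getloadavg()  # Unix only; raises OSError if unavailable
    cpus = os.cpu_count() or 1
    return "dumb" if should_play_dumb(load_1min, cpus, threshold) else "smart"
```

When choose_mode() returns "dumb", the server would skip the smart
protocol for that request and let the client fall back to fetching the
existing packs as plain files.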

I was also wondering if fresh clones could be memoized entirely.
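
One possible shape for that, as a sketch under assumptions rather than
anything git does today: key a cached full-clone pack on the complete
set of ref tips, so the cached pack is reused until any ref moves.
clone_cache_key() and the "oid name" line layout are hypothetical
choices made for this example.

```python
import hashlib

def clone_cache_key(refs):
    """Derive a cache key from a mapping of ref name -> hex object id.

    Refs are hashed in sorted order so the key does not depend on the
    order in which they were enumerated.
    """
    h = hashlib.sha1()
    for name in sorted(refs):
        h.update(f"{refs[name]} {name}\n".encode())
    return h.hexdigest()
```

Any ref update changes the key, so a stale pack is never served; the
cost is that busy repos would invalidate the cache often.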