Jeff King <peff@xxxxxxxx> wrote:
> On Wed, Apr 27, 2016 at 09:53:24PM +0000, Eric Wong wrote:
>
> > It can be tempting for a server admin to want a stable set of
> > long-lived packs for dumb clients; but also want to enable
> > bitmaps to serve smart clients more quickly.

> But I did want to mention one thing, which is that long-lived split
> packs are a tradeoff, even for dumb clients. The pack format cannot do
> deltas between packs, so the sum of your split packs is larger than a
> single pack would be. That's a good thing for somebody who cloned
> earlier, and wants to only fetch a few small packs on top. But it's
> much worse for somebody who wants to do a fresh clone, and has to grab
> all of the packs either way.

Definitely a tradeoff, but a fresh clone with split packs might
only be (at worst) doubling or tripling bandwidth use on both
sides?

However, the CPU/memory cost of packing on the server is at
least an order of magnitude (more likely several orders of
magnitude) higher than serving packs as-is. The client most
likely won't care about CPU/memory usage, though.

> > Fwiw, I'm hoping to publish an ~800MB git-clone-able repo of
> > our ML archives, soonish. I can serve terabytes of dumb HTTP
> > traffic all day long without breaking a sweat; but smart
> > packing of big repos worries me; especially when feeding
> > slow clients and having to leave processes running
> > (or buffering pack output to disk). So perhaps I'll teach
> > my HTTP server to play dumb whenever CPU/memory usage is high.
>
> Yeah, CPU and memory load for serving large clones is a problem. Memory
> especially scales with number of objects (because we keep the whole
> packing list in memory for the entirety of the write). At GitHub, we
> have some changes to try to serve things verbatim from the on-disk pack
> without even creating an in-memory list of objects (it's just a bitmap
> of which objects in the packfile to send), and that reduces CPU and
> memory load quite a bit.
> Cleaning up and submitting those patches has
> been on my todo list for a while, but I just haven't gotten to it. I'm
> of course happy to share the messy state if you want to pick through it
> yourself.

Sure thing! I can't promise I'll have time, either, but being
able to serve packs verbatim would be great; especially if you
could multiplex it with epoll/kqueue for folks on slow pipes
(and maybe use sendfile, but perhaps that's not worth the effort
with TLS everywhere nowadays).

I was also wondering if fresh clones could be memoized entirely.
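
For illustration only, the "play dumb whenever CPU/memory usage is high" idea above could be sketched roughly as below. This is not code from any real server. The function names, the load-per-CPU threshold, and the use of the 1-minute load average as the "busy" signal are all assumptions made for the example. It relies on the fact that a smart HTTP client asks for `info/refs?service=git-upload-pack` and falls back to the dumb protocol when the reply is not a smart advertisement.

```python
import os


def should_play_dumb(load_1min, cpu_count, max_load_per_cpu=1.0):
    """Return True when the host looks too busy to run pack-objects.

    max_load_per_cpu is a hypothetical tuning knob, not a git setting.
    """
    return load_1min > cpu_count * max_load_per_cpu


def choose_protocol(query_string, load_1min=None, cpu_count=None):
    """Pick "smart" or "dumb" for an info/refs request.

    load_1min and cpu_count can be passed in for testing; otherwise
    they are read from the OS (os.getloadavg is Unix-only).
    """
    if load_1min is None:
        load_1min = os.getloadavg()[0]
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1

    # Smart clients announce themselves via the service parameter.
    wants_smart = "service=git-upload-pack" in query_string

    if wants_smart and not should_play_dumb(load_1min, cpu_count):
        return "smart"
    # Either a dumb client, or we are too loaded: serve static files
    # and let the client fall back to fetching existing packs.
    return "dumb"
```

A real server would likely also look at memory pressure and the number of pack-objects processes already running; load average alone is just the simplest signal to show.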