> So your load is probably really spiky, as you get thundering herds of
> fetchers after every push (the spikes may have a long flatline at the
> top, as it takes time to process the whole herd).

It is quite spiky, yes. At the moment, however, the replication fleet is
relatively small (just 4 machines). We had 6 machines earlier this month
and had hoped that terminating two of them would lead to a material drop
in CPU usage, but we didn't see a significant reduction.

> Yes, though I'd be surprised if this negotiation is that expensive in
> practice. In my experience it's not generally, and even if we ended up
> traversing every commit in the repository, that's on the order of a few
> seconds even for large, active repositories.
>
> In my experience, the problem in a mass-fetch like this ends up being
> pack-objects preparing the packfile. It has to do a similar traversal,
> but _also_ look at all of the trees and blobs reachable from there, and
> then search for likely delta-compression candidates.
>
> Do you know which processes are generating the load? git-upload-pack
> does the negotiation, and then pack-objects does the actual packing.

When I look at expensive operations (ones that I can see consuming 90%+
of a CPU for more than a second), there are often pack-objects processes
running that consume an entire core for multiple seconds (I also saw one
pack-objects process spend several minutes in its "Counting objects"
phase while using a full core). `rev-list` shows up as a pretty active
CPU consumer, as do `prune` and `blame-tree`. Overall, in terms of
high-CPU activities, `prune` and `rev-list` show up the most frequently.

On the subject of prune - I forgot to mention that the `git fetch` calls
from the subscribers are running `git fetch --prune`. I'm not sure if
that changes the projected load profile.

> Maybe. If pack-objects is where your load is coming from, then
> counter-intuitively things sometimes get _worse_ as you fetch less. The
> problem is that git will generally re-use deltas it has on disk when
> sending to the clients. But if the clients are missing some of the
> objects (because they don't fetch all of the branches), then we cannot
> use those deltas and may need to recompute new ones. So you might see
> some parts of the fetch get cheaper (negotiation, pack-object's
> "Counting objects" phase), but "Compressing objects" gets more
> expensive.

I might be misunderstanding this, but if the subscriber is already "up
to date" modulo a single updated ref tip, then this problem shouldn't
occur, right? Concretely: if ref A is built off of ref B, and the
subscriber already has B when it requests A, that shouldn't cause this
behavior; it would only cause it if the subscriber didn't have B when it
requested A.

> This is particularly noticeable with shallow fetches, which in my
> experience are much more expensive to serve.

I don't think we're doing shallow fetches anywhere in this system.

> Jakub mentioned bitmaps, and if you are using GitHub Enterprise, they
> are enabled. But they won't really help here. They are essentially
> cached information that git generates at repack time. But if we _just_
> got a push, then the new objects to fetch won't be part of the cache,
> and we'll fall back to traversing them as normal. On the other hand,
> this should be a relatively small bit of history to traverse, so I'd
> doubt that "Counting objects" is that expensive in your case (but you
> should be able to get a rough sense by watching the progress meter
> during a fetch).

See my comment above about the long-running "Counting objects" process.
I couldn't tell which of our repositories it was counting, but it ran
for about 3 minutes with full core utilization. I didn't see pack-objects
counting frequently, but when it does, it's an expensive operation.
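
To get a better sense of where that time goes, the spot-check I have in
mind is to run a manual fetch from one replica right after a push and
watch which phase the time goes into. This is only a sketch: I'm
assuming the replicas fetch the primary as `origin`, and the log path is
a placeholder.

    # Run from inside one replica's repository right after a push lands.
    # The server-side phases show up as "remote: Counting objects" and
    # "remote: Compressing objects"; "Receiving objects" and "Resolving
    # deltas" happen on the client. GIT_TRACE_PERFORMANCE prints the
    # overall timing of the fetch to stderr.
    GIT_TRACE_PERFORMANCE=1 \
        git fetch --prune --progress origin 2>&1 | tee /tmp/fetch-trace.log
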
> I'd suspect more that delta compression is expensive (we know we just
> got some new objects, but we don't know if we can make good deltas
> against the objects the client already has). That's a gut feeling,
> though.
>
> If the fetch is small, that _also_ shouldn't be too expensive. But
> things add up when you have a large number of machines all making the
> same request at once. So it's entirely possible that the machine just
> gets hit with a lot of 5s CPU tasks all at once. If you only have a
> couple cores, that takes many multiples of 5s to clear out.

I think this would show up if I were sitting around running `top` on the
machine, but that isn't what I see. That might just be a function of
there being a relatively small number of replication machines, I'm not
sure. But I'm not noticing 4 of the same tasks getting spawned
simultaneously, which says to me that we're either utilizing a cache or
there's some locking behavior involved.

> There's nothing in upstream git to help smooth these loads, but since
> you mentioned GitHub Enterprise, I happen to know that it does have a
> system for coalescing multiple fetches into a single pack-objects. I
> _think_ it's in GHE 2.5, so you might check which version you're
> running (and possibly also talk to GitHub Support, who might have more
> advice; there are also tools for finding out which git processes are
> generating the most load, etc).

We're on 2.6.4 at the moment.

> I suspect there's room for improvement and tuning of the primary. But
> barring that, one option would be to have a hierarchy of replicas. Have
> "k" first-tier replicas fetch from the primary, then "k" second-tier
> replicas fetch from them, and so on. Trade propagation delay for
> distributing the load. :)

Yep, this was the first thought we had. I started fleshing out the
architecture for it (roughly the sketch in the P.S. below), but I wasn't
sure whether the majority of the CPU load was being triggered by the
first request to make it in, in which case moving to a multi-tiered
architecture wouldn't help us. When we didn't notice a material
reduction in CPU load after reducing the replication fleet by a third,
I started to worry about this and stopped moving forward on the
multi-tiered architecture until I understood the behavior better.

- V
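
P.S. The tiered fetch I'd started sketching looks roughly like the
following. It's only a sketch, not something we're running: the
REPLICA_TIER variable, the hostnames, and the repository path are all
made up.

    #!/bin/bash
    # Run from inside the replica's repository. Tier-1 replicas fetch
    # from the primary; everyone else fetches from an assigned tier-1
    # replica, trading a bit of propagation delay for load on the primary.
    if [ "${REPLICA_TIER:-2}" -eq 1 ]; then
        upstream="git@primary.example.com:some-repo.git"
    else
        upstream="git@tier1-replica.example.com:some-repo.git"
    fi
    # Mirror branch refs from the tier above (this refspec assumes the
    # replicas are bare mirrors of the primary's branches).
    git fetch --prune "$upstream" '+refs/heads/*:refs/heads/*'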