On Thu, Mar 15, 2018 at 8:21 PM, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
> On Thu, Mar 15 2018, Duy Nguyen jotted:
>
>> On Mon, Mar 12, 2018 at 8:30 PM, Ævar Arnfjörð Bjarmason
>> <avarab@xxxxxxxxx> wrote:
>>> We already have pack.packSizeLimit, perhaps we could call this
>>> e.g. gc.keepPacksSize=2GB?
>>
>> I'm OK either way. The "base pack" concept comes from the
>> "--keep-base-pack" option where we do keep _one_ base pack. But the gc
>> config var has slightly different semantics since it can keep
>> multiple packs.
>
> I see, yeah it would be great to generalize it to N packs.
>
>>> Finally I wonder if there should be something equivalent to
>>> gc.autoPackLimit for this. I.e. with my proposed semantics above it's
>>> possible that we end up growing forever, i.e. I could have 1000 2GB
>>> packs and then 50 very small packs per gc.autoPackLimit.
>>>
>>> Maybe we need a gc.keepPackLimit=100 to deal with that, then e.g. if
>>> gc.keepPacksSize=2GB is set and we have 101 >= 2GB packs, we'd pick the
>>> two smallest ones and not issue a --keep-pack for those, although then
>>> maybe our memory use would spike past the limit.
>>>
>>> I don't know, maybe we can leave that for later, but I'm quite keen to
>>> turn the top-level config variable into something that just considers
>>> size instead of "base" if possible, and it seems we're >95% of the way
>>> to that already with this patch.
>>
>> At least I will try to ignore gc.keepPacksSize if all packs are kept
>> because of it. That repack run will hurt. But then we're back to one
>> giant pack and plenty of small packs that will take some time to grow
>> up to 2GB again.
>
> I think that semantic really should have its own option. The usefulness
> of this is significantly diminished if it's not a guarantee on the
> resource use of git-gc.
>
> Consider a very large repo where we clone and get a 4GB pack.
> Then as time goes on we end up with lots of loose objects and small packs from
> pulling, and eventually end up with say 4GB + 2x 500MB packs (if our
> limit is 500MB).
>
> If I understand what you're saying correctly, if we ever match the gc
> --auto requirements because we have *just* the big packs and then a
> bunch of loose objects (say we rebased a lot) then we'll try to create a
> giant 5GB pack (+ loose objects).

Yes. There isn't a simple and easy solution here, and I consider packing
to regain performance (even if it's expensive) better than not packing
at all. I could tweak that a bit by keeping the largest pack out (so we
have two packs in the end). After a long long long time, when your
second pack gets to 5GB, we hit the most expensive repack. But that
should be OK for now, I guess.

I think this repack strategy was discussed here at some point in the
past by the Gerrit guys. Their goal was to reduce I/O, I believe. A
perfect solution could probably be found, but I don't want to hold this
series back until it's found, and I don't want to introduce a zillion
config knobs that become useless once the perfect solution is found.

>>> Actually maybe that should be a "if we're that low on memory, forget
>>> about GC for now" config, but urgh, there's a lot of potential
>>> complexity to be handled here...
>>
>> Yeah I think what you want is a hook. You can make custom rules then.
>> We already have the pre-auto-gc hook and could pretty much do what you
>> want without pack-objects memory estimation. But if you want it, maybe
>> we can export the info to the hook somehow.
>
> I can do away with that particular thing, but I'd really like to do
> without the hook. I can automate it on some machines, but then we also
> have un-managed laptops run by users who clone big repos. It's much
> easier to tell them to set a few git config variables than have them
> install & keep some hook up-to-date.
That sounds like we need a mechanism to push hooks (and config stuff)
automatically from the clone source. I think this topic was touched on
at the summit?

I don't object to adding new config, but we need to figure out what we
need, and from this thread I think there are too many "I don't know"s
to settle on a solution.
-- 
Duy
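The keep-pack selection rules floated in this thread can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from the patch series: gc.keepPacksSize and gc.keepPackLimit are only proposals here, and the function `select_kept_packs`, its parameters, and the exact number of packs released when the limit is exceeded (enough that merging them brings the count back to the limit, which matches the "two smallest" in the 101-pack example) are all hypothetical. It returns the indices of packs that would get a --keep-pack, applying Ævar's size and limit rules and then Duy's fallback of ignoring the size rule when every pack would be kept, optionally still keeping the largest pack out.

```python
# Hypothetical sketch of the keep-pack selection discussed in this thread.
# gc.keepPacksSize / gc.keepPackLimit are proposals, not existing git config.
MB = 1 << 20
GB = 1 << 30


def select_kept_packs(pack_sizes, keep_size, keep_limit=None,
                      keep_largest_on_fallback=True):
    """Return sorted indices of packs that would be passed as --keep-pack.

    pack_sizes: size in bytes of each existing pack.
    keep_size:  stand-in for the proposed gc.keepPacksSize.
    keep_limit: stand-in for the proposed gc.keepPackLimit (None = no limit).
    """
    # Rule 1 (Ævar): keep every pack at or above the size threshold.
    kept = [i for i, size in enumerate(pack_sizes) if size >= keep_size]

    # Rule 2 (Ævar): with too many large packs, stop keeping the smallest
    # of them. Releasing len(kept) - keep_limit + 1 packs means that, once
    # they are merged into one new pack, the large-pack count is back at
    # the limit (101 packs with a limit of 100 releases the two smallest).
    if keep_limit is not None and len(kept) > keep_limit:
        kept.sort(key=lambda i: pack_sizes[i])
        kept = kept[len(kept) - keep_limit + 1:]

    # Rule 3 (Duy): if the size rule would keep every pack, ignore it.
    # Optionally still keep the largest pack out of the repack, so we end
    # up with two packs instead of one giant one.
    if pack_sizes and len(kept) == len(pack_sizes):
        if keep_largest_on_fallback:
            kept = [max(range(len(pack_sizes)), key=lambda i: pack_sizes[i])]
        else:
            kept = []

    return sorted(kept)


if __name__ == "__main__":
    # Only big packs (plus loose objects): the fallback keeps just the 4GB pack.
    print(select_kept_packs([4 * GB, 600 * MB], keep_size=500 * MB))  # prints [0]
```

In the "just the big packs plus loose objects" case the fallback fires and everything except the largest pack gets repacked; passing keep_largest_on_fallback=False gives the full repack described first, which is the expensive run both sides want to avoid.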