On Thu, Mar 15 2018, Duy Nguyen jotted:

> On Mon, Mar 12, 2018 at 8:30 PM, Ævar Arnfjörð Bjarmason
> <avarab@xxxxxxxxx> wrote:
>> We already have pack.packSizeLimit, perhaps we could call this
>> e.g. gc.keepPacksSize=2GB?
>
> I'm OK either way. The "base pack" concept comes from the
> "--keep-base-pack" option where we do keep _one_ base pack. But the gc
> config var has slightly different semantics when it can keep multiple
> packs.

I see, yeah, it would be great to generalize it to N packs.

>> Finally I wonder if there should be something equivalent to
>> gc.autoPackLimit for this. I.e. with my proposed semantics above it's
>> possible that we end up growing forever, i.e. I could have 1000 2GB
>> packs and then 50 very small packs per gc.autoPackLimit.
>>
>> Maybe we need a gc.keepPackLimit=100 to deal with that, then e.g. if
>> gc.keepPacksSize=2GB is set and we have 101 >= 2GB packs, we'd pick
>> the two smallest ones and not issue a --keep-pack for those, although
>> then maybe our memory use would spike past the limit.
>>
>> I don't know, maybe we can leave that for later, but I'm quite keen to
>> turn the top-level config variable into something that just considers
>> size instead of "base" if possible, and it seems we're >95% of the way
>> to that already with this patch.
>
> At least I will try to ignore gc.keepPacksSize if all packs are kept
> because of it. That repack run will hurt. But then we're back to one
> giant pack and plenty of small packs that will take some time to grow
> up to 2GB again.

I think that semantic really should have its own option. The usefulness
of this is significantly diminished if it's not a guarantee on the
resource use of git-gc.

Consider a very large repo where we clone and get a 4GB pack. Then as
time goes on we end up with lots of loose objects and small packs from
pulling, and eventually end up with say 4GB + 2x 500MB packs (if our
limit is 500MB).

If I understand you correctly, then if we ever match the gc --auto
requirements because we have *just* the big packs plus a bunch of loose
objects (say we rebased a lot), we'll try to create a giant 5GB pack
(+ loose objects).

>> Finally, I don't like the way the current implementation conflates a
>> "size" variable with auto-detecting the size from memory, leaving no
>> way to fall back to the auto-detection if you set it manually.
>>
>> I think we should split out the auto-memory behavior into another
>> variable, and also make the currently hardcoded 50% of memory
>> configurable.
>>
>> That way you could e.g. say you'd always like to keep 2GB packs, but
>> if you happen to have ended up with a 1GB pack and it's time to
>> repack, and you only have 500MB free memory on that system, it would
>> keep the 1GB one until such time as we have more memory.
>
> I don't calculate based on free memory (it's tricky to get that right
> on linux) but physical memory. If you don't have enough memory
> according to this formula, you won't until you add more memory sticks.

Ah, thanks for the clarification.

>> Actually maybe that should be a "if we're that low on memory, forget
>> about GC for now" config, but urgh, there's a lot of potential
>> complexity to be handled here...
>
> Yeah I think what you want is a hook. You can make custom rules then.
> We already have the pre-auto-gc hook and could pretty much do what you
> want without pack-objects memory estimation. But if you want it, maybe
> we can export the info to the hook somehow.
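Just so we're talking about the same thing, I take it you mean something
along these lines as a pre-auto-gc hook (only a sketch; the 2GB cut-off
and reading MemAvailable out of /proc/meminfo are made up for
illustration, and Linux-only):

    #!/bin/sh
    # pre-auto-gc: exit non-zero to make "git gc --auto" do nothing
    # when the machine looks low on memory.
    free_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
    if test -n "$free_kb" && test "$free_kb" -lt 2097152
    then
        echo "pre-auto-gc: low on memory, skipping auto gc" >&2
        exit 1
    fi
    exit 0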
I can do away with that particular thing, but I'd really like to do
without the hook. I can automate it on some machines, but then we also
have un-managed laptops run by users who clone big repos. It's much
easier to tell them to set a few git config variables than have them
install & keep some hook up-to-date.
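I.e. just telling them to run something like this once (gc.keepPacksSize
being the name floated above, so still hypothetical at this point;
gc.autoPackLimit already exists and defaults to 50):

    git config gc.autoPackLimit 50
    git config gc.keepPacksSize 2g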