Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

On Thu, Mar 15, 2018 at 8:21 PM, Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
>
> On Thu, Mar 15 2018, Duy Nguyen jotted:
>
>> On Mon, Mar 12, 2018 at 8:30 PM, Ævar Arnfjörð Bjarmason
>> <avarab@xxxxxxxxx> wrote:
>>> We already have pack.packSizeLimit, perhaps we could call this
>>> e.g. gc.keepPacksSize=2GB?
>>
>> I'm OK either way. The "base pack" concept comes from the
>> "--keep-base-pack" option where we do keep _one_ base pack. But gc
>> config var has a slightly different semantics when it can keep
>> multiple packs.
>
> I see, yeah it would be great to generalize it to N packs.
>
>>> Finally I wonder if there should be something equivalent to
>>> gc.autoPackLimit for this. I.e. with my proposed semantics above it's
>>> possible that we end up growing forever, i.e. I could have 1000 2GB
>>> packs and then 50 very small packs per gc.autoPackLimit.
>>>
>>> Maybe we need a gc.keepPackLimit=100 to deal with that, then e.g. if
>>> gc.keepPacksSize=2GB is set and we have 101 >= 2GB packs, we'd pick the
>>> two smallest one and not issue a --keep-pack for those, although then
>>> maybe our memory use would spike past the limit.
>>>
>>> I don't know, maybe we can leave that for later, but I'm quite keen to
>>> turn the top-level config variable into something that just considers
>>> size instead of "base" if possible, and it seems we're >95% of the way
>>> to that already with this patch.
>>
>> At least I will try to ignore gc.keepPacksSize if all packs are kept
>> because of it. That repack run will hurt. But then we're back to one
>> giant pack and plenty of small packs that will take some time to grow
>> up to 2GB again.
>
> I think that semantic really should have its own option. The usefulness
> of this is significantly diminished if it's not a guarantee on the
> resource use of git-gc.
>
> Consider a very large repo where we clone and get a 4GB pack. Then as
> time goes on we end up with lots of loose objects and small packs from
> pulling, and eventually end up with say 4GB + 2x 500MB packs (if our
> limit is 500MB).
>
> If I understand what you're saying correctly if we ever match the gc
> --auto requirements because we have *just* the big packs and then a
> bunch of loose objects (say we rebased a lot) then we'll try to create a
> giant 5GB pack (+ loose objects).

Yes. There isn't a simple and easy solution here, and I consider
packing (even if it's expensive) to regain performance better than not
packing at all. I could tweak that a bit by keeping the largest pack
out (so we have two packs in the end). After a long, long time, when
your second pack gets to 5GB, we hit the most expensive repack. But
that should be OK for now, I guess.
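The "keep the largest pack out" tweak above can be sketched with the
--keep-pack option added earlier in this series (the throwaway repo
setup is only for illustration; a real gc would pick the pack by the
proposed size threshold rather than unconditionally):

```shell
# Sketch: consolidate everything except the largest pack, the way
# 'git gc' would with the proposed config. Requires git >= 2.18 for
# 'git repack --keep-pack'.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git repack -d -q                     # first pack
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two
git repack -d -q                     # second pack
# Exclude the largest pack from the all-into-one repack; every other
# pack is consolidated and the old non-kept packs are deleted by -d.
largest=$(ls -S .git/objects/pack/pack-*.pack | head -n 1)
git repack -a -d -q --keep-pack="$(basename "$largest")"
```

This leaves exactly two packs: the kept base pack and one freshly
written pack holding everything else.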

I think this repack strategy was discussed here at some point in the
past by the Gerrit folks. Their goal was to reduce I/O, I believe. A
perfect solution could probably be found, but I don't want to hold
this series back until it is, and I don't want to introduce a zillion
config knobs that become useless once that perfect solution arrives.

>>> Actually maybe that should be a "if we're that low on memory, forget
>>> about GC for now" config, but urgh, there's a lot of potential
>>> complexity to be handled here...
>>
>> Yeah I think what you want is a hook. You can make custom rules then.
>> We already have pre-auto-gc hook and could pretty much do what you
>> want without pack-objects memory estimation. But if you want it, maybe
>> we can export the info to the hook somehow.
>
> I can do away with that particular thing, but I'd really like to do
> without the hook. I can automate it on some machines, but then we also
> have un-managed laptops run by users who clone big repos. It's much
> easier to tell them to set a few git config variables than have them
> install & keep some hook up-to-date.

That sounds like we need a mechanism to push hooks (and config stuff)
automatically from the clone source. I think this topic was touched on
at the summit? I don't object to adding new config, but we need to
figure out what we need, and from this thread I think there are too
many "I don't know"s to settle on a solution.
-- 
Duy
