On Thu, Jul 19, 2018 at 5:27 PM Elijah Newren <newren@xxxxxxxxx> wrote: > > On Wed, Jul 18, 2018 at 10:41 PM, Duy Nguyen <pclouds@xxxxxxxxx> wrote: > > On Thu, Jul 19, 2018 at 12:51 AM Elijah Newren <newren@xxxxxxxxx> wrote: > >> > >> I had a user report some poor behavior of 'git gc --aggressive' on a > >> certain repo (which I sadly cannot share). Turns out that on this > >> repo, this operation takes about 60% longer and produces a pack > >> roughly twice the expected size. > > > > The intention was to make life better for weaker machines but > > definitely should not slow down beefier ones, so yes this is > > definitely a regression. > > Not sure if it matters, but the original discovery was on a laptop. > Probably 4 cores and 16 GB RAM. I duplicated on my workstation (8 > cores, 24 GB RAM). However, since the original patch was memory > related and I noticed the repacking using all available memory, I > repeated the testcases while measuring memory but did it on a machine > that wouldn't be memory limited. That should be plentiful for a 5GB pack with default cache settings, I think. > > Is it possible to share "verify-pack -v <pack file>" output of the > > pack produced by 2.17.0 and 2.18.0? The only sensitive info there is > > sha-1, which you can replace with just "SHA-1" if you want. I'm more > > interested in delta sizes and distribution. > > For the deltas, assuming the size-in-pack field (4th column) is the relevant one > > Number of objects in repo (lines of verify-pack output): 4460755 > Number of deltas: 2831384 > Number of deltas greater than 1MB: 72 > Min: 14 > Median: 47 > Mean: 586 > 99.999 percentile: 11366280.6 (10.8 MB) > Max: 141664210 (135.1 MB) > > If the size field (3rd column) is the relevant one, then the numbers > change slightly to: > > Number of deltas greater than 1MB: 101 > Min: 4 > Median: 33 > Mean: 660 > 99.999 percentile: 12245551.7 (11.7 MB) I think size is the right one because even deltas are compressed before stored in the pack file (I probably should check the code...) but anyway this number is bigger and that sounds to me like 16MB as delta size limit should cover common case nicely. 32MB to be on the safe side, but given default cache size of 256MB, a single delta 1/8 the size of the cache sounds too much. > Max: 144658342 (138.0 MB) This one would trigger the patch I just sent, which should also handle it nicely (I hope). > I checked out the blob which had the biggest delta, as well as the > blob it was a delta against. One was a 280 MB file, the other a 278 > MB file. -- Duy