On Wed, Mar 21 2018, Duy Nguyen wrote:

> On Wed, Mar 21, 2018 at 9:24 AM, Jeff King <peff@xxxxxxxx> wrote:
>> On Sun, Mar 18, 2018 at 03:25:15PM +0100, Nguyễn Thái Ngọc Duy wrote:
>>
>>> v6 fixes the one optimization that I just couldn't get right, fixes
>>> two off-by-one error messages, and has a couple of commit message
>>> updates (the biggest change is in 11/11, to record some numbers from
>>> Ævar).
>>
>> I was traveling during some of the earlier rounds, so I finally got a
>> chance to take a look at this.
>>
>> I hate to be a wet blanket, but am I the only one who is wondering
>> whether the tradeoff is worth it? An 8% memory reduction doesn't seem
>> mind-bogglingly good,
>
> Ævar measured RSS. If we count the objects[] array alone, the saving
> is 40% (136 bytes per entry down to 80). Some of the difference is
> probably eaten up by mmap in RSS.

Yeah, sorry about spreading that confusion.

>> and I'm concerned about two things:
>>
>>   1. The resulting code is harder to read and reason about (things
>>      like the DELTA() macros), and seems a lot more brittle (things
>>      like the new size_valid checks).
>>
>>   2. There are lots of new limits. Some of these are probably fine
>>      (e.g., the cacheable delta size), but things like the
>>      number-of-packs limit don't have very good user-facing behavior.
>>      Yes, having that many packs is insane, but that's going to be
>>      small consolation to somebody whose automated maintenance
>>      program now craps out at 16k packs, when it previously would
>>      have just worked to fix the situation.
>>
>> Saving 8% is nice, but the number of objects in linux.git grew over
>> 12% in the last year. So you've bought yourself 8 months before the
>> problem is back. Is it worth making these changes that we'll have to
>> deal with for many years to buy 8 months of memory savings?
>
> Well, with 40% it buys us a couple more months. The object growth
> affects "rev-list --all" too, so the actual number of "good months"
> is probably not far from 8.
>
> Is it worth saving? I don't know.
> I raised the readability point from the very first patch, and if
> people believe it makes the code much harder to read, then no, it's
> not worth it.
>
> While pack-objects is simple from a functionality point of view, it
> has received lots of optimizations and is, to me, quite fragile.
> Readability does count in this code. Fortunately it still looks quite
> OK to me with this series applied (but then, that's subjective).
>
> About the 16k limit (and some other limits as well): I'm making these
> patches on the assumption that large-scale deployments will probably
> go with custom builds anyway. Adjusting the limits back up should be
> quite easy, while we can still provide reasonable defaults for most
> people.
>
>> I think ultimately to work on low-memory machines we'll need a
>> fundamentally different approach that scales with the objects since
>> the last pack, and not with the complete history.
>
> Absolutely. That is covered in a separate "gc --auto" series. Some
> memory reduction here may still be nice to have, though. Even on a
> beefy machine, memory can be put to better use somewhere other than
> being wasted in unused bits.

FWIW I've been running a combination of these two series at work (also
keeping the big pack), and they've had a sizable impact on packing our
monorepo. On one of our dev boxes, on a real-world checkout with a
combo of the "base" pack and other packs + loose objects, as measured
by /usr/bin/time:

 * Reduction in user time by 37%
 * Reduction in system time by 84%
 * Reduction in RSS by 61%
 * Reduction in page faults by 58% & 94% (time(1) reports two
   different numbers)
 * Reduction in the I of I/O by 58%
 * Reduction in the O of I/O by 94%