On Wed, Mar 21, 2018 at 9:24 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Sun, Mar 18, 2018 at 03:25:15PM +0100, Nguyễn Thái Ngọc Duy wrote:
>
>> v6 fixes the one optimization that I just couldn't get right, fixes
>> two off-by-one error messages and a couple commit message update
>> (biggest change is in 11/11 to record some numbers from AEvar)
>
> I was traveling during some of the earlier rounds, so I finally got a
> chance to take a look at this.
>
> I hate to be a wet blanket, but am I the only one who is wondering
> whether the tradeoff is worth it? 8% memory reduction doesn't seem
> mind-bogglingly good,

AEvar measured RSS. If we count the objects[] array alone, the saving
is 40% (136 bytes per entry down to 80). Some of the difference is
probably eaten up by mmap in the RSS numbers.

> and I'm concerned about two things:
>
>   1. The resulting code is harder to read and reason about (things like
>      the DELTA() macros), and seems a lot more brittle (things like the
>      new size_valid checks).
>
>   2. There are lots of new limits. Some of these are probably fine
>      (e.g., the cacheable delta size), but things like the
>      number-of-packs limit don't have very good user-facing behavior.
>      Yes, having that many packs is insane, but that's going to be
>      small consolation to somebody whose automated maintenance program
>      now craps out at 16k packs, when it previously would have just
>      worked to fix the situation.
>
> Saving 8% is nice, but the number of objects in linux.git grew over 12%
> in the last year. So you've bought yourself 8 months before the problem
> is back. Is it worth making these changes that we'll have to deal with
> for many years to buy 8 months of memory savings?

Well, at 40% it buys us a couple more months. The object growth
affects "rev-list --all" too, so the actual number of "good" months is
probably not far from 8. Is it worth saving? I don't know.
I raised the readability point from the very first patch, and if people
believe it makes the code much harder to read, then no, it's not worth
it. While pack-objects is simple from a functionality point of view, it
has received lots of optimizations and is, to me, quite fragile.
Readability does count in this code. Fortunately it still looks quite
OK to me with this series applied (but then, that's subjective).

About the 16k limit (and some other limits as well): I'm making these
patches with the assumption that large-scale deployments will probably
go with custom builds anyway. Adjusting the limits back should be quite
easy, while we can still provide reasonable defaults for most people.

> I think ultimately to work on low-memory machines we'll need a
> fundamentally different approach that scales with the objects since
> the last pack, and not with the complete history.

Absolutely. That is covered in a separate "gc --auto" series. Some
memory reduction here may still be nice to have, though. Even on a
beefy machine, memory can be put to better use somewhere else rather
than wasted in unused bits.
--
Duy